Comments welcome

#1
I'm playing with a PARSE command for 4UTILS. Examples are below. Comments and suggestions are welcome.

Code:
v:\> parse /?
PARSE "string" "delims" array   (? = number of assignments made)

v:\> parse "My dog has fleas." " ." word

v:\> echo %_?
4

v:\> for /l %i in (0,1,3) echo %word[%i]
My
dog
has
fleas

v:\> echo %_IP
72.230.122.241 192.168.1.3 169.254.1.1

v:\> parse "%@word[0,%_ip]" "." octet

v:\> echo %_?
4

v:\> for /l %i in (0,1,3) echo %octet[%i]
72
230
122
241
 
#2
Yup, I have a comment. PARSE by words or by fields.

That is, what does

Code:
SET FOO=A,,B,C
PARSE "%foo" "," elements
give you?

I'd like to see an option for choosing whether to ignore consecutive separators or to count them separately.
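
To make the distinction concrete, here is a minimal Python sketch (not part of 4UTILS). The names `split_words` and `split_fields` are hypothetical; they only illustrate the two behaviours being asked about, assuming word-style parsing collapses runs of separators and field-style parsing counts each separator.

```python
import re

def split_words(text, delims):
    """Word-style: a run of delimiters counts as one separator; no empty tokens."""
    pattern = "[" + re.escape(delims) + "]+"
    return [tok for tok in re.split(pattern, text) if tok]

def split_fields(text, delims):
    """Field-style: every delimiter ends a field, so empty fields are kept."""
    pattern = "[" + re.escape(delims) + "]"
    return re.split(pattern, text)

print(split_words("A,,B,C", ","))   # ['A', 'B', 'C']
print(split_fields("A,,B,C", ","))  # ['A', '', 'B', 'C']
```

With word-style splitting the example yields three elements; with field-style splitting it yields four, the second one empty.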
 
#3
vefatica wrote:
| I'm playing with a PARSE command for 4UTILS. Examples are below.
| Comments and suggestions are welcome.
|
|
| Code:
| ---------
| v:\> parse /?
| PARSE "string" "delims" array (? = number of assignments made)
|
| v:\> parse "My dog has fleas." " ." word
|
| v:\> echo %_?
| 4
|
| v:\> for /l %i in (0,1,3) echo %word[%i]
| My
| dog
| has
| fleas
|
| v:\> echo %_IP
| 72.230.122.241 192.168.1.3 169.254.1.1
|
| v:\> parse "%@word[0,%_ip]" "." octet
|
| v:\> echo %_?
| 4
|
| v:\> for /l %i in (0,1,3) echo %octet[%i]
| 72
| 230
| 122
| 241
| ---------

I agree with Dave's request to make @FIELD vs. @WORD type parsing
selectable. I'd also suggest:

1/ Make delimiter list specification optional, as in @word/@field, in the
form /D"delims"; use the same default as @word and @field

2/ make it both a command and a function, with the function value the number
of elements in the array

3/ to simplify your own parsing make the array name the first parameter, the
rest of the command line the second (string to be parsed) parameter.

4/ provide an option to report either the element count (as you proposed),
or the array index of the last element - this would simplify such things as
your FOR loop to report all elements.

The syntax would thus be:

PARSE /?

Create or replace an array from delimited elements of a string

PARSE [/D"delimiters"] [/F | /W] [/X] array string

/D"delimiters"  list of string element delimiters
/F              @field type parsing
/W              @word type parsing (default)
/X              report last array index instead of array size
array           name of array to be created or replaced
string          string to be parsed

Exit code: array size (default)

The function would have identical parameter list and options.

I know, I always want more than you intended...
--
Steve
 
#4
On Wed, 15 Apr 2009 07:35:04 -0500, dcantor <> wrote:

|---------
|SET FOO=A,,B,C
|PARSE "%foo" "," elements
|---------
|give you?
|
|I'd like to see an option for choosing whether to ignore consecutive separators or to count them separately.

v:\> SET FOO=A,,B,C

v:\> PARSE "%foo" "," elements

v:\> for /l %i in (0,1,%@dec[%_?]) echo %elements[%i]
A
B
C

As it is, PARSE is a very skimpy wrapper for wcstok_s() which treats consecutive
delimiters as one (and essentially does **all** the work). I like your
suggestion so I'll give it some thought ... might wind up a separate PARSEF.
--
- Vince
 
#5
On Wed, 15 Apr 2009 08:27:24 -0500, Steve Fábián <> wrote:

|I agree with Dave's request to make @FIELD vs. @WORD type parsing
|selectable. I'd also suggest:

Under consideration.

|1/ Make delimiter list specification optional, as in @word/@field, in the
|form /D"delims"; use the same default as @word and @field
|
|2/ make it both a command and a function, with the function value the number
|of elements in the array

Wouldn't @PARSE be sufficient (no command)? I considered returning the highest
index (count - 1). Whatever behavior I settle on, it won't be optional.

|3/ to simplify your own parsing make the array name the first parameter, the
|rest of the command line the second (string to be parsed) parameter.

Good idea.

|4/ provide an option to report either the element count (as you proposed),
|or the array index of the last element - this would simplify such things as
|your FOR loop to report all elements.

See above.

|I know, I always want more than you intended...

Indeed! :-)

As it stands, if the array already exists, PARSE does not try to create it. If
it's not big enough or is not 1-dimensional this will lead to errors (from SET).
I like the idea of re-using an array, but this puts the obligation on the user
to ensure the array is 1-D and big enough for all intended purposes. Any
thoughts?
--
- Vince
 
#6
vefatica wrote:
| On Wed, 15 Apr 2009 08:27:24 -0500, Steve Fábián <> wrote:
|
|| I agree with Dave's request to make @FIELD vs. @WORD type parsing
|| selectable. I'd also suggest:
|
| Under consideration.

If using two functions/commands instead of an option switch, I'd suggest
WPARSE and FPARSE. Symmetry negates need to remember which is default. First
character is more distinctive than last. (Just my personal opinion. Your
choice.)

|| 1/ Make delimiter list specification optional, as in @word/@field,
|| in the form /D"delims"; use the same default as @word and @field

Are you considering this?

|| 2/ make it both a command and a function, with the function value
|| the number of elements in the array
|
| Wouldn't @PARSE be sufficient (no command)? I considered returning
| the highest index (count - 1). Whatever behavior I settle on, it
| won't be optional.

Yes, @parse would be sufficient (just as we have @fileclose[]).

Iterating from the low end to the high end of an array is much more common
than using the array size, and for this reason I prefer the uppermost index.
(Of course, in many HLLs it's the same... )

| As it stands, if the array already exists, PARSE does not try to
| create it. If it's not big enough or is not 1-dimensional this will
| lead to errors (from SET). I like the idea of re-using an array, but
| this puts the obligation on the user to ensure the array is 1-D and
| big enough for all intended purposes. Any thoughts?

Controlling this with the NoClobber option would be cumbersome and not likely
to be very useful, even though arrays are like internal files. When writing a
program I see no conceptual difference between reusing an ordinary
environment variable and reusing an array variable; only the practicalities
differ slightly. If the array variable already exists but has more than one
dimension, it should be unset before a new one is created. If it has a single
dimension and is reused rather than destroyed and recreated, the first issue
is that it may be undersized or oversized; the second issue, relevant only to
@FPARSE, is matching empty resulting elements to the array. All in all, it
seems cheaper to just unconditionally destroy the existing array and define
a new one.
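
The two policies can be compared with a short Python sketch. It is only a model: `fill_reused` and `fill_fresh` are hypothetical names, and a Python list stands in for a 1-D TCC array of fixed size.

```python
def fill_reused(array, tokens):
    """Reuse an existing fixed-size array: overwrite in place, fail if too small."""
    if len(tokens) > len(array):
        raise IndexError("array too small for %d tokens" % len(tokens))
    for i, tok in enumerate(tokens):
        array[i] = tok
    return array  # elements beyond len(tokens) keep their old values

def fill_fresh(tokens):
    """Destroy and recreate: the new array is exactly as long as the token list."""
    return list(tokens)

old = ["72", "230", "122", "241"]
print(fill_reused(old, ["A", "B", "C"]))  # ['A', 'B', 'C', '241']
print(fill_fresh(["A", "B", "C"]))        # ['A', 'B', 'C']
```

The stale `241` in the reused array is the oversized case; the `IndexError` is the undersized case.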

Additional functions in the same genre are also possible. For example
@FADD/@WADD - these would perform arithmetic addition of matching elements
if both old and new are numeric (or one of them is empty), else string
concatenation. Other "vector" functions likewise. Am I making TCC into a
spreadsheet processor? Or just dreaming?
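
As a rough illustration of the element-wise rule described above, here is a Python sketch. Everything in it is an assumption made for the example: the names `combine` and `vector_add`, the restriction to integers, and the choice to treat a missing element as empty when the arrays differ in length.

```python
from itertools import zip_longest

def is_int(text):
    """True if the string parses as an integer."""
    try:
        int(text)
        return True
    except ValueError:
        return False

def combine(old, new):
    """Add when both are numeric (or one is empty); otherwise concatenate."""
    pair = (old, new)
    numeric = all(is_int(x) or x == "" for x in pair) and any(is_int(x) for x in pair)
    if numeric:
        return str(sum(int(x) for x in pair if x != ""))
    return old + new

def vector_add(old_array, new_array):
    """Combine matching elements; a missing element counts as empty."""
    return [combine(a, b) for a, b in zip_longest(old_array, new_array, fillvalue="")]

print(vector_add(["1", "", "x", "7"], ["2", "5", "y"]))  # ['3', '5', 'xy', '7']
```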
--
Steve
 
#7
On Wed, 15 Apr 2009 14:45:51 -0500, Steve Fábián <> wrote:

|vefatica wrote:
|| On Wed, 15 Apr 2009 08:27:24 -0500, Steve Fábián <> wrote:
||
||| I agree with Dave's request to make @FIELD vs. @WORD type parsing
||| selectable. I'd also suggest:
||
|| Under consideration.

Still need to figure out how I'd do it (@PARSEF).

|If using two functions/commands instead of an option switch, I'd suggest
|WPARSE and FPARSE. Symmetry negates need to remember which is default. First
|character is more distinctive than last. (Just my personal opinion. Your
|choice.)

See below.

||| 1/ Make delimiter list specification optional, as in @word/@field,
||| in the form /D"delims"; use the same default as @word and @field
|
|Are you considering this?

Yes, I already did it.

||| 2/ make it both a command and a function, with the function value
||| the number of elements in the array
||
|| Wouldn't @PARSE be sufficient (no command)? I considered returning
|| the highest index (count - 1). Whatever behavior I settle on, it
|| won't be optional.

I now have: @PARSEW[[/D"delims",]array,string] = number of assignments

with the string in the tail so it doesn't need quotes (as you suggested).

It seems natural (from over-exposure to "C") to have the number, N, of things,
and then iterate on <N. So I left it that way. I sympathize with you however
on the need to use @DEC to do that with FOR /L.
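
The count-versus-last-index trade-off can be sketched in Python, where `range(n)` plays the role of C's `i < N` loop. `inclusive_range` is a hypothetical stand-in for FOR /L, which includes its end value; the sketch assumes a positive step.

```python
def inclusive_range(start, step, end):
    """Mimic FOR /L %i in (start,step,end): the end value is included."""
    return list(range(start, end + 1, step))

words = ["My", "dog", "has", "fleas"]
count = len(words)      # the number of assignments, as reported in %_?
last_index = count - 1  # what %@dec[%_?] computes

# C-style "i < N" loop: the count is the natural bound.
c_style = [words[i] for i in range(count)]

# FOR /L-style loop: the bound must be the last index, hence @DEC.
for_l_style = [words[i] for i in inclusive_range(0, 1, last_index)]

print(c_style == for_l_style)  # True
```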

I uploaded new VC9 plugins (VC9 subdir on lucky.syr.edu, via FTP). I think only
4UTILS has changed significantly. In it is @PARSEW and an experimental PACE
command:

PACE [/N n] [file], n = approx lines per second (8)

(in a pipe too). In a real console (don't know about TCMD) you can stop/start
output with ^S (not my doing) so PACE might be reasonable for scanning a file.
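
For readers curious what such pacing involves, here is a minimal Python sketch of the idea, not the plugin's implementation. The function name `pace` and its parameters are hypothetical; the default of 8 lines per second is taken from the usage line above. The demo swaps in a recording function for `time.sleep` so it runs instantly.

```python
import io
import sys
import time

def pace(lines, lines_per_second=8, sleep=time.sleep, out=sys.stdout):
    """Write lines one at a time, pausing so roughly N lines appear per second."""
    delay = 1.0 / lines_per_second
    count = 0
    for line in lines:
        out.write(line if line.endswith("\n") else line + "\n")
        out.flush()
        sleep(delay)
        count += 1
    return count

# Demo: record the requested delays instead of actually sleeping.
delays = []
buffer = io.StringIO()
written = pace(["one", "two", "three"], lines_per_second=8,
               sleep=delays.append, out=buffer)
print(written, delays)  # 3 [0.125, 0.125, 0.125]
```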
--
- Vince
 
#8
vefatica wrote:
| I now have: @PARSEW[[/D"delims",]array,string] = number of assignments
|
| with the string in the tail so it doesn't need quotes.

Great. I'll do a little testing soon.
|
| It seems natural (from over-exposure to "C") to have the number, N,
| of things, and then iterate on <N. So I left it that way. I
| sympathize with you however on the need to use @DEC to do that
| with FOR /L.

Also with DO n = ...

There are several TCC functions, e.g. @LINES, which return the index of the
last entry instead of the entry count. I suggested that method, because the
<N syntax of "C" is not available in TCC - neither DO nor FOR supports it.
|
| I uploaded new VC9 plugins (VC9 subdir on lucky.syr.edu, via FTP). I
| think only 4UTILS has changed significantly. In it is @PARSEW and an
| experimental PACE command:
|
| PACE [/N n] [file], n = approx lines per second (8)
|
| (in a pipe too).

Sounds useful, though the /P(age) option of most commands is a reasonable
alternative.

| In a real console (don't know about TCMD) you can
| stop/start output with ^S (not my doing)

It goes back over 40 years to teletypes - "X-off" (in ASCII: DC3, code 19)
and "X-on" (in ASCII: DC1, code 17) were sent back to the sender to suspend
and resume transmission, respectively. PC/MS-DOS implemented DC3 (you can
also use the Pause function of the Break key) to suspend output, and accepted
any keystroke (not including shift, alt, etc. keys) to resume. WinXP console
emulation (ntvdm) inherited the same operation, except that some key
combinations are intercepted by the keyboard driver and may not be delivered
to the "console".
--
Steve
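
The codes mentioned in this post follow from how the Ctrl key maps letters to ASCII control characters: only the low five bits of the letter's code are kept. A small Python check follows; the helper name `control_code` is made up for the example.

```python
def control_code(letter):
    """ASCII code produced by Ctrl+<letter>: the letter's code masked to 5 bits."""
    return ord(letter.upper()) & 0x1F

XOFF = control_code("S")  # DC3, suspend output
XON = control_code("Q")   # DC1, resume output
print(XOFF, XON)  # 19 17
```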
 
#9
On Wed, 15 Apr 2009 19:46:51 -0500, Steve Fábián <> wrote:

|Great. I'll do a little testing soon.
||

I uploaded another. It has @PARSEW and @PARSEF. Neither considers a quoted
group of tokens as a single token. That'll be a chore and won't happen soon.
--
- Vince
 
#10
vefatica wrote:
| On Wed, 15 Apr 2009 19:46:51 -0500, Steve Fábián <> wrote:
|
|| Great. I'll do a little testing soon.
|||
|
| I uploaded another. It has @PARSEW and @PARSEF. Neither considers a
| quoted group of tokens as a single token. That'll be a chore and
| won't happen soon.

Ouch! I have many files (mostly .csv) with quoted strings containing
embedded commas that would have been candidates for @parsef ...
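
The problem is easy to reproduce outside TCC. In this Python sketch the sample line is invented for illustration, and the standard `csv` module stands in for a quote-aware parser; it is not how @PARSEF works, and unlike a plain splitter it also strips the quotes.

```python
import csv

# Invented sample record: the second field contains an embedded comma.
line = 'id-7,"Smith, John",42'

naive = line.split(",")                  # splits inside the quotes
quote_aware = next(csv.reader([line]))   # keeps the quoted group together

print(naive)        # ['id-7', '"Smith', ' John"', '42']
print(quote_aware)  # ['id-7', 'Smith, John', '42']
```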
--
Steve
 
#11
On Wed, 15 Apr 2009 23:48:04 -0500, Steve Fábián <> wrote:

|vefatica wrote:
|| On Wed, 15 Apr 2009 19:46:51 -0500, Steve Fábián <> wrote:
||
||| Great. I'll do a little testing soon.
||||
||
|| I uploaded another. It has @PARSEW and @PARSEF. Neither considers a
|| quoted group of tokens as a single token. That'll be a chore and
|| won't happen soon.
|
|Ouch! I have many files (mostly .csv) with quoted strings containing
|embedded commas that would have been candidates for @parsef ...

OK. I uploaded another (which probably needs tweaking). Internally it uses
@WORD(S) and @FIELD(S) so it respects grouping.
--
- Vince