WAD Counting things with TPIPE /grep

vefatica · Jun 3, 2012

From the help for TPIPE /grep

Code:

CountMatches - 1 to only output a count of the number of matches

Apparently, this will count the characters (very slowly).

Code:

v:\> type 116402.txt | tpipe /grep=2,0,0,0,1,0,0,0,.
3968262

And this will count the lines:

Code:

v:\> type 116402.txt | tpipe /grep=3,0,0,0,1,0,0,0,.
116402

This counts words, but in the process, outputs 116402 blank lines!

Code:

type 116402.txt | tpipe /grep=2,0,0,0,1,0,0,0,"[^ \t\r\n]*"
(116402 blank lines here)
582010
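
As a rough cross-check of what the three commands above are tallying, the same counts can be sketched in Python. This is only an analogy: Python's `re` module is not the regex engine TPIPE uses, the helper name `count_things` is hypothetical, and the word pattern here uses `+` rather than `*` so that empty matches are not counted.

```python
import re

def count_things(text):
    """Return (chars, lines, words) for text, counted via regex matches."""
    # '.' matches any character except '\n', so a '\r' is still counted.
    chars = len(re.findall(r".", text))
    # A line is counted when it contains at least one character.
    lines = sum(1 for line in text.splitlines() if re.search(r".", line))
    # A word is a maximal run of characters that are not space, tab, CR or LF.
    words = len(re.findall(r"[^ \t\r\n]+", text))
    return chars, lines, words

# Small CRLF-terminated sample (an assumption, not the 116402.txt file).
sample = "My\r\ndog\r\nhas\r\nfleas.\r\n"
print(count_things(sample))
```

Counting per-character matches means one regex match per character, which is one plausible reason the character count is slow on a large file.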

vefatica · Jun 3, 2012

vefatica said:
From the help for TPIPE /grep

Code:

CountMatches - 1 to only output a count of the number of matches

Here's a simpler example, showing a blank line output for every line in the file.

Code:

v:\> echo My^r^ndog^r^nhas^r^nfleas. > doggie.txt

v:\> tpipe /input=doggie.txt /grep=2,0,0,0,1,0,0,0,"[^ \t\r\n]*"




4

v:\>

rconn · Jun 3, 2012

vefatica said:
From the help for TPIPE /grep

Code:

CountMatches - 1 to only output a count of the number of matches

Apparently, this will count the characters (very slowly).

Code:

v:\> type 116402.txt | tpipe /grep=2,0,0,0,1,0,0,0,.
3968262

That's an *incredibly* inefficient way of calling TPIPE ...

And this will count the lines:

Code:

v:\> type 116402.txt | tpipe /grep=3,0,0,0,1,0,0,0,.
116402

See above.

This counts words, but in the process, outputs 116402 blank lines!

Code:

type 116402.txt | tpipe /grep=2,0,0,0,1,0,0,0,"[^ \t\r\n]*"
(116402 blank lines here)
582010

Without your mystery "116402.txt" file, there's probably no point in my even looking at this.

What's with your insistence on TYPE'ing the file and piping the output to TPIPE?

Steve Fabian · Jun 3, 2012

rconn said:
What's with your insistence on TYPE'ing the file and piping the output to TPIPE?

Could it be the name tPIPE? It seems to IMPLY pipes as its primary use...

rconn · Jun 3, 2012

You're inferring a relationship that doesn't exist. There are lots of different kinds of "pipes" -- in this case, I called it "TPIPE" because it's based on the "TextPipe Engine".

I've said in several previous messages that piping input & output is the slowest possible usage. TYPE'ing a file and piping it to TPIPE is at least an order of magnitude slower than simply passing TPIPE the filename.

vefatica · Jun 3, 2012

rconn said:
Without your mystery "116402.txt" file, there's probably no point in my even looking at this.

There's nothing special about that file or about piping. Whenever you count words (as below), it outputs a blank line for every line in the file (whereas it's supposed to output only the count). Didn't you see my simple example?

Code:

v:\> echo My^r^ndog^r^nhas^r^nfleas. > doggie.txt
 
v:\> tpipe /input=doggie.txt /grep=2,0,0,0,1,0,0,0,"[^ \t\r\n]*"
 
 
 
 
4
 
v:\>

rconn · Jun 3, 2012

vefatica said:
There's nothing special about that file or about piping. Whenever you count words (as below), it outputs a blank line for every line in the file (whereas it's supposed to output only the count). Didn't you see my simple example?

Code:

v:\> echo My^r^ndog^r^nhas^r^nfleas. > doggie.txt

v:\> tpipe /input=doggie.txt /grep=2,0,0,0,1,0,0,0,"[^ \t\r\n]*"




4

v:\>

Your line should be:

tpipe /input=doggie.txt /grep=5,0,0,0,1,0,0,0,"[^ \t\r\n]*"

The "2" is telling TPIPE to extract (and output) all the matching lines; "5" tells it to remove them.

The help for "CountMatches" has a typo; it should read:

CountMatches - 1 to output a count of the number of matches

vefatica · Jun 3, 2012

rconn said:
Your line should be:

tpipe /input=doggie.txt /grep=5,0,0,0,1,0,0,0,"[^ \t\r\n]*"

The "2" is telling TPIPE to extract (and output) all the matching lines; "5" tells it to remove them.

The help for "CountMatches" has a typo; it should read:

CountMatches - 1 to output a count of the number of matches

That only works because *all* lines match. What if only *some* lines matched?

Code:

v:\> tpipe /input=doggie.txt /grep=5,0,0,0,1,0,0,0,dog
My
has
fleas.
1
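
One way to read the output above: with type 5 the lines that do not match are still written out, and the count of matching lines follows them. A minimal Python model of that reading is sketched here. `remove_matching_lines` is a hypothetical helper, and nothing in it is taken from TPIPE's implementation.

```python
import re

def remove_matching_lines(lines, pattern):
    """Hypothetical model: drop the lines that match and count them."""
    regex = re.compile(pattern)
    # Lines without a match survive and would be written to the output.
    kept = [line for line in lines if not regex.search(line)]
    # Every dropped line is one matching line.
    matched = len(lines) - len(kept)
    return kept, matched

doggie = ["My", "dog", "has", "fleas."]
kept, matched = remove_matching_lines(doggie, "dog")
for line in kept:
    print(line)
print(matched)
```

With a pattern that every line matches, such as the word pattern used earlier, `kept` is empty and only the count remains, which is why the type 5 suggestion looked correct for that case.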

Steve Fabian · Jun 3, 2012

rconn said:
You're inferring a relationship that doesn't exist. There are lots of different kinds of "pipes" -- in this case, I called it "TPIPE" because it's based on the "TextPipe Engine".

I've said in several previous messages that piping input & output is the slowest possible usage. TYPE'ing a file and piping it to TPIPE is at least an order of magnitude slower than simply passing TPIPE the filename.

What's in a name? I am not inferring any relationships. I am inferring that to the average user the name TPIPE means a command relating to pipes - devices allowing contents to flow from one place to another - whether in a tobacco pipe, in the Alaska Pipeline, or in a software mechanism connecting software processes.

It is clear to anyone who understands the underlying mechanisms that reading data from a file by the TYPE command, and directing its standard output to the standard input of another process requires much more processing than would be required if the second process read the file directly, and is much less efficient. This has nothing to do with the intuitive interpretation of the name TPIPE - anyone who speaks English associates the word with pipes. An unfortunate name, even if it is based on the name "TextPipe Engine" - which itself is misnamed.

Possible reasons for sending data to a program using a pipe instead of permitting it to access the source file directly include the need to use a process with access rights to the source file that are not granted to the processing program. Vince's example above does not have such a need...

Steve Fabian · Jun 3, 2012

Continuing previous post. Many programs coming from the Unix world do not have file access capabilities, accept data only through standard input, and dispose of generated data exclusively through standard output. Filter utilities like CUT and TR are prime examples. When you try to determine how much of your existing procedure using Unix utilities can be replaced by the TPIPE command, it is natural to pipe to TPIPE instead of using the more efficient direct reading, since piping requires much less modification of existing programs.

vefatica · Jun 4, 2012

Steve Fabian said:
Continuing previous post. Many programs coming from the Unix world do not have file access capabilities, accept data only through standard input, and dispose of generated data exclusively through standard output. Filter utilities like CUT and TR are prime examples. When you try to determine how much of your existing procedure using Unix utilities can be replaced by the TPIPE command, it is natural to pipe to TPIPE instead of using the more efficient direct reading, since piping requires much less modification of existing programs.

My two versions of CUT.EXE take a file. Gnu's TR.EXE doesn't; the Thompson Toolkit's TR.EXE (17 years old) does.

Speed varies greatly; in this comparison TPIPE lands between the two grep builds.

Code:

v:\> wc IpToCountry.csv
  Lines  Words  Chars
116402  116402 3015512
 
v:\> timer & g:\gnu\grep US IpToCountry.csv > NUL & timer
Timer 1 on: 00:27:03
Timer 1 off: 00:27:03  Elapsed: 0:00:00.61
 
v:\> timer & g:\ttk\grep US IpToCountry.csv > NUL & timer
Timer 1 on: 00:27:34
Timer 1 off: 00:27:34  Elapsed: 0:00:00.07
 
v:\> timer & tpipe /input=IpToCountry.csv /grep=3,0,0,0,0,0,0,0,US > NUL & timer
 
Timer 1 on: 00:27:48
Timer 1 off: 00:27:48  Elapsed: 0:00:00.30

rconn · Jun 23, 2012

vefatica said:
That only works because *all* lines match. What if only *some* lines matched?

Code:

v:\> tpipe /input=doggie.txt /grep=5,0,0,0,1,0,0,0,dog
My
has
fleas.
1

tpipe /input=doggie.txt /grep=3,0,0,0,1,0,0,0,"dog"

returns a single line "1".
