WAD Counting things with TPIPE /grep

#1
From the help for TPIPE /grep​
Code:
CountMatches - 1 to only output a count of the number of matches
Apparently, this will count the characters (very slowly).
Code:
v:\> type 116402.txt | tpipe /grep=2,0,0,0,1,0,0,0,.
3968262
And this will count the lines:
Code:
v:\> type 116402.txt | tpipe /grep=3,0,0,0,1,0,0,0,.
116402
This counts words, but in the process, outputs 116402 blank lines!
Code:
type 116402.txt | tpipe /grep=2,0,0,0,1,0,0,0,"[^ \t\r\n]*"
(116402 blank lines here)
582010
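As a cross-check outside TPIPE, the same three counts can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `count_things` and the sample text (modelled on the doggie.txt example) are made up here, and the word pattern uses `+` rather than `*` so that a match can never be empty. It makes no claim about how TPIPE counts internally.

```python
import re

def count_things(text: str) -> dict:
    """Count characters, lines and words in text with regular expressions."""
    # "." matches any character except "\n", so line feeds are not counted
    # (a carriage return, if present, is counted).
    chars = len(re.findall(r".", text))
    # Count lines that contain at least one character; empty lines are skipped.
    lines = sum(1 for line in text.splitlines() if re.search(r".", line))
    # "+" instead of "*" so that every match contains at least one character.
    words = len(re.findall(r"[^ \t\r\n]+", text))
    return {"chars": chars, "lines": lines, "words": words}

# Hypothetical sample: four CRLF-terminated lines.
sample = "My\r\ndog\r\nhas\r\nfleas.\r\n"
print(count_things(sample))  # {'chars': 18, 'lines': 4, 'words': 4}
```

The character count of 18 includes the four carriage returns but not the line feeds, which is one reason character counts from different tools may disagree.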
 
#2
From the help for TPIPE /grep​
Code:
CountMatches - 1 to only output a count of the number of matches
Here's a simpler example, showing a blank line output for every line in the file.
Code:
v:\> echo My^r^ndog^r^nhas^r^nfleas. > doggie.txt
 
v:\> tpipe /input=doggie.txt /grep=2,0,0,0,1,0,0,0,"[^ \t\r\n]*"
 
 
 
 
4
 
v:\>
 

rconn

Administrator
Staff member
May 14, 2008
10,100
85
#3
From the help for TPIPE /grep​
Code:
CountMatches - 1 to only output a count of the number of matches
Apparently, this will count the characters (very slowly).
Code:
v:\> type 116402.txt | tpipe /grep=2,0,0,0,1,0,0,0,.
3968262
That's an *incredibly* inefficient way of calling TPIPE ...

And this will count the lines:
Code:
v:\> type 116402.txt | tpipe /grep=3,0,0,0,1,0,0,0,.
116402
See above.

This counts words, but in the process, outputs 116402 blank lines!
Code:
type 116402.txt | tpipe /grep=2,0,0,0,1,0,0,0,"[^ \t\r\n]*"
(116402 blank lines here)
582010
Without your mystery "116402.txt" file, there's probably no point in my even looking at this.

What's with your insistence on TYPE'ing the file and piping the output to TPIPE?
 

rconn
#5
You're inferring a relationship that doesn't exist. There are lots of different kinds of "pipes" -- in this case, I called it "TPIPE" because it's based on the "TextPipe Engine".

I've said in several previous messages that piping input & output is the slowest possible usage. TYPE'ing a file and piping it to TPIPE is at least an order of magnitude slower than simply passing TPIPE the filename.
 
#6
Without your mystery "116402.txt" file, there's probably no point in my even looking at this.
There's nothing special about that file or about piping. Whenever you count words (as below), it outputs a blank line for every line in the file (whereas it's supposed to output only the count). Didn't you see my simple example?


Code:
v:\> echo My^r^ndog^r^nhas^r^nfleas. > doggie.txt
 
v:\> tpipe /input=doggie.txt /grep=2,0,0,0,1,0,0,0,"[^ \t\r\n]*"
 
 
 
 
4
 
v:\>
 

rconn
#7
There's nothing special about that file or about piping. Whenever you count words (as below), it outputs a blank line for every line in the file (whereas it's supposed to output only the count). Didn't you see my simple example?


Code:
v:\> echo My^r^ndog^r^nhas^r^nfleas. > doggie.txt
 
v:\> tpipe /input=doggie.txt /grep=2,0,0,0,1,0,0,0,"[^ \t\r\n]*"
 
 
 
 
4
 
v:\>
Your line should be:

tpipe /input=doggie.txt /grep=5,0,0,0,1,0,0,0,"[^ \t\r\n]*"

The "2" is telling TPIPE to extract (and output) all the matching lines; "5" tells it to remove them.

The help for "CountMatches" has a typo; it should read:

CountMatches - 1 to output a count of the number of matches
 
#8
Your line should be:

tpipe /input=doggie.txt /grep=5,0,0,0,1,0,0,0,"[^ \t\r\n]*"

The "2" is telling TPIPE to extract (and output) all the matching lines; "5" tells it to remove them.

The help for "CountMatches" has a typo; it should read:

CountMatches - 1 to output a count of the number of matches
That only works because *all* lines match. What if only *some* lines matched?
Code:
v:\> tpipe /input=doggie.txt /grep=5,0,0,0,1,0,0,0,dog
My
has
fleas.
1
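The behaviour being discussed (matching lines removed, everything else passed through, then a count) can be modelled with a small Python sketch. It only mirrors the output shown in this thread; the function name `remove_matching_lines` is hypothetical, it counts matching lines rather than individual matches, and it is not a description of how TPIPE is implemented.

```python
import re

def remove_matching_lines(lines, pattern):
    """Return (kept_lines, match_count) after dropping lines that match pattern."""
    regex = re.compile(pattern)
    # Keep only the lines in which the pattern is not found anywhere.
    kept = [line for line in lines if not regex.search(line)]
    # Every dropped line counts as one match in this simplified model.
    matched = len(lines) - len(kept)
    return kept, matched

lines = ["My", "dog", "has", "fleas."]

# A pattern that matches only some lines: the rest are still output.
kept, matched = remove_matching_lines(lines, "dog")
print("\n".join(kept))
print(matched)

# A pattern that matches every line: nothing but the count is left.
kept, matched = remove_matching_lines(lines, r"[^ \t\r\n]*")
print(kept, matched)
```

In this model, "only the count" falls out as a side effect of every line matching, which is the objection raised above.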
 
#9
You're inferring a relationship that doesn't exist. There are lots of different kinds of "pipes" -- in this case, I called it "TPIPE" because it's based on the "TextPipe Engine".

I've said in several previous messages that piping input & output is the slowest possible usage. TYPE'ing a file and piping it to TPIPE is at least an order of magnitude slower than simply passing TPIPE the filename.
What's in a name? I am not inferring any relationships. I am inferring that to the average user the name TPIPE means a command relating to pipes - devices allowing contents to flow from one place to another - whether in a tobacco pipe, in the Alaska Pipeline, or in a software mechanism connecting software processes.

It is clear to anyone who understands the underlying mechanisms that reading a file with the TYPE command and directing its standard output to the standard input of another process requires much more processing, and is much less efficient, than having the second process read the file directly. This has nothing to do with the intuitive interpretation of the name TPIPE: anyone who speaks English associates the word with pipes. It is an unfortunate name, even if it is based on the name "TextPipe Engine", which is itself misnamed.

Possible reasons for sending data to a program through a pipe instead of letting it access the source file directly include the need to use a process with access rights to the source file that are not granted to the processing program. Vince's example above does not have such a need...
 
#10
Continuing previous post. Many programs coming from the Unix world do not have file access capabilities, accept data only through standard input, and dispose of generated data exclusively through standard output. Filter utilities like CUT and TR are prime examples. When you try to determine how much of your existing procedure using Unix utilities can be replaced by the TPIPE command, it is natural to pipe to TPIPE instead of using the more efficient direct reading, since piping requires much less modification of existing programs.
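To illustrate the two usage patterns, here is a minimal Python sketch of a filter that reads a named file when one is given and falls back to standard input otherwise. The names `open_input` and `count_lines` are hypothetical, and the sketch is unrelated to how TPIPE or any particular Unix utility is written.

```python
import io
import sys

def open_input(path=None):
    """Return a text stream: the named file if a path is given, else stdin."""
    if path is not None:
        # Direct reading: the filter opens the source file itself.
        return open(path, "r", encoding="utf-8", newline="")
    # Pipe usage: another process supplies the data on standard input.
    return sys.stdin

def count_lines(stream):
    """Count the lines available on any text stream."""
    return sum(1 for _ in stream)

# Demonstration with an in-memory stream standing in for a file or a pipe.
print(count_lines(io.StringIO("My\ndog\nhas\nfleas.\n")))  # 4
```

A script built this way could call `count_lines(open_input(sys.argv[1] if len(sys.argv) > 1 else None))`, so existing pipelines keep working while direct file access remains available.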
 
#11
Continuing previous post. Many programs coming from the Unix world do not have file access capabilities, accept data only through standard input, and dispose of generated data exclusively through standard output. Filter utilities like CUT and TR are prime examples. When you try to determine how much of your existing procedure using Unix utilities can be replaced by the TPIPE command, it is natural to pipe to TPIPE instead of using the more efficient direct reading, since piping requires much less modification of existing programs.
My two versions of CUT.EXE take a file. Gnu's TR.EXE doesn't; the Thompson Toolkit's TR.EXE (17 years old) does.

Speed varies greatly, with TPIPE not too fast in this comparison.
Code:
v:\> wc IpToCountry.csv
 Lines   Words    Chars
116402  116402  3015512
 
v:\> timer & g:\gnu\grep US IpToCountry.csv > NUL & timer
Timer 1 on: 00:27:03
Timer 1 off: 00:27:03  Elapsed: 0:00:00.61
 
v:\> timer & g:\ttk\grep US IpToCountry.csv > NUL & timer
Timer 1 on: 00:27:34
Timer 1 off: 00:27:34  Elapsed: 0:00:00.07
 
v:\> timer & tpipe /input=IpToCountry.csv /grep=3,0,0,0,0,0,0,0,US > NUL & timer
 
Timer 1 on: 00:27:48
Timer 1 off: 00:27:48  Elapsed: 0:00:00.30