Word Count using TPIPE

#1
There was a recent post asking for a Word Count command.

Using TPIPE, I can count the number of lines in a file. Example:
Code:
echo %@execstr[tpipe /input=mytextfile.txt /grep=5,0,0,0,1,0,0,0,"[^ \t\r\n]*"] lines
Is it possible to use TPIPE to count the number of words in a file?

I was looking at the /simple=40 option, which creates a word list, but when I ran:
Code:
%@execstr[tpipe /input=mytextfile.txt /simple=40]
it just gave me the first word in the text file.

I was thinking that the list could be output to a temporary file, and then @lines[temporaryfile] could be used to get the word count. I'm sure that TPIPE can somehow do all of this; I'm just not sure how.

Joe
 
#3
...and here's a quick little batch file to obtain the word count of a text file and store it in the _wc environment variable:
Code:
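:: wc.btm - store the word count of a text file in the _wc variable
@setlocal
@echo off
:: Require a file name argument
iff %# eq 0 then
  echo USAGE: wc.btm yourtextfile.txt
  quit
endiff
iff exist %1 then
  :: Write TPIPE's word list to a uniquely named temporary file
  set output=%@unique[]
  tpipe /input=%1 /simple=40 /output=%output
  :: @lines counts from 0, so add 1 to get the number of lines (words)
  set _wc=%@inc[%@lines[%output]]
  if exist %output del /q %output
  echo The word count of %1 is in the variable _wc
else
  echo %1 does not exist
endiff
:: Keep _wc defined after the local scope ends
endlocal _wc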
When I ran wc.exe inside my Cygwin terminal, it gave me a word count of 914 for the text file that I used for testing.

The wc.btm file returned a word count of 886 words.

Does TPIPE count words differently than wc.exe?

Joe
 
#5
One of these days, someone has to create a .CHM file of all the plugins available....

Using the words command from your plugin, I was told that my test file had:
Code:
  873 words total, 392 unique, 88 proper.  914 runs of non-blanks.
  11 sentences total:  10.  0!  1?  Average sentence 14.8 words.
  9 paragraphs, 112 titles.  Average paragraph 1.2 sentences.
  364 lines total, 201 not blank; the longest had 78 characters.
  9521 characters in 9521 bytes (OEM, prewrapped).
I like the _words internal variable, which, of course, returned 873.

If I use the Word/Line Count option from within VIEW, I get 903 words.

So, four different ways to count words in the same file, and four different values returned:
Code:
wc.btm - 886
wc.exe - 914
_words - 873
view - 903
Joe
 
#7
The _WC internal variable (displayed as "runs of non-blanks" in the command's output) is intended to return the same value as the Unix wc command. (You can use it in loo of that utility, yuk yuk.)
Your _wc gave me the same results as my wc.btm. Then it dawned on me...

I had to unset the _wc from my wc.btm so that I could use the _wc from your plugin.

Well, your _wc returns 914, the same as the Cygwin wc.exe.

Winner!
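
For anyone curious what "runs of non-blanks" means in practice, here is a minimal Python sketch of that counting rule. It is only an illustration: it is not the plugin's code or the wc.exe source, the function name `count_runs_of_non_blanks` is made up for this example, and Python's `str.isspace()` may classify a few characters differently than a given wc build does.

```python
def count_runs_of_non_blanks(text: str) -> int:
    """Count maximal runs of non-whitespace characters (wc -w style)."""
    count = 0
    in_run = False  # True while we are inside a run of non-blank characters
    for ch in text:
        if ch.isspace():
            in_run = False
        elif not in_run:
            # First non-blank character after whitespace starts a new run
            in_run = True
            count += 1
    return count


sample = "My dog has fleas.\nMy cat has fleas."
print(count_runs_of_non_blanks(sample))
```

Under this rule punctuation never splits or removes a word; only whitespace separates one run from the next.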

Joe
 
#9
Why not TPIPE with both?

Code:
/simple=40 /grep=5,0,0,0,1,0,0,0,"[^ \t\r\n]*"
It ought to work but I couldn't get an accurate count on a file.

Code:
v:\> echo My dog has fleas.^nMy cat has fleas. | tpipe /simple=40 /grep=5,0,0,0,1,0,0,0,"[^\s\t\r\n]*"
8
 
#10
Why not TPIPE with both?

Code:
/simple=40 /grep=5,0,0,0,1,0,0,0,"[^ \t\r\n]*"
Thanks, Vince. That makes my batch file cleaner:
Code:
:: wc.btm
@setlocal
@echo off
iff %# eq 0 then
  echo USAGE: wc.btm yourtextfile.txt
  quit
endiff
iff exist %1 then
  set _wc=%@execstr[tpipe /input=%1 /simple=40 /grep=5,0,0,0,1,0,0,0,"[^\s\t\r\n]*"]
  echo The word count of %1 is in the variable _wc
else
  echo %1 does not exist
endiff
endlocal _wc
It still returns a word count of 886. The plugin that Charles provided returns the same result as the Cygwin wc.exe, so I'm going to stick with that.

That is, unless TPIPE can be made to return a word count of 914 on my test file.

Joe
 
#11
After more experimenting, it seems that TPIPE's "/simple=40" is totally inappropriate (IMHO, anyway). Apparently, it considers only a-z, A-Z, 0-9, and '-' as "word characters". That will rarely agree with any version of WC.EXE that I've seen.
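
To show why the two rules disagree, here is a rough Python sketch that compares whitespace-delimited counting with counting runs of the characters listed above. The character class is an assumption taken from my observation, not from TPIPE's documentation, and the function names are invented for this example.

```python
import re

# Assumed "word characters": a-z, A-Z, 0-9 and '-' (an observation, not documented)
WORD_CHARS = re.compile(r"[A-Za-z0-9-]+")


def count_whitespace_words(text: str) -> int:
    """Count whitespace-delimited tokens, the way wc -w does."""
    return len(text.split())


def count_charclass_words(text: str) -> int:
    """Count maximal runs of the assumed word characters."""
    return len(WORD_CHARS.findall(text))


for sample in ["My dog has fleas.", "it's 3.14 * x", "a -- b ... c"]:
    print(sample, count_whitespace_words(sample), count_charclass_words(sample))
```

An apostrophe or decimal point splits one token into two under the character-class rule, while a token made only of punctuation (such as "*" or "...") is not counted at all, so the result can land above or below the wc figure depending on the text.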