FFIND needs work

May 20, 2008
9,969
72
Syracuse, NY, USA
(Continuing what started in the thread on REGDIR.) Here's another, seemingly different, failure. Below, in a pipe, FFIND /kmve"1{5}" finds the first several occurrences of "11111", then outputs a bogus line ("03"), then repeats the same number over and over. I don't know how long that would have continued; I stopped it after about 10 minutes.

Code:
v:\> head /n3 1to1300000.txt & echo ... & tail /n3 1to1300000.txt
1
2
3
...
1299998
1299999
1300000

v:\> type 1to1300000.txt | ffind /mkve"1{5}"
11111
111110
111111
111112
111113
111114
111115
111116
111117
111118
111119
211111
311111
411111
511111
611111
711111
811111
911111
1011111
03
1011111
1011111
1011111
1011111
[snip]
 

rconn

Administrator
Staff member
May 14, 2008
11,502
115
Any progress on this?
The problem occurs when there's a lot of content in the pipe (at least 8 MB) and the distance between matches is more than 8 MB. FFIND tries to rewind to find the line numbers, and in a pipe it cannot rewind past the beginning of the current buffer. (Nor can it handle reverse searching, again because it can't rewind.)
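One way to avoid the rewind entirely is to count line numbers while streaming, so a match's line number is already known when the match is seen. The Python below is a minimal sketch of that idea under stated assumptions: it is not FFIND's code, `stream_matches` is a hypothetical name, and Python's `re` syntax stands in for whatever regex flavor FFIND /E actually uses.

```python
import re
from typing import Iterable, Iterator, Tuple


def stream_matches(lines: Iterable[str], pattern: str) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, line) for every line that matches `pattern`.

    Line numbers are counted as the lines go by, so the input is read
    strictly forward and never rewound; a pipe behaves like a file.
    """
    regex = re.compile(pattern)
    for line_no, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")  # drop the line terminator before matching
        if regex.search(text):
            yield line_no, text
```

Iterating `stream_matches(sys.stdin, "1{5}")` on the receiving end of `type 1to1300000.txt |` should report 11111, 111110 through 111119, 211111, and so on, each with its line number, using only forward reads. The trade-off is that the counting has to happen on every line, match or not.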

There are two solutions. The first is to read the entire contents of the pipe into a temp file and then search that (that's what most of the grep-type tools do). FFIND used to do that (long ago), and users complained that they wanted to see the intermediate output when they had a long-running pipe. The second is to continuously cache the pipe input into a temp file and read from that temp file. (The second solution still won't work for reverse searching.) I'm trying the second approach, but it's vastly more complicated than either the first solution or the current (23-year-old) behavior.
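The second approach can be sketched as a reader that copies every chunk it pulls from the pipe into a temp file, so any earlier position can be re-read from disk even after the in-memory buffer has moved past it. This is a minimal Python illustration, not the actual FFIND change: `SpoolingReader`, `next_chunk`, and `reread` are hypothetical names, and the 8 MB default chunk size is only an assumption chosen to echo the buffer size mentioned above.

```python
import os
import tempfile
from typing import BinaryIO


class SpoolingReader:
    """Hypothetical reader: pulls chunks from a forward-only stream (e.g. a
    pipe) and appends each one to a temp file so it can be re-read later."""

    def __init__(self, source: BinaryIO, chunk_size: int = 8 * 1024 * 1024) -> None:
        self.source = source
        self.chunk_size = chunk_size
        self.spool = tempfile.TemporaryFile()  # "w+b", deleted when closed
        self.consumed = 0  # total bytes read from the source so far

    def next_chunk(self) -> bytes:
        """Read the next chunk (b"" at EOF) and append it to the spool."""
        chunk = self.source.read(self.chunk_size)
        if chunk:
            self.spool.seek(0, os.SEEK_END)
            self.spool.write(chunk)
            self.consumed += len(chunk)
        return chunk

    def reread(self, offset: int, size: int) -> bytes:
        """Re-read bytes already consumed, from the spool instead of the pipe."""
        if offset < 0 or size < 0 or offset + size > self.consumed:
            raise ValueError("requested range has not been read from the source yet")
        self.spool.seek(offset)
        return self.spool.read(size)

    def close(self) -> None:
        self.spool.close()
```

Re-reading only works for data already pulled from the pipe; a reverse search would have to start at the end of input that hasn't arrived yet, which is why the second solution still can't support it.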

There's also an existing workaround - use |!.
 
Syracuse, NY, USA
GREP and FINDSTR don't use temp files. In a pipe they're 10 times the speed of FFIND. They can provide line numbers if desired. And they don't produce erroneous results.
 

rconn

Try passing a multi-Gb pipe to GREP or FINDSTR.

FINDSTR is *very* problematic with large files. GREP - depends on the implementation. The source I've looked at reads the entire contents of the pipe before attempting to do anything.

If you have alternatives to FFIND you like better for the extremely large pipes (that you apparently have never used?) then by all means use them instead.
 
Syracuse, NY, USA
There's also an existing workaround - use |!.
The in-process pipe is no better.

Code:
v:\> dir /s /b c:\ | grep "11fb9539.manifest"
C:\Windows\WinSxS\Manifests\amd64_microsoft-windows-s..cingstack.resources_31bf3856ad364e35_10.0.18362.1_en-us_5f172abd11fb9539.manifest

v:\> dir /s /b c:\ |! ffind /kmvt"11fb9539.manifest"
85\api-ms-win-core-stringloader-l1-1-1.dll
 
Syracuse, NY, USA
An in-process pipe is a file. You said previously that files worked - are you now saying that files don't work?
You suggested the in-process pipe. Files work; input redirection doesn't.

Code:
v:\> ffind /kmvt"11fb9539.manifest" bigc.txt
C:\Windows\WinSxS\Manifests\amd64_microsoft-windows-s..cingstack.resources_31bf3856ad364e35_10.0.18362.1_en-us_5f172abd11fb9539.manifest

v:\> ffind /kmvt"11fb9539.manifest" < bigc.txt
85\api-ms-win-core-stringloader-l1-1-1.dll
 
Syracuse, NY, USA
A well-working and reasonable solution is TPIPE. It's just about as fast as GREP.

Code:
v:\> timer & dir /s /b c:\ | tpipe /grep=3,0,0,0,0,0,0,0,"11fb9539.manifest" & timer
Timer 1 on: 15:33:01
C:\Windows\WinSxS\Manifests\amd64_microsoft-windows-s..cingstack.resources_31bf3856ad364e35_10.0.18362.1_en-us_5f172abd11fb9539.manifest
Timer 1 off: 15:33:32  Elapsed: 0:00:30.989

v:\> timer & dir /s /b c:\ | grep "11fb9539.manifest" & timer
Timer 1 on: 15:33:43
C:\Windows\WinSxS\Manifests\amd64_microsoft-windows-s..cingstack.resources_31bf3856ad364e35_10.0.18362.1_en-us_5f172abd11fb9539.manifest
Timer 1 off: 15:34:13  Elapsed: 0:00:30.572
 
Syracuse, NY, USA
FFIND uses the same code as LIST; both have been deprecated and replaced by VIEW & TPIPE.
Speaking of TPIPE ... I have often wondered about making TPIPE 100% internal. As it is, a single instance of TPIPE is fast and powerful, but calling TPIPE many times, say in a loop, is inefficient. I've always figured that was because of the overhead of starting TPIPE.EXE and having it load a 9 MB DLL.

If it were 100% internal I imagine it would be faster, but ...

1. Is it technically possible/feasible?
2. Does your licence allow it?
3. What would be the downside?

Regarding 3, TAKECMD.DLL is already huge, so I don't suppose including the code from TPIPE.EXE would cost much (considerably less than TPIPE.EXE's 158 KB). And TCC already loads 87+ MB of DLLs, so what would be the cost of another 9 MB from TEXTPIPEENGINE.DLL?

I suppose at least a few TCC internals could benefit from having TPIPE available (FFIND /T and /E for example).

Comments appreciated.
 

rconn

1. Is it technically possible/feasible?
2. Does your licence allow it?
3. What would be the downside?
1. Only for 32-bit, as the 64-bit textpipeengine.dll isn't working reliably
2. Yes
3. You really, really wouldn't want me to do it - it would more than double the size of TakeCmd.dll, and add at least a second or two to the startup time for every TCC shell.
 
Syracuse, NY, USA
1. Only for 32-bit, as the 64-bit textpipeengine.dll isn't working reliably
2. Yes
3. You really, really wouldn't want me to do it - it would more than double the size of TakeCmd.dll, and add at least a second or two to the startup time for every TCC shell.
In that case, I really, really wouldn't want you to do it. :-)