FFIND wordA AND wordB

#1
I "have to" search through a lot of text files to find out whether they contain, somewhere, WordA .and. WordB (in any order).

The most straightforward way would be: list all the files that contain WordA, then search through those files to see if they also contain WordB.
But as there are really a lot of text files (an offline version of Wikipedia) and also more words to look for ... saving on iterations is quite welcome.

I *think* I tried all (combinations of) options of FFIND, including the regular expressions, but did not succeed.
Is it possible with FFIND? Or would I be better off trying this with programs like AWK or SED? (I used those a couple of times and I survived ;-)
 
#2
Might FINDSTR be of assistance, for example
Code:
The following are all equivalent ways of expressing a case-insensitive regex search for any line that contains both "hello" and "goodbye", in any order:

/i /r /c:"hello.*goodbye" /c:"goodbye.*hello"
-i -r -c:"hello.*goodbye" -c:"goodbye.*hello"
/irc:"hello.*goodbye" /c:"goodbye.*hello"
...or one of the other many options available for FINDSTR.
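For readers without FINDSTR at hand, the per-line logic of those switches can be sketched in Python (an illustration, not FINDSTR itself; the function name is mine):

```python
import re

# Case-insensitive regex: a line matches if it contains both words,
# in either order -- the same OR that the two /c: patterns express.
pattern = re.compile(r"hello.*goodbye|goodbye.*hello", re.IGNORECASE)

def lines_with_both(text):
    """Return the lines of `text` that contain both words."""
    return [line for line in text.splitlines() if pattern.search(line)]
```

For example, `lines_with_both("say hello then goodbye\nonly hello\nGOODBYE then hello")` keeps the first and third lines, where both words share a line.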

Ref: SS64 FindStr

Joe
 
#3
I "have to" search through a lot of text files to find out whether they contain, somewhere, WordA .and. WordB (in any order).
... not necessarily in the same line? ... not with a line-based text utility. But if you don't mind two FFINDs, this finds all CPPs containing "wmain" and "goto" in either order. Use regexes if you want whole words, case sensitivity, etc. There are 734 CPP files in P:. This took about 4 seconds.
Code:
v:\> do f in /p ffind /b /s /l /m /t"wmain" p:\*.cpp ( ffind /b /l /m /t"goto" %f )
P:\4Console\consizeapp.cpp
P:\ifactor\ifactor.cpp
P:\mydu\mydu.cpp
P:\myes\cli.cpp
P:\n-ones\n-ones.cpp
P:\perffreq\perffreq.cpp
P:\pset\pset.cpp
P:\pset-experimental\pset.cpp
P:\remotecp\pset.cpp
P:\rndmfile\rndmtext.cpp
P:\shhotkey\shhotkey.cpp
P:\wmiquery\wmiquery.cpp
P:\wmiuptime\wmiuptime.cpp
... in the same line? Do that with TPIPE's grep and a regex like "wordA.*wordB|wordB.*wordA".
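The same-line vs. anywhere-in-the-file distinction matters: a grep-style per-line regex misses words that land on different lines, while a whole-file check does not. A small Python sketch (the file contents are hypothetical):

```python
import re

# Two words on *different* lines of the same file.
text = "first line has wordA\nsecond line has wordB\n"

# Per-line (grep/TPIPE style): both words must share a line -> no match here.
same_line = any(re.search(r"wordA.*wordB|wordB.*wordA", line)
                for line in text.splitlines())

# Whole-file: both words anywhere in the content -> matches.
whole_file = bool(re.search(r"wordA", text)) and bool(re.search(r"wordB", text))
```

Here `same_line` is False but `whole_file` is True, which is exactly why the line-based tools need the two-pass workaround.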
 
#4
AWK (or Perl, my tool of choice) is considerably faster for heavy text processing.

In Perl, a basic script would look like this [untested]:
Code:
use strict;
use warnings;

my $fn = $ARGV[0];
open my $fh, '<', $fn or die "$fn: $!";
local $/;    # undefine the record separator: slurp the whole file at once
my $content = <$fh>;
close $fh;
if ( $content =~ m/WordA/ && $content =~ m/WordB/ ) { print "$fn\n"; }
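For comparison, the same slurp-and-match idea reads each file only once in any language; here is a Python sketch of it (my own illustration, not the poster's script; the function name and word arguments are placeholders):

```python
import re
import sys

def files_with_both(paths, word_a="WordA", word_b="WordB"):
    """Yield the paths whose entire contents contain both words.

    Each file is read exactly once (single pass), like the Perl slurp."""
    pat_a, pat_b = re.compile(word_a), re.compile(word_b)
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as fh:
            content = fh.read()          # slurp the whole file
        if pat_a.search(content) and pat_b.search(content):
            yield path

if __name__ == "__main__":
    for hit in files_with_both(sys.argv[1:]):
        print(hit)
```

Run it as `python script.py *.txt` (with a shell that expands globs) to print only the matching file names.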
 
#5
@Joe Caverly : Vince is right; FINDSTR is line-oriented, and the text could be anywhere in the file. If you use FINDSTR more often (as I do; it is not the most sophisticated utility, but it is on every Windows system), you might like this page: [title], where someone did some extensive research on the (undocumented) options of FINDSTR.

@vefatica : Nice one-liner, but it still requires reading the files twice, something I'd like to avoid if possible.
But it gave me inspiration for another possibility: read the text file into an array and parse that. I will work on that.

@Christian Albaret : Not in a million years would I have thought of Perl. It just wasn't on my radar (it's one of those things you pass four times a day without really noticing). It seems like a really strong option. Thanks a lot for pointing it out!
It is time to (finally) learn some Perl :-) And your script will surely help with that. Thanks!