
FFIND wordA AND wordB

I "have to" search through a lot of textfiles to find out if they contain somewhere WordA .and. WordB (in any order).

The most straightforward way would be: List all the files that contain WordA and search through all those files to see if they contain WordB.
But as there are really a lot of textfiles (an offline version of Wikipedia) and also more words to look for ... saving on iterations is quite welcome.

I *think* I tried all (combinations of) options of FFIND, including the regular expressions, but did not succeed.
Is it possible with FFIND? Or would I be better off trying this with programs like AWK or SED? (I have used those a couple of times and I did survive ;-)
 
Might FINDSTR be of assistance? For example:
Code:
The following are all equivalent ways of expressing a case insensitive regex search for any line that contains both "hello" and "goodbye" in any order

/i /r /c:"hello.*goodbye" /c:"goodbye.*hello"
-i -r -c:"hello.*goodbye" /c:"goodbye.*hello"
/irc:"hello.*goodbye" /c:"goodbye.*hello"

...or one of the other many options available for FINDSTR.

Ref: SS64 FindStr

Joe
 
I "have to" search through a lot of textfiles to find out if they contain somewhere WordA .and. WordB (in any order).
... not necessarily in the same line? ... not with a line-based text util. But if you don't mind two FFINDs ... this finds all CPPs containing "wmain" and "goto" in either order. Use regexes if you want whole words, case sensitivity, etc. There are 734 CPP files in P:. This took about 4 seconds.
Code:
v:\> do f in /p ffind /b /s /l /m /t"wmain" p:\*.cpp ( ffind /b /l /m /t"goto" %f )
P:\4Console\consizeapp.cpp
P:\ifactor\ifactor.cpp
P:\mydu\mydu.cpp
P:\myes\cli.cpp
P:\n-ones\n-ones.cpp
P:\perffreq\perffreq.cpp
P:\pset\pset.cpp
P:\pset-experimental\pset.cpp
P:\remotecp\pset.cpp
P:\rndmfile\rndmtext.cpp
P:\shhotkey\shhotkey.cpp
P:\wmiquery\wmiquery.cpp
P:\wmiuptime\wmiuptime.cpp

... in the same line? Do that with TPIPE's grep and a regex like "wordA.*wordB|wordB.*wordA".
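
For illustration only, here is how that same-line alternation regex behaves, sketched in Python rather than TPIPE (the function names and sample lines are made up for this example). Because "." does not match a newline, the pattern only ever matches within one line. One caveat: two words that overlap in the text (for example "ab" and "bc" in "abc") will not be matched by this pattern.

```python
import re


def either_order_pattern(word_a: str, word_b: str) -> re.Pattern:
    """Build a case-insensitive regex for 'A ... B' or 'B ... A' on one line."""
    # re.escape keeps characters such as '.' or '+' in the words literal.
    a, b = re.escape(word_a), re.escape(word_b)
    return re.compile(f"{a}.*{b}|{b}.*{a}", re.IGNORECASE)


def matching_lines(lines, word_a, word_b):
    """Return the lines that contain both words, in either order."""
    pattern = either_order_pattern(word_a, word_b)
    return [line for line in lines if pattern.search(line)]


# Hypothetical sample input, one string per line of text.
sample = ["hello and goodbye", "goodbye, then hello", "only hello here", "neither"]
print(matching_lines(sample, "hello", "goodbye"))
# prints ['hello and goodbye', 'goodbye, then hello']
```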
 
AWK (or Perl, my tool of choice) is considerably faster for heavy text processing.

In Perl, a basic script would look like this [untested]:
Code:
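use strict; use warnings;
my $fn = $ARGV[0] ;                        # file to check, given on the command line
open my $fh, "<", $fn or die "$fn: $!";
local $/; # enable localized slurp mode (read the whole file in one go)
my $content = <$fh>;
close $fh;
# Print the file name only if both words occur somewhere in the file
if ( $content =~ m/WordA/ && $content =~ m/WordB/ ) { print "$fn\n" ; }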
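
For comparison, a rough Python sketch of the same single-pass idea as the Perl script above, extended to any number of words and to a whole directory tree. The function names, the UTF-8 assumption and the plain substring test are choices made for this example, not something taken from the thread; swap in a regex if you need whole-word or case-insensitive matching.

```python
from pathlib import Path


def contains_all(text: str, words) -> bool:
    """True if every word occurs somewhere in text (plain substring test)."""
    return all(word in text for word in words)


def files_with_all_words(root, pattern, words):
    """Yield each file under root matching pattern that contains every word.

    Each file is read exactly once, whole, like Perl's slurp mode.
    """
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        # Assumption: files are UTF-8; undecodable bytes are replaced, not fatal.
        text = path.read_text(encoding="utf-8", errors="replace")
        if contains_all(text, words):
            yield path
```

A call such as files_with_all_words("p:/", "*.cpp", ["wmain", "goto"]) would then list the matching files, reading each one only once.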
 
@Joe Caverly : Vince is right; FINDSTR is line oriented and the text could be anywhere on the page. If you use FINDSTR more often (as I do; it is not the most sophisticated utility, but it is on every Windows system), you might like this page: [title] where someone did some extensive research on the (undocumented) options of FINDSTR.

@vefatica : Nice one-liner, but still requires a 2-pass reading of the files. Something I like to avoid if possible.
But it gave me inspiration for another possibility: Read the textfile as an array and parse that one. Will work on that.
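
A minimal sketch of that "read once, parse the lines" idea, written in Python purely for illustration (the function name and sample data are hypothetical). It keeps a set of words still to be found and stops as soon as the set is empty, so a file is never read twice and often not even to the end.

```python
def has_all_words(lines, words) -> bool:
    """Return True if every word appears on at least one of the lines.

    `lines` can be a list of strings or an open text file (read lazily).
    """
    remaining = set(words)
    for line in lines:
        # Drop every word that this line contains.
        remaining = {w for w in remaining if w not in line}
        if not remaining:
            return True  # all words seen, no need to read further
    return not remaining


# Hypothetical sample: the words sit on different lines, in any order.
sample = ["alpha beta", "nothing here", "gamma"]
print(has_all_words(sample, ["gamma", "alpha"]))  # prints True
print(has_all_words(sample, ["alpha", "delta"]))  # prints False
```

With a real file, something like `with open(name, encoding="utf-8") as fh: has_all_words(fh, words)` should work the same way, since a text file object yields its lines one at a time.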

@Christian Albaret : Not in a million years would I have thought of Perl. It just wasn't on my radar (it's one of those things that you pass 4 times a day without really noticing). It seems like a really strong option. Thanks a lot for pointing it out!
It is time to (finally) learn some Perl :-) And your script will surely help with that. Thanks!
 