How to? FFIND, regular expressions and double quotes

Folks,

I've been driving myself quietly doolally this morning trying to get a regular expression to work with FFIND. Fundamentally, what I'm trying to do is find hyperlinks containing ampersands that have not been encoded, across just shy of 400 HTML pages (held locally). The regular expression I came up with (which simply finds links with ampersands, without attempting to ignore those that are already encoded as &amp;amp;, a secondary issue) is:

(.*) href=\"(.*)&(.*)\"

Running this through an online regex tester that I've used before (albeit one geared towards PHP, so I had to add wrapping forward slashes) gives the result I'm expecting with the contents of the index.html file supplied as the data. Issuing an equivalent FFIND fails, however:
Code:
I:\websites\Badgers\new>ffind /l /p /v /e"(.*) href=\"(.*)&(.*)\"" index.html
Usage : FFIND [/+n /-n /8 /A[[:][-][+]rhsdaecjot] /BC /D[list] /E["xx"] /I"text" /FGIK /L[n] /M /N[dehjs] /O[[:][-]acdeginrsu] /PR /Sn /T"xx" /U /V /W
/X["xx..."] /Y] file...
TCC: (Sys) The system cannot find the path specified.
".*\"" index.html"

I:\websites\Badgers\new>
Perhaps unsurprisingly, since I seem to remember discussions about how to search for strings containing double quotes back in the days when JP Software's support was handled on CompuServe, it looks like the double quotes are problematic. I tried using the TCC escape character before each of the embedded double quotes, but that didn't give the expected result:
Code:
I:\websites\Badgers\new>ffind /l /p /v /e"(.*) href=\^"(.*)&(.*)\^"" index.html

0 lines in 0 files

I:\websites\Badgers\new>
There is one line in that file that matches my desired expression, as evidenced by this output:
Code:
I:\websites\Badgers\new>ffind /l /p /v /e"(.*) href=(.*)&(.*)" index.html

---- I:\websites\Badgers\new\index.html
[115] shake a stick at, the Badgers &lsquo;<a href="hall_of_fame.html">hall of fame</a>&rsquo;, a list of
[200] <a href="http://maps.google.co.uk/maps/ms?msa=0&amp;msid=202456031702105905110.0004bf1c657589aaecc27&amp;z=10" target="_blank">a 2012 version of
the Google Map showing opposition ground locations</a>
[210] <br><p>The <a href="whatsold_2011.html">older What&rsquo;s New entries</a> are
[229] <a class="mTlink" href="club_officers.html">[Club&nbsp;Officers]</a>
[230] <a class="mTlink" href="hall_of_fame.html">[Hall&nbsp;Of&nbsp;Fame]</a>
[233] Copyright ® 2000-2012 <a href="http://homepage.ntlworld.com/steve.pitts/" target="_blank">Steve Pitts</a>/Badgers Cricket Club &ndash; All right
s reserved

6 lines in 1 file

I:\websites\Badgers\new>
where five of the lines are spurious (because the ampersand is outside of the href attribute), but the link to Google on line 200 matches the original regex (albeit that the ampersands are encoded).
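
To illustrate why the unquoted pattern over-matches, here is a minimal sketch in Python. It uses Python's re module, which is not the engine FFIND uses, so treat it as an approximation of the behaviour rather than a reproduction of it. The two sample lines are copied from the output above; the names loose, strict and matches are made up for the illustration.

```python
import re

# The pattern from the FFIND call above, with no quote characters in it.
loose = re.compile(r'(.*) href=(.*)&(.*)')

# The pattern I actually want: the ampersand has to sit between the href quotes.
strict = re.compile(r'href="[^"]*&[^"]*"')

# Lines 115 and 200 as reported by FFIND.
line_115 = ('shake a stick at, the Badgers &lsquo;<a href="hall_of_fame.html">'
            'hall of fame</a>&rsquo;, a list of')
line_200 = ('<a href="http://maps.google.co.uk/maps/ms?msa=0&amp;'
            'msid=202456031702105905110.0004bf1c657589aaecc27&amp;z=10" '
            'target="_blank">a 2012 version of')


def matches(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return pattern.search(text) is not None


# The loose pattern accepts any ampersand after " href=", so both lines match.
print(matches(loose, line_115), matches(loose, line_200))

# The strict pattern only accepts an ampersand inside the quoted URL.
print(matches(strict, line_115), matches(strict, line_200))
```

The greedy (.*) after href= is what lets the loose pattern run past the closing quote and pick up the &rsquo; entity later on line 115.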

So I guess the simple question is, how do I escape double quote characters in the regular expression when using the /E option to FFIND??
 
Here ya go... looks like you gotta do the weird escape thing \^" and also put it in a character class [ ].
Code:
ffind /l /p /v /e"href=[\^"][^\^"]*&.*?[\^"]" index.html

Edit: I figured out what the "weird escape thing" is doing. The \ is the regex escape that escapes the ^, which is the TCC escape that escapes the ", and putting it in the [] forces the regex to accept it as a character. Does that make any sense?
By the way, apparently epement and I submitted our responses at almost exactly the same time. I clicked the Post Reply button and when the screen refreshed both of our posts were there. Nice timing.
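
For what it's worth, here is a rough Python sketch of the escape explanation above. I have not verified which form TCC actually hands to the regex engine, so it tries both readings: one where TCC consumes the caret, and one where the caret survives into the pattern. Python's re module stands in for FFIND's engine, and the with_amp sample string and all variable names are hypothetical.

```python
import re

# Reading A (assumption): TCC consumes the caret, so the engine sees [\"].
reading_a = re.compile(r'href=[\"][^\"]*&.*?[\"]')

# Reading B (assumption): the caret survives, so the engine sees [\^"],
# a class that matches either a caret or a double quote.
reading_b = re.compile(r'href=[\^"][^\^"]*&.*?[\^"]')

with_amp = '<a href="page.php?a=1&b=2">link</a>'            # hypothetical sample
without_amp = '<a href="hall_of_fame.html">hall of fame</a>'  # from index.html

for name, pattern in (("A", reading_a), ("B", reading_b)):
    hit = pattern.search(with_amp) is not None
    miss = pattern.search(without_amp) is None
    print(name, hit, miss)
```

Under either reading the character class ends up accepting a double quote, which would explain why the command works even if my explanation of the layers is a bit off.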
 
"So I guess the simple question is, how do I escape double quote characters in the regular expression when using the /E option to FFIND??"

Use the expression \x22. E.g.,
Code:
ffind /l /p /v /e"(.*) href=\x22(.*)&(.*)\x22" index.html

I don't know why you're using capture groups, though. I think you want something like this instead:
Code:
ffind /l /p /v /e"href=\x22[^^\x22]*&(?!amp;)[^^\x22]*\x22" index.html

I've tested it and it works for me.
Eric
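
As a quick sanity check of the second pattern outside TCC, here is a minimal Python sketch. It assumes the doubled ^^ in the command is only there for TCC's benefit and reaches the regex engine as a single ^, and it uses Python's re module rather than FFIND's engine. The sample strings are shortened or hypothetical variants of the Google Maps link, and the variable names are made up for the illustration.

```python
import re

# The pattern with the TCC-level escaping removed: ^^ becomes a single ^.
# \x22 is the double quote; (?!amp;) rejects an ampersand that starts "&amp;".
unencoded_amp = re.compile(r'href=\x22[^\x22]*&(?!amp;)[^\x22]*\x22')

# Shortened form of the link on line 200: every ampersand is already encoded.
encoded = '<a href="http://maps.google.co.uk/maps/ms?msa=0&amp;z=10">'

# Hypothetical variants with a raw ampersand in the URL.
raw = '<a href="http://maps.google.co.uk/maps/ms?msa=0&z=10">'
mixed = '<a href="page.php?a=1&amp;b=2&c=3">'

for label, text in (("encoded", encoded), ("raw", raw), ("mixed", mixed)):
    print(label, unencoded_amp.search(text) is not None)
```

Note that the lookahead only excludes &amp;, so any other entity inside an href (such as &nbsp;) would still be reported.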
 
Thanks folks, both options answer the stated question perfectly, and Eric has even been kind enough to solve my original requirement. Much appreciated.

"I don't know why you're using capture groups, though"
Habit, since you ask :) Most of my use of regular expressions has been when programming page scrapers in PHP, when in general I want to capture the runs of 'other' characters. I also find it easier to read things with the brackets separating distinct elements of the syntax. Like most powerful tools, I find that regexes are a pain to remember the syntax for if you don't use them regularly - something that I suspect is also going to apply to the new TPIPE command!?
 
Steve, I can add (because I have a sense of humor about it at the moment) that I also try to avoid using "full" regular expressions as much as possible. In my case, it's due to the combination of a bad memory and the probably half-dozen regular expression "syntaxes" I've had to use in my more than 40 years of doing this kind of stuff (I wrote my first program for pay when I was 17).

- Dan
 