Folks,
I've been driving myself quietly doolally this morning trying to get a regular expression to work with FFIND. Fundamentally, what I'm trying to do is find hyperlinks containing ampersands that have not been encoded, across just shy of 400 HTML pages (held locally). The regular expression I came up with (which simply finds links with ampersands, without attempting to ignore those that are already encoded as &amp;, a secondary issue) is:
(.*) href=\"(.*)&(.*)\"
and running this through an online regex tester that I've used before (albeit it is skewed towards use from PHP, and I had to add wrapping forward slashes) gives the result I'm expecting with the contents of the index.html file supplied as the data. Issuing an equivalent FFIND fails, however:
Code:
I:\websites\Badgers\new>ffind /l /p /v /e"(.*) href=\"(.*)&(.*)\"" index.html
Usage : FFIND [/+n /-n /8 /A[[:][-][+]rhsdaecjot] /BC /D[list] /E["xx"] /I"text" /FGIK /L[n] /M /N[dehjs] /O[[:][-]acdeginrsu] /PR /Sn /T"xx" /U /V /W
/X["xx..."] /Y] file...
TCC: (Sys) The system cannot find the path specified.
".*\"" index.html"
I:\websites\Badgers\new>
Perhaps unsurprisingly, since I seem to remember discussions about how to search for strings containing double quotes back in the days when JP Software's support was handled on CompuServe, it looks like the double quotes are problematic. I tried using the TCC escape character before each of the embedded double quotes, but that didn't give the expected result:
Code:
I:\websites\Badgers\new>ffind /l /p /v /e"(.*) href=\^"(.*)&(.*)\^"" index.html
0 lines in 0 files
I:\websites\Badgers\new>
There is one line in that file that matches my desired expression, as evidenced by this output:
Code:
I:\websites\Badgers\new>ffind /l /p /v /e"(.*) href=(.*)&(.*)" index.html
---- I:\websites\Badgers\new\index.html
[115] shake a stick at, the Badgers ‘<a href="hall_of_fame.html">hall of fame</a>’, a list of
[200] <a href="http://maps.google.co.uk/maps/ms?msa=0&msid=202456031702105905110.0004bf1c657589aaecc27&z=10" target="_blank">a 2012 version of
the Google Map showing opposition ground locations</a>
[210] <br><p>The <a href="whatsold_2011.html">older What’s New entries</a> are
[229] <a class="mTlink" href="club_officers.html">[Club Officers]</a>
[230] <a class="mTlink" href="hall_of_fame.html">[Hall Of Fame]</a>
[233] Copyright ® 2000-2012 <a href="http://homepage.ntlworld.com/steve.pitts/" target="_blank">Steve Pitts</a>/Badgers Cricket Club – All right
s reserved
6 lines in 1 file
I:\websites\Badgers\new>
where five of the lines are spurious (because the ampersand is outside of the href attribute), but the link to Google on line 200 matches the original regex (albeit that the ampersands are encoded).
So I guess the simple question is, how do I escape double quote characters in the regular expression when using the /E option to FFIND?
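For reference, here is a minimal Python sketch of what the expression is meant to do, independent of FFIND and its quoting rules. The first pattern is the regex from this post, unchanged. The second is only an assumed refinement for the secondary issue: it keeps the match inside a single href value and skips any ampersand that already starts an entity such as &amp;. The names ANY_AMPERSAND, UNENCODED_AMPERSAND and unencoded_links, and the sample lines, are made up for illustration. This says nothing about how FFIND itself wants the double quotes escaped.

```python
import re

# The expression from the post, unchanged: an href followed somewhere
# later on the line by "&" and then a closing double quote.
ANY_AMPERSAND = re.compile(r'(.*) href="(.*)&(.*)"')

# Assumed refinement (not from the post): stay inside one attribute value
# with [^"]* and reject an "&" that begins an entity like "&amp;" or "&#38;".
UNENCODED_AMPERSAND = re.compile(r'href="[^"]*&(?!#?\w+;)[^"]*"')


def unencoded_links(lines):
    """Yield (line_number, line) for lines with a raw '&' inside an href."""
    for number, line in enumerate(lines, start=1):
        if UNENCODED_AMPERSAND.search(line):
            yield number, line


# Hypothetical sample data, not taken from index.html.
sample = [
    '<a href="x.html">a</a> &amp; <a href="y.html">b</a>',    # "&" outside href
    '<a href="http://example.com/?a=1&b=2">raw</a>',          # raw "&" in href
    '<a href="http://example.com/?a=1&amp;b=2">encoded</a>',  # encoded "&"
]

for number, line in unencoded_links(sample):
    print(number, line)
```

With the sample above, the original expression matches all three lines (the first one spuriously, because the ampersand sits between two links), while the refined one reports only line 2.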