FFIND /E with regex EOL anchor?

I expected the same output from both of these. What's up?

Code:
v:\> echo foo | ffind /k /m /e"o"
foo

v:\> echo foo | ffind /k /m /e"o$"
 
v:\>
 
Something has changed since v32.

Code:
v:\> d:\tc32\tcc /c "echo foo | ffind /k /m /e^qo$^q"
foo

v:\> d:\tc33\tcc /c "echo foo | ffind /k /m /e^qo$^q"
 
v:\>

... maybe an Oniguruma option?
 
Onigmo (used in v32 and earlier, abandoned by the developer) had a non-standard option to recognize CR/LF line endings. Oniguruma only recognizes LF (there is no option for CR/LF).
If my test is valid, it also recognizes \0 as EOL.

Code:
v:\> echos foo | ffind /k /m /e"o$"
foo

To catch EOL=CRLF, we could do something like this.

Code:
v:\> echo foo | ffind /k /m /e"o\r$"
foo

But that won't catch EOL=\0.

Code:
v:\> echos foo | ffind /k /m /e"o\r$"

One solution (?) is to use "\r*$" (zero or more CRs followed by a recognized EOL) instead of "$".

Code:
v:\> echos foo | ffind /k /m /e"o\r*$"
foo

v:\> echo foo | ffind /k /m /e"o\r*$"
foo

Have you got any other suggestions or workarounds?
 
Hmmm! It claims to support Windows (see Case 3 here). I don't know how that could be without treating CRLF as EOL.
 
Just some observations ... [if these tests are valid] ...

The behavior we're accustomed to (v32, Onigmo) is that the anchor '$' is matched by \0, LF, CRLF, CR (in order below).

Code:
v:\> echo %@regex["o$",foo]
1

v:\> echo %@regex["o$",foo%@char[10]]
1

v:\> echo %@regex["o$",foo%@char[13]%@char[10]]
1

v:\> echo %@regex["o$",foo%@char[13]]
1

Oniguruma (v33) seems to use only \0 and LF (1st and 2nd below).

Code:
v:\> echo %@regex["o$",foo]
1

v:\> echo %@regex["o$",foo%@char[10]]
1

v:\> echo %@regex["o$",foo%@char[13]%@char[10]]
0

v:\> echo %@regex["o$",foo%@char[13]]
0

Oniguruma has '\R' (so does Onigmo) but that is matched by LF, CRLF, and CR (2~4 below) and not by \0 (1 below), same in Onigmo.

Code:
v:\> echo %@regex["o\R",foo]
0

v:\> echo %@regex["o\R",foo%@char[10]]
1

v:\> echo %@regex["o\R",foo%@char[13]%@char[10]]
1

v:\> echo %@regex["o\R",foo%@char[13]]
1
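
For anyone who wants to reproduce this pattern of results outside TCC, here is a minimal sketch in Python. Python's `re` module is a different engine and has no `\R`, so the sketch spells out the alternation `\r\n|\n|\r` as an assumed stand-in for `\R` on these four inputs. The names `NEWLINE_SEQ`, `CASES`, and `matches` are made up for illustration.

```python
import re

# Assumed stand-in for \R: CRLF first so it is consumed as one unit, then LF, then CR.
NEWLINE_SEQ = r"(?:\r\n|\n|\r)"

# The four inputs used in the tests above: bare string, LF, CRLF, CR.
CASES = ["foo", "foo\n", "foo\r\n", "foo\r"]

def matches(pattern, text):
    """Return 1 if pattern is found in text, else 0 (mirrors the @regex output)."""
    return 1 if re.search(pattern, text) else 0

results = [matches("o" + NEWLINE_SEQ, s) for s in CASES]
print(results)  # [0, 1, 1, 1]
```

The bare string fails because the alternation has to consume an actual line-break character, which is the same reason `\R` is not matched by \0.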

FINDSTR's '$' is matched by \0, CR, CRLF (1~3 below), and not by LF (4 below).

Code:
v:\> echos foo | findstr o$
foo

v:\> echos foo^r | findstr o$
foo

v:\> echos foo^r^n | findstr o$
foo

v:\> echos foo^n | findstr o$

v:\>

The very inconvenient kludge I mentioned earlier works in all four cases (v33, Oniguruma).

Code:
v:\> echo %@regex["o\r*$",foo]
1

v:\> echo %@regex["o\r*$",foo%@char[10]]
1

v:\> echo %@regex["o\r*$",foo%@char[13]%@char[10]]
1

v:\> echo %@regex["o\r*$",foo%@char[13]]
1
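
As a rough cross-check, Python's `re` module treats `$` much like the v33 results above: it matches at the end of the string or just before a trailing LF, and knows nothing about CR. That makes it a convenient place to see why `\r*$` covers all four cases. This is only an analogy with a different engine, not a test of Oniguruma itself, and the helper name `check` is hypothetical.

```python
import re

# Bare string, LF, CRLF, CR: the same four inputs as the @regex tests.
CASES = ["foo", "foo\n", "foo\r\n", "foo\r"]

def check(pattern):
    """Return a list of 1/0 flags, one per input in CASES."""
    return [1 if re.search(pattern, s) else 0 for s in CASES]

plain = check(r"o$")       # $ alone: only end-of-string and a trailing LF count
kludge = check(r"o\r*$")   # let any CRs be consumed before the anchor

print(plain)   # [1, 1, 0, 0]
print(kludge)  # [1, 1, 1, 1]
```

The `\r*` simply eats any CR that sits between the text and the position where the engine is willing to anchor `$`.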
 
Apparently, the fix is to un-comment this line in oniguruma-6.9.9\src\regenc.h, and rebuild the lib and dll.

Code:
#define USE_CRNL_AS_LINE_TERMINATOR

Having done that and with my DLL in place, I get expected/desired results.

Code:
v:\> echo %@regex["o$",foo]
1

v:\> echo foo | ffind /k /m /e"o$"
foo
 
Well - not exactly. That line is intended to switch support between Linux LF line endings and Mac OSX CR line endings. It will still have problems with Windows CR/LF line endings. But if you're only parsing a single line it will work.
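
To illustrate the multi-line concern, here is a small sketch in Python (a different regex engine, used only as an analogy; the sample text is made up). With CRLF-terminated lines, a plain `$` in multi-line mode never sits directly after the last letter of a line, while a pattern that tolerates CRs before the anchor still finds the lines ending in "o".

```python
import re

# Three CRLF-terminated lines; two of them end in "o".
text = "foo\r\nbar\r\nboo\r\n"

# MULTILINE makes $ match before every LF, not only the last one.
plain = re.findall(r"o$", text, flags=re.MULTILINE)

# The lookahead allows CRs between the "o" and the anchor without consuming them.
tolerant = re.findall(r"o(?=\r*$)", text, flags=re.MULTILINE)

print(len(plain), len(tolerant))  # 0 2
```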
 