Onig discrepancy

#1
My plugins are on a volume mounted on various "plugins" directories. So both TCC v11 and v12 use the same 4UTILS.DLL. I use the PERL syntax. I see this with v11:

Code:
v:\> ver & echo 123 | grepp 3$

TCC  11.00.52   Windows XP [Version 5.1.2600]
123

v:\>
And I see this with v12:

Code:
v:\> ver & echo 123 | grepp 3$

TCC  12.00.42   Windows XP [Version 5.1.2600]

v:\>
That is, with v11, "3" followed by EOL is found; with v12, it is not.

Apparently, the newest ONIG.DLL is faulty since the problem follows the DLL (v12 gets it right with the older DLL, v11 gets it wrong with the newer DLL).

More testing shows that the newer (bad) DLL recognizes "\n" as EOL, but not "\r" or "\r\n":

Code:
v:\> echos 123^n | grepp 3$
123

v:\> echos 123^r | grepp 3$

v:\> echos 123^r^n | grepp 3$

v:\>
That would seem a horrible bug in the Onig code, one that wouldn't last long. Could it be in the building or initializing of the DLL?
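
To make the observed difference concrete, here is a simplified model in plain C of the two end-of-line policies. It is only a sketch of the behavior seen in the tests above, not Oniguruma's actual code. The function names `eol_lf_only` and `eol_cr_aware` are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the newer DLL's behavior as observed above:
   position pos counts as "end of line" only at the end of the text
   or directly before a "\n". */
int eol_lf_only(const char *s, size_t pos)
{
    size_t len = strlen(s);
    if (pos > len)
        return 0;
    return pos == len || s[pos] == '\n';
}

/* Hypothetical model of the older DLL's behavior as observed above:
   a "\r" (alone or as the start of "\r\n") also ends the line. */
int eol_cr_aware(const char *s, size_t pos)
{
    size_t len = strlen(s);
    if (pos > len)
        return 0;
    return pos == len || s[pos] == '\n' || s[pos] == '\r';
}
```

With the text "123\r\n" and pos = 3 (just after the "3"), the first policy says "not end of line" and the second says "end of line", which is the same split GREPP shows between the two DLLs.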

Note that 4UTILS.DLL sets the default regex syntax (as below, and correctly, verified) and calls onig_search() with ONIG_ENCODING_UTF16_LE and ONIG_SYNTAX_DEFAULT.

Code:
VOID GetRegexSyntax ( VOID )
{
    WCHAR szResponse[16];
    // Default to PERL when the option is unset or not recognized
    OnigSyntaxType *psyntax = ONIG_SYNTAX_PERL;

    // A zero return from QueryOptionValue() is treated as success here
    if ( !QueryOptionValue( L"RegularExpressions", szResponse ) )
    {
        // The second letter of the upper-cased value identifies the syntax
        switch ( CharUpper(szResponse)[1] )
        {
            case L'E' :    psyntax = ONIG_SYNTAX_PERL;              break;  // PERL
            case L'U' :    psyntax = ONIG_SYNTAX_RUBY;              break;  // RUBY
            case L'A' :    psyntax = ONIG_SYNTAX_JAVA;              break;  // JAVA
            case L'R' :    psyntax = ONIG_SYNTAX_GREP;              break;  // GREP
            case L'O' :    psyntax = ONIG_SYNTAX_POSIX_EXTENDED;    break;  // POSIX
            case L'N' :    psyntax = ONIG_SYNTAX_GNU_REGEX;         break;  // GNU
        }
    }

    onig_set_default_syntax(psyntax);
}
 
#2
This pared-down function produces one result (a positive one) using the older (Aug 2010) ONIG.DLL and the other result ("not found") using the newer (Dec 2010) ONIG.DLL.

Code:
INT test(VOID)
{
    WCHAR    buf[8192] = L"123\r\n",
            szRegEx[1024] = L"3$";
    INT SearchResult, ec, rc = 0;

    regex_t *regex;
    ec = onig_new(&regex, (UChar*) szRegEx, (UChar*) szRegEx+2*lstrlen(szRegEx),
        ONIG_OPTION_NONE, ONIG_ENCODING_UTF16_LE, ONIG_SYNTAX_DEFAULT, NULL);
    if ( ec != ONIG_NORMAL)
    {
        return onig_error_msg(ec);
    }

    UChar    *mstart = (UChar*) buf,
            *mend = (UChar*) buf + 2*lstrlen((WCHAR*)buf);

    SearchResult = onig_search(regex, mstart, mend, mstart, mend, NULL, 0);

    Printf(L"%s\r\n", SearchResult >= 0 ? buf : L"not found");

    onig_free(regex);
    return rc;
}
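
One possible workaround for single-line buffers, sketched here as an assumption rather than anything confirmed in this thread, is to stop the search range before any trailing CR/LF so that "$" can match at the end of the range. The helper name `len_without_eol` is hypothetical.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: length of s in characters, not counting any
   trailing '\r' or '\n' characters. */
size_t len_without_eol(const wchar_t *s)
{
    size_t n = wcslen(s);
    while (n > 0 && (s[n - 1] == L'\r' || s[n - 1] == L'\n'))
        n--;
    return n;
}
```

In the test function above, mend could then be computed as (UChar*) buf + 2*len_without_eol(buf). The factor of 2 assumes 16-bit WCHAR as on Windows. This only helps when the buffer holds a single line; it does nothing for "\r\n" in the middle of a multi-line buffer.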
 

rconn

#4
---Quote---
> And I see this with v12:
>
> v:\> ver & echo 123 | grepp 3$
>
> TCC  12.00.42   Windows XP [Version 5.1.2600]
>
> v:\>
>
> That is, with v11, "3" followed by EOL is found, with v12, it's not.
---End Quote---
Not reproducible here with 12.0.42:

[D:\onig\onig-5.9.1\x32]ver & echo 123 | grep 3$

TCC 12.00.42 Windows 7 [Version 6.1.7600]
123
 
#5
On Sat, 25 Dec 2010 21:54:14 -0500, rconn <> wrote:

|Not reproducible here with 12.0.42:
|
|[D:\onig\onig-5.9.1\x32]ver & echo 123 | grep 3$
|
|TCC 12.00.42 Windows 7 [Version 6.1.7600]
|123

I was using the 4UTILS plugin GREPP (which uses Onig). Were you?

See my simplified "C" example.
 
#6
On Sat, 25 Dec 2010 21:52:11 -0500, rconn <> wrote:

|---Quote---
|> This pared-down function produces one result (a positive one) using the
|> older (Aug 2010) ONIG.DLL and the other result ("not found") using the
|> newer (Dec 2010) ONIG.DLL.
|---End Quote---
|Nothing has changed in the Onig binaries since May 2010.

The distributed DLLs are definitely different.

d:\tc11> d /k /m o*
2010-08-07 21:39 299,392 onig.dll

d:\tc11> d:\tc12

d:\tc12> d /k /m o*
2010-12-14 12:30 293,616 onig.dll
 
#8
On Sat, 25 Dec 2010 23:33:32 -0500, rconn <> wrote:

|---Quote---
|> |Nothing has changed in the Onig binaries since May 2010.
|>
|> The distributed DLLs are definitely different.
|---End Quote---
|The v11 DLL is from July 2009. The v12 DLL is from May 2010.

The newer one does not work correctly. At least it doesn't work like the older
one (and v10's).
 
#9
Here's more evidence of different behavior from the two ONIG.DLLs. Both tests were conducted with v12 and they do not involve any plugins.

With the newer (v12 distribution) ONIG.DLL:

Code:
v:\> set str=`123%@char[13]%@char[10]`

v:\> echo %@regex[3$,%str]
0
With the older (v11 distribution) ONIG.DLL:

Code:
v:\> set str=`123%@char[13]%@char[10]`

v:\> echo %@regex[3$,%str]
1
 
#11
On Sun, 26 Dec 2010 23:10:39 -0500, rconn <> wrote:

|---Quote---
|> Here's more evidence of different behavior from the two ONIG.DLLs. Both
|> tests were conducted with v12 and they do not involve any plugins.
|---End Quote---
|The v12 Onig is 5.9.2. The v11 Onig is 5.9.1. I have no intention of
|rewriting 5.9.2 or of reverting to 5.9.1.

Do you intend to do anything about the fact that it doesn't work correctly?

Do you build the Oniguruma binaries?
 
#12
I have built both 5.9.1 and 5.9.2. When I use either of those DLLs and a stand-alone test EXE, the regular expression "3$" is not found in the string "123\r\n" ... even using ONIG_ENCODING_ASCII and either of ONIG_SYNTAX_RUBY or ONIG_SYNTAX_PERL.

The regular expression **is** found when I run my test EXE with **your** 5.9.1 (TCMD v11) DLL in place.

I changed none of the source or config files when I built the DLLs. Is it possible that there's some build-time option that has been overlooked? Have you any ideas at all?
 
#14
On Mon, 27 Dec 2010 09:09:30 -0500, rconn <> wrote:

|I do not. I have no interest (or ability) to support third-party dll's in
|plugins.
|
|Onig.dll is included solely for Take Command's internal use.

What about @REGEX[]?
 
#16
On Mon, 27 Dec 2010 09:09:30 -0500, rconn <> wrote:

|I do not. I have no interest (or ability) to support third-party dll's in
|plugins.
|
|Onig.dll is included solely for Take Command's internal use.

How about FFIND? Do you support its working correctly? It doesn't in v12.
FFIND /E does not correctly find EOL anchors.

v11:

Code:
v:\> echo 123 > testfile.txt

v:\> ffind /E"3$" testfile.txt

---- V:\testfile.txt
123

  1 line in      1 file
v12:

Code:
v:\> echo 123 > testfile.txt

v:\> ffind /E"3$" testfile.txt

  0 lines in      0 files
Again, the failure follows the version of ONIG.DLL. Not being able to use EOL
anchors severely limits the use of regular expressions.
 
#17
On Mon, 27 Dec 2010 10:45:04 -0500, rconn <> wrote:

|What about it? It's an internal function that uses Onig for strings, not
|files.

It doesn't work correctly. And FFIND /E doesn't work correctly in v12 (see my
very recent post).
 
#19
On Mon, 27 Dec 2010 11:17:28 -0500, rconn <> wrote:

|IMO this is an imaginary problem for you, because you use GREP, not FFIND.

In fact I do sometimes use FFIND. What about users who use it exclusively?

I don't understand your reluctance to investigate this. The ONIG.DLL
distributed with v11 did (does) things correctly. I presume you built it (if
not, where did you get it? ... if so, can you zip up the onig5.9.1 project
directory and send it to me?). The ONIG.DLL distributed with v12 does not do
things correctly, breaking @REGEX[], FFIND, and probably other internals. TCC
claims to support regular expressions in many places and they're broken.
 
#20
---Quote---
> v12:
>
> v:\> echo 123 > testfile.txt
>
> v:\> ffind /E"3$" testfile.txt
>
>   0 lines in      0 files
>
> Again, the failure follows the version of ONIG.DLL. Not being able to use EOL
> anchors severely limits the use of regular expressions.
---End Quote---
For what it's worth, it detects \n but not \r as the end of line. I get the same result as you above, but with a bare \n terminator it works:

Code:
C:\> echos 123^n>testfile.txt

C:\> ffind /E"3$" testfile.txt

---- C:\btm\testfile.txt
123

  1 line in      1 file
--
Peter