Onig discrepancy

Discussion in 'Support' started by vefatica, Dec 25, 2010.

  1. vefatica

    Joined:
    May 20, 2008
    Messages:
    7,935
    Likes Received:
    30
    My plugins are on a volume mounted on various "plugins" directories. So both TCC v11 and v12 use the same 4UTILS.DLL. I use the PERL syntax. I see this with v11:

    Code:
    v:\> ver & echo 123 | grepp 3$
    
    TCC  11.00.52   Windows XP [Version 5.1.2600]
    123
    
    v:\>
    And I see this with v12:

    Code:
    v:\> ver & echo 123 | grepp 3$
    
    TCC  12.00.42   Windows XP [Version 5.1.2600]
    
    v:\>
    That is, with v11, "3" followed by EOL is found; with v12, it is not.

    Apparently, the newest ONIG.DLL is faulty since the problem follows the DLL (v12 gets it right with the older DLL, v11 gets it wrong with the newer DLL).

    More testing shows that the newer (bad) DLL recognizes "\n" as EOL, but not "\r" or "\r\n":

    Code:
    v:\> echos 123^n | grepp 3$
    123
    
    v:\> echos 123^r | grepp 3$
    
    v:\> echos 123^r^n | grepp 3$
    
    v:\>
    That would seem a horrible bug in the Onig code, one that wouldn't last long. Could it be in the building or initializing of the DLL?
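
    To make the suspected difference concrete, here is a minimal sketch in C of two end-of-line policies for "$". It is an illustration only, not Oniguruma's code: the function names are hypothetical, and the check is simplified to "end of input or directly before a line terminator".

```c
#include <stddef.h>

/* Hypothetical illustration, NOT taken from Oniguruma:
   can a "$" anchor match at byte offset pos of subject s (length len)? */

/* Policy 1: only "\n" (or end of input) ends a line. */
int eol_lf_only(const char *s, size_t len, size_t pos)
{
    if (pos == len)
        return 1;
    return s[pos] == '\n';
}

/* Policy 2: "\r\n" is also accepted as a line terminator.
   A lone "\r" is still not treated as end of line here. */
int eol_crlf_aware(const char *s, size_t len, size_t pos)
{
    if (pos == len)
        return 1;
    if (s[pos] == '\n')
        return 1;
    return s[pos] == '\r' && pos + 1 < len && s[pos + 1] == '\n';
}
```

    For the subject "123\r\n" and the offset just after the "3" (offset 3), the first policy says no and the second says yes, which mirrors the v12 and v11 results above.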

    Note that 4UTILS.DLL sets the default regex syntax (as shown below; I verified that it is set correctly) and calls onig_search() with ONIG_ENCODING_UTF16_LE and ONIG_SYNTAX_DEFAULT.

    Code:
    VOID GetRegexSyntax ( VOID )
    {
        WCHAR szResponse[16];
        OnigSyntaxType *psyntax = ONIG_SYNTAX_PERL;
        if ( !QueryOptionValue( L"RegularExpressions", szResponse ) )
            // The second letter is unique per syntax name: pErl, rUby, jAva, gRep, pOsix, gNu
            switch ( CharUpper(szResponse)[1] )
            {
                case L'E' :    psyntax = ONIG_SYNTAX_PERL;                break;
                case L'U' :    psyntax = ONIG_SYNTAX_RUBY;                break;
                case L'A' :    psyntax = ONIG_SYNTAX_JAVA;                break;
                case L'R' :    psyntax = ONIG_SYNTAX_GREP;                break;
                case L'O' :    psyntax = ONIG_SYNTAX_POSIX_EXTENDED;    break;
                case L'N' :    psyntax = ONIG_SYNTAX_GNU_REGEX;
            }
    
        onig_set_default_syntax(psyntax);
    }
     
  2. vefatica

    This pared-down function produces one result (a positive one) using the older (Aug 2010) ONIG.DLL and the other result ("not found") using the newer (Dec 2010) ONIG.DLL.

    Code:
    INT test(VOID)
    {
        WCHAR    buf[8192] = L"123\r\n",
                szRegEx[1024] = L"3$";
        INT SearchResult, ec, rc = 0;
    
        regex_t *regex;
        ec = onig_new(&regex, (UChar*) szRegEx, (UChar*) szRegEx+2*lstrlen(szRegEx),
            ONIG_OPTION_NONE, ONIG_ENCODING_UTF16_LE, ONIG_SYNTAX_DEFAULT, NULL);
        if ( ec != ONIG_NORMAL)
        {
            return onig_error_msg(ec);
        }
    
        UChar    *mstart = (UChar*) buf,
                *mend = (UChar*) buf + 2*lstrlen((WCHAR*)buf);
    
        SearchResult = onig_search(regex, mstart, mend, mstart, mend, NULL, 0);
    
        Printf(L"%s\r\n", SearchResult >= 0 ? buf : L"not found");
    
        onig_free(regex);
        return rc;
    }
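
    A possible workaround on the caller's side, sketched under the assumption that the subject is a NUL-terminated wide string: leave any trailing CR/LF out of the range handed to onig_search(). length_without_eol is a hypothetical helper, not part of Oniguruma or 4UTILS.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: length of buf in characters, not counting
   any trailing '\r' or '\n' characters. */
size_t length_without_eol(const wchar_t *buf)
{
    size_t n = wcslen(buf);

    /* Walk back over the line terminator(s) at the end, if any. */
    while (n > 0 && (buf[n - 1] == L'\r' || buf[n - 1] == L'\n'))
        n--;
    return n;
}
```

    In test() above, mend could then be computed as (UChar*) buf + sizeof(WCHAR) * length_without_eol(buf), so that "3$" sees end of input right after the "3". This only helps for single-line subjects; it does nothing for CR/LF pairs in the middle of a buffer.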
     
  3. rconn

    rconn Administrator
    Staff Member

    Joined:
    May 14, 2008
    Messages:
    9,854
    Likes Received:
    83
    Nothing has changed in the Onig binaries since May 2010.
     
  4. rconn

    Not reproducible here with 12.0.42:

    [D:\onig\onig-5.9.1\x32]ver & echo 123 | grep 3$

    TCC 12.00.42 Windows 7 [Version 6.1.7600]
    123
     
  5. vefatica

    On Sat, 25 Dec 2010 21:54:14 -0500, rconn <> wrote:

    |Not reproducible here with 12.0.42:
    |
    |[D:\onig\onig-5.9.1\x32]ver & echo 123 | grep 3$
    |
    |TCC 12.00.42 Windows 7 [Version 6.1.7600]
    |123

    I was using the 4UTILS plugin GREPP (which uses Onig). Were you?

    See my simplified "C" example.
     
  6. vefatica

    On Sat, 25 Dec 2010 21:52:11 -0500, rconn <> wrote:

    |---Quote---
    |> This pared-down function produces one result (a positive one) using the
    |> older (Aug 2010) ONIG.DLL and the other result ("not found") using the
    |> newer (Dec 2010) ONIG.DLL.
    |---End Quote---
    |Nothing has changed in the Onig binaries since May 2010.

    The distributed DLLs are definitely different.

    d:\tc11> d /k /m o*
    2010-08-07 21:39 299,392 onig.dll

    d:\tc11> d:\tc12

    d:\tc12> d /k /m o*
    2010-12-14 12:30 293,616 onig.dll
     
  7. rconn

    The v11 DLL is from July 2009. The v12 DLL is from May 2010.
     
  8. vefatica

    On Sat, 25 Dec 2010 23:33:32 -0500, rconn <> wrote:

    |---Quote---
    |> |Nothing has changed in the Onig binaries since May 2010.
    |>
    |> The distributed DLLs are definitely different.
    |---End Quote---
    |The v11 DLL is from July 2009. The v12 DLL is from May 2010.

    The newer one does not work correctly. At least it doesn't work like the older
    one (and v10's).
     
  9. vefatica

    Here's more evidence of different behavior from the two ONIG.DLLs. Both tests were conducted with v12 and they do not involve any plugins.

    With the newer (v12 distribution) ONIG.DLL:

    Code:
    v:\> set str=`123%@char[13]%@char[10]`
    
    v:\> echo %@regex[3$,%str]
    0
    With the older (v11 distribution) ONIG.DLL:

    Code:
    v:\> set str=`123%@char[13]%@char[10]`
    
    v:\> echo %@regex[3$,%str]
    1
     
  10. rconn

    The v12 Onig is 5.9.2. The v11 Onig is 5.9.1. I have no intention of
    rewriting 5.9.2 or of reverting to 5.9.1.

    Rex Conn
    JP Software
     
  11. vefatica

    On Sun, 26 Dec 2010 23:10:39 -0500, rconn <> wrote:

    |---Quote---
    |> Here's more evidence of different behavior from the two ONIG.DLLs. Both
    |> tests were conducted with v12 and they do not involve any plugins.
    |---End Quote---
    |The v12 Onig is 5.9.2. The v11 Onig is 5.9.1. I have no intention of
    |rewriting 5.9.2 or of reverting to 5.9.1.

    Do you intend to do anything about the fact that it doesn't work correctly?

    Do you build the Oniguruma binaries?
     
  12. vefatica

    I have built both 5.9.1 and 5.9.2. When I use either of those DLLs and a stand-alone test EXE, the regular expression "3$" is not found in the string "123\r\n" ... even using ONIG_ENCODING_ASCII and either of ONIG_SYNTAX_RUBY or ONIG_SYNTAX_PERL.

    The regular expression **is** found when I run my test EXE with **your** 5.9.1 (TCMD v11) DLL in place.

    I changed none of the source or config files when I built the DLLs. Is it possible that there's some build-time option that has been overlooked? Have you any ideas at all?
     
  13. rconn

    > Do you intend to do anything about the fact that it doesn't work correctly?

    I do not. I have no interest (or ability) to support third-party dll's in
    plugins.

    Onig.dll is included solely for Take Command's internal use.

    Rex Conn
    JP Software
     
  14. vefatica

    On Mon, 27 Dec 2010 09:09:30 -0500, rconn <> wrote:

    |I do not. I have no interest (or ability) to support third-party dll's in
    |plugins.
    |
    |Onig.dll is included solely for Take Command's internal use.

    What about @REGEX[]?
     
  15. rconn

    What about it? It's an internal function that uses Onig for strings, not
    files.
     
  16. vefatica

    On Mon, 27 Dec 2010 09:09:30 -0500, rconn <> wrote:

    |I do not. I have no interest (or ability) to support third-party dll's in
    |plugins.
    |
    |Onig.dll is included solely for Take Command's internal use.

    How about FFIND? Do you support its working correctly? It doesn't in v12.
    FFIND /E does not correctly find EOL anchors.

    v11:

    Code:
    v:\> echo 123 > testfile.txt
    
    v:\> ffind /E"3$" testfile.txt
    
    ---- V:\testfile.txt
    123
    
      1 line in      1 file
    v12:

    Code:
    v:\> echo 123 > testfile.txt
    
    v:\> ffind /E"3$" testfile.txt
    
      0 lines in      0 files
    Again, the failure follows the version of ONIG.DLL. Not being able to use EOL
    anchors severely limits the use of regular expressions.
     
  17. vefatica

    On Mon, 27 Dec 2010 10:45:04 -0500, rconn <> wrote:

    |What about it? It's an internal function that uses Onig for strings, not
    |files.

    It doesn't work correctly. And FFIND /E doesn't work correctly in v12 (see my
    very recent post).
     
  18. rconn

    > It doesn't work correctly. And FFIND /E doesn't work correctly in v12 (see my
    > very recent post).

    IMO this is an imaginary problem for you, because you use GREP, not FFIND.

    As I said before, I am not going to rewrite or replace Onig.
     
  19. vefatica

    On Mon, 27 Dec 2010 11:17:28 -0500, rconn <> wrote:

    |IMO this is an imaginary problem for you, because you use GREP, not FFIND.

    In fact I do sometimes use FFIND. What about users who use it exclusively?

    I don't understand your reluctance to investigate this. The ONIG.DLL
    distributed with v11 did (does) things correctly. I presume you built it (if
    not, where did you get it? ... if so, can you zip up the onig5.9.1 project
    directory and send it to me?). The ONIG.DLL distributed with v12 does not do
    things correctly, breaking @REGEX[], FFIND, and probably other internals. TCC
    claims to support regular expressions in many places and they're broken.
     
  20. Peter Bratton

    Joined:
    Jul 1, 2008
    Messages:
    81
    Likes Received:
    0
    For what it's worth, it treats \n, but not \r, as the end of a line. I get the same result as you above, but:

    Code:
    C:\> echos 123^n>testfile.txt
    
    C:\> ffind /E"3$" testfile.txt
    
    ---- C:\btm\testfile.txt
    123
    
      1 line in      1 file
    
    --
    Peter
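
    If the newer DLL really does accept only \n before "$", one pattern-side workaround is to allow an optional CR in front of the anchor, so "3$" becomes "3\r?$". Below is a rough sketch in C of such a rewrite. allow_cr_before_eol is a hypothetical name, and the function only handles a "$" that is the very last character of the pattern.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: if pattern ends in an unescaped '$', write the
   pattern to out with "\r?" inserted before that '$'. Any other pattern
   is copied unchanged. Returns 0 on success, -1 if out is too small. */
int allow_cr_before_eol(const char *pattern, char *out, size_t outsize)
{
    size_t len = strlen(pattern);
    int anchored = 0;

    if (len > 0 && pattern[len - 1] == '$') {
        /* An odd number of backslashes before '$' means a literal dollar. */
        size_t backslashes = 0;
        size_t i = len - 1;
        while (i > 0 && pattern[i - 1] == '\\') {
            backslashes++;
            i--;
        }
        anchored = (backslashes % 2 == 0);
    }

    if (!anchored) {
        if (len + 1 > outsize)
            return -1;
        memcpy(out, pattern, len + 1);      /* includes the NUL */
        return 0;
    }

    /* Everything before '$', then the 4 characters \ r ? $ and a NUL. */
    if (len - 1 + 4 + 1 > outsize)
        return -1;
    memcpy(out, pattern, len - 1);
    memcpy(out + len - 1, "\\r?$", 5);
    return 0;
}
```

    The same idea can be applied by hand, for example by typing the pattern as 3\r?$ instead of 3$. I have not checked how FFIND /E splits lines before matching, so treat that as untested.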
     
