@REGEX question

Discussion in 'Support' started by vefatica, Jul 10, 2010.

  1. vefatica

    Joined:
    May 20, 2008
    Messages:
    7,959
    Likes Received:
    30
    How am I to interpret the return value of @REGEX[] below? It doesn't seem to be the (documented) "number of matching groups".

    Code:
    e:\logs\mercury> echo %@REGEX["(efused)|(uthor)|(known)",refused]
    4
     
  2. vefatica

    [Working with this Oniguruma stuff gives me a headache!]

    This snippet (below) comes close to the documentation: it always returns a count of the matches. Where GT, LT, GE, or LE appear in my code, they stand for >, <, >=, and <=; I replaced the troublesome characters.

    Code:
        UChar    *mstart=(UChar*)szString,
                *mend=(UChar*)szString + 2 * lstrlen(szString);    // UTF-16: two bytes per character
    
        OnigRegion *region = onig_region_new();
    
        // here's the interesting stuff
        INT matches = 0, i;
        while ( onig_search(regex, mstart, mend, mstart, mend, region, 0) >= 0 )
        {
            matches += 1;
    
            // find the match and move past it
            // first see if the match was a group
            for ( i=1; i < region->num_regs; i++ )
            {
                if ( region->beg[i] >= 0 ) // match was a group
                {
                    mstart += region->end[i];
                    break;    // keep looking (continue the while)
                }
            }
    
            if ( i == region->num_regs ) // match was not a group (region 0)
            {
                mstart += region->end[0];
            }
            // keep looking (continue the while)
        }
        Sprintf(psz, L"%d", matches);
    Here are a few examples.

    Code:
    g:\projects\4utils\release> echo %@regex[o|g,doggiepoo]
    5
    
    g:\projects\4utils\release> echo %@regex[(oo)|(g),doggiepoo]
    3
    
    g:\projects\4utils\release> echo %@regex[(oo)|(gg),doggiepoo]
    2
    
    g:\projects\4utils\release> echo %@regex[(o)|(g),doggie]
    3
    
    g:\projects\4utils\release> echo %@regex[(o)|g,doggie]
    3
    
    g:\projects\4utils\release> echo %@regex[o|g,doggie]
    3
    
    g:\projects\4utils\release> echo %@regex[(s)|f,doggie]
    0
    
    g:\projects\4utils\release> echo %@regex[o|h,dog]
    1
    
    g:\projects\4utils\release> echo %@regex[(foo),foozzz]
    1
    
    g:\projects\4utils\release> echo %@regex[(foo),foozzzfoo]
    2
     
  3. vefatica

    You can shorten that by looping backwards so the region 0 match only gets counted if no group match was found.

    Code:
        INT matches = 0;
        while ( onig_search(regex, mstart, mend, mstart, mend, region, 0) >= 0 )
        {
            matches += 1;
    
            // highest-numbered region first; region 0 (the whole match) is the fallback
            for ( INT i = region->num_regs-1; i >= 0; i-- )
            {
                if ( region->beg[i] >= 0 )
                {
                    mstart += region->end[i];
                    break;
                }
            }
        }
        Sprintf(psz, L"%d", matches);
     
  4. rconn

    rconn Administrator
    Staff Member

    Joined:
    May 14, 2008
    Messages:
    9,860
    Likes Received:
    83
    I tried that on several regular expression testers, and got results of 0, 1,
    or 4, depending on the RE emulation desired.

    So -- what are you trying to do, and what language syntax are you using?

    Rex Conn
    JP Software
     
  5. vefatica

    On Sun, 11 Jul 2010 22:25:42 -0400, rconn <>
    wrote:

    |---Quote---
    |> How am I to interpret the return value of @REGEX[] below? It doesn't
    |> seem to be the (documented) "number of matching groups".
    |>
    |>
    |> Code:
    |> ---------
    |> e:\logs\mercury> echo %@REGEX["(efused)|(uthor)|(known)",refused]
    |> 4
    |> ---------
    |---End Quote---
    |I tried that on several regular expression testers, and got results of 0, 1,
    |or 4, depending on the RE emulation desired.
    |
    |So -- what are you trying to do, and what language syntax are you using?

    I use Perl syntax. Your return value doesn't seem to depend on how
    many matches are found. Are you returning region.num_regs? That's always
    the number of parenthesized groups in the regex plus 1. That's what it
    looks like (see below). You have to loop to get all the matches.

    Code:
    v:\> echo %@regex[(a)|(b)|(c),cat]
    4
    
    v:\> echo %@regex[(a)|(b)|(c),ccaat]
    4
    
    v:\> echo %@regex[(a)|(b)|(c),cccaaat]
    4
    
    v:\> echo %@regex[(a)|(b)|(c)|(d),cccaaat]
    5
    
    v:\> echo %@regex[(a)|(b)|(c)|(d),ccaat]
    5
    
    v:\> echo %@regex[(a)|(b)|(c)|(d),cat]
    5
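
    The point about num_regs can be checked without Oniguruma. The sketch below is an illustration only: it swaps in POSIX regcomp() (assuming a POSIX <regex.h> is available), and capture_slots is a hypothetical helper name, not part of any library. POSIX reports re_nsub, the number of parenthesized subexpressions; adding 1 for the whole match gives a figure analogous to the num_regs described above, and it is fixed by the pattern alone, whatever string is searched.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not an Oniguruma or TCC function).
   Returns the number of capture slots for a pattern: slot 0 for the
   whole match plus one per parenthesized group, or -1 if the pattern
   does not compile. Uses POSIX regcomp(), not Oniguruma. */
int capture_slots(const char *pattern)
{
    regex_t re;
    int slots;

    assert(pattern != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /* re_nsub counts parenthesized subexpressions in the pattern. */
    slots = (int)re.re_nsub + 1;

    regfree(&re);
    return slots;
}
```

    Note that the function takes no subject string at all, which is the point: a value computed this way cannot say how many matches a particular string contains.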
     
  6. vefatica

    On Sun, 11 Jul 2010 22:25:42 -0400, rconn <>
    wrote:

    |So -- what are you trying to do

    I was just pointing out that, contrary to the help, @REGEX[] doesn't
    return the number of matching groups. The code I posted (and the
    complete version I emailed you) simply always returns the number of
    matches. As far as counting matches is concerned, groups are not
    significant; there are 3 matches here [a|b|c,cab] as well as here
    [(a)|(b)|(c),cab] ... also here [(a|b|c),cab]. I'm not even sure
    whether there's any point in using groups in a simple "find_a_match"
    or "count_the_matches" function.
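
    The claim that grouping does not change the match count can be tried with a small search loop. This is a sketch under stated assumptions, not the code emailed to JP Software: it uses POSIX regexec() in place of onig_search, works on plain char strings rather than UTF-16, and count_matches is a hypothetical name. Only region 0 (the whole match) is consulted, so groups play no part in the count.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: count non-overlapping matches of pattern in
   subject, or return -1 if the pattern does not compile.
   Uses POSIX regex (leftmost-longest), not Oniguruma. */
int count_matches(const char *pattern, const char *subject)
{
    regex_t re;
    regmatch_t m[1];            /* slot 0 only: the whole match */
    const char *p = subject;
    int count = 0;
    int flags = 0;

    assert(pattern != NULL && subject != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    while (regexec(&re, p, 1, m, flags) == 0) {
        count++;
        if (m[0].rm_eo > 0)
            p += m[0].rm_eo;    /* resume just past this match */
        else if (*p != '\0')
            p++;                /* empty match at p: step one char */
        else
            break;              /* empty match at end of string */
        flags = REG_NOTBOL;     /* p is no longer the start of the line */
    }

    regfree(&re);
    return count;
}
```

    With this loop, a|b|c, (a)|(b)|(c), and (a|b|c) all give 3 for the string cab.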
     
  7. vefatica

    Here's a way to count matches that is simpler, faster, and much more intuitive than the code I posted earlier.

    Code:
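        UChar    *at = (UChar*) pString,
                *mend=(UChar*)pString + lstrlen(pString) * sizeof(WCHAR);
        INT        matches = 0,
                matchlen;
    
        while ( at < mend )
        {
            // does a match start exactly at "at"? (length in bytes, or negative if not)
            matchlen = onig_match(regex, (UChar*) pString, mend, at, NULL, option);
            if ( matchlen >= 0 )
            {
                matches += 1;
                // move past the match; on an empty match step one WCHAR so the loop always advances
                at += ( matchlen > 0 ) ? matchlen : 2;
            }
            else
            {
                at += 2;    // no match here; try the next WCHAR
            }
        }
    
        Sprintf(psz, L"%d", matches);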
    If you want to count matches, you must plow through the string looking for subsequent ones. The onig_match function is a bit odd: it checks only whether a match starts at "at". The parameter indicating the beginning of the whole string (pString, above) appears irrelevant; the function works even if that parameter is NULL or greater than "at", so it seems not to be used at all.
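
    The "does a match start here?" approach can be imitated without Oniguruma. The sketch below is an assumption-laden stand-in, not the plugin code: POSIX regex has no direct equivalent of onig_match, so the pattern is wrapped as ^(pattern) and run against the tail of the string beginning at the current position. match_here and count_by_stepping are hypothetical names, the strings are plain char rather than UTF-16, and the 256-byte pattern buffer is an arbitrary limit. An empty match is counted and then stepped over by one character so the loop always advances.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Length of a match starting exactly at `at`, or -1 if there is none.
   Plays the role onig_match has in the post above; `anchored` must
   have been compiled from a pattern of the form ^(...). */
static int match_here(const regex_t *anchored, const char *at)
{
    regmatch_t m[1];

    if (regexec(anchored, at, 1, m, 0) != 0)
        return -1;
    return (int)m[0].rm_eo;     /* rm_so is 0 because of the leading ^ */
}

/* Hypothetical helper: count matches by testing each position in turn.
   Returns -1 if the pattern is too long or does not compile. */
int count_by_stepping(const char *pattern, const char *subject)
{
    char anchored_src[256];
    regex_t re;
    const char *at;
    int matches = 0;
    int n;

    assert(pattern != NULL && subject != NULL);

    n = snprintf(anchored_src, sizeof anchored_src, "^(%s)", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored_src)
        return -1;
    if (regcomp(&re, anchored_src, REG_EXTENDED) != 0)
        return -1;

    at = subject;
    while (*at != '\0') {
        int len = match_here(&re, at);
        if (len >= 0) {
            matches++;
            at += (len > 0) ? len : 1;  /* empty match: step one char */
        } else {
            at++;                       /* no match here: try the next char */
        }
    }

    regfree(&re);
    return matches;
}
```

    Because each test sees only the tail of the string, a pattern that looks behind the current position (a word boundary, for instance) would not behave the same way here as it would with a real start-of-string parameter.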
     
