@REGEX question

May 20, 2008
11,424
99
Syracuse, NY, USA
How am I to interpret the return value of @REGEX[] below? It doesn't seem to be the (documented) "number of matching groups".

Code:
e:\logs\mercury> echo %@REGEX["(efused)|(uthor)|(known)",refused]
4
 

[Working with this Oniguruma stuff gives me a headache!]

This snippet (below) comes close to the documentation. It always gives a count of the matches.

Code:
    UChar    *mstart=(UChar*)szString,
            *mend=(UChar*)szString + 2 * lstrlen(szString);

    OnigRegion *region = onig_region_new();

    // here's the interesting stuff
    INT matches = 0, i;
    while ( onig_search(regex, mstart, mend, mstart, mend, region, 0) >= 0 )
    {
        matches += 1;

        // find the match and move past it
        // (offsets in region are relative to mstart, the string start passed above)
        // first see if the match was a group
        for ( i=1; i < region->num_regs; i++ )
        {
            if ( region->beg[i] >= 0 ) // match was a group
            {
                mstart += region->end[i];
                break;    // keep looking (continue the while)
            }
        }

        if ( i == region->num_regs ) // match was not a group (region 0)
        {
            mstart += region->end[0];
        }
        // keep looking (continue the while)
    }
    onig_region_free(region, 1);    // release the region allocated above
    Sprintf(psz, L"%d", matches);
Here are a few examples.

Code:
g:\projects\4utils\release> echo %@regex[o|g,doggiepoo]
5

g:\projects\4utils\release> echo %@regex[(oo)|(g),doggiepoo]
3

g:\projects\4utils\release> echo %@regex[(oo)|(gg),doggiepoo]
2

g:\projects\4utils\release> echo %@regex[(o)|(g),doggie]
3

g:\projects\4utils\release> echo %@regex[(o)|g,doggie]
3

g:\projects\4utils\release> echo %@regex[o|g,doggie]
3

g:\projects\4utils\release> echo %@regex[(s)|f,doggie]
0

g:\projects\4utils\release> echo %@regex[o|h,dog]
1

g:\projects\4utils\release> echo %@regex[(foo),foozzz]
1

g:\projects\4utils\release> echo %@regex[(foo),foozzzfoo]
2
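
For anyone without the Oniguruma headers at hand, the same counting loop can be sketched with the POSIX regex.h functions instead. This is only an illustration under stated assumptions: it is not the plugin code, the name count_matches is made up for this example, and POSIX extended syntax is used rather than Perl syntax (the simple patterns above behave the same in both).

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: count non-overlapping matches of a POSIX extended
   regex in subject. Returns -1 if the pattern does not compile. */
int count_matches(const char *pattern, const char *subject)
{
    regex_t re;
    regmatch_t m;
    int matches = 0;
    int flags = 0;
    const char *p = subject;

    assert(pattern != NULL && subject != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /* Search, count, move past the whole match (region 0), search again. */
    while (regexec(&re, p, 1, &m, flags) == 0) {
        matches += 1;
        p += m.rm_eo;               /* offsets are relative to p */
        if (m.rm_so == m.rm_eo) {   /* empty match: force progress */
            if (*p == '\0')
                break;
            p += 1;
        }
        flags = REG_NOTBOL;         /* p is no longer the start of the line */
    }

    regfree(&re);
    return matches;
}
```

Called as count_matches("o|g", "doggiepoo") or count_matches("(oo)|(g)", "doggiepoo"), it only ever looks at the whole-match offsets, so parentheses in the pattern do not change the count.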
 
You can shorten that by looping backwards so the region 0 match only gets counted if no group match was found.

Code:
    INT matches = 0;
    while ( onig_search(regex, mstart, mend, mstart, mend, region, 0) >= 0 )
    {
        matches += 1;

        for ( INT i = region->num_regs-1; i >= 0; i-- )
        {
            if ( region->beg[i] >= 0 )
            {
                mstart += region->end[i];
                break;
            }
        }
    }
    Sprintf(psz, L"%d", matches);
 

rconn

Administrator
Staff member
May 14, 2008
12,365
150
> How am I to interpret the return value of @REGEX[] below? It doesn't
> seem to be the (documented) "number of matching groups".
>
>
> Code:
> ---------
> e:\logs\mercury> echo %@REGEX["(efused)|(uthor)|(known)",refused]
> 4
> ---------

I tried that on several regular expression testers, and got results of 0, 1,
or 4, depending on the RE emulation desired.

So -- what are you trying to do, and what language syntax are you using?

Rex Conn
JP Software
 
On Sun, 11 Jul 2010 22:25:42 -0400, rconn wrote:

|---Quote---
|> How am I to interpret the return value of @REGEX[] below? It doesn't
|> seem to be the (documented) "number of matching groups".
|>
|>
|> Code:
|> ---------
|> e:\logs\mercury> echo %@REGEX["(efused)|(uthor)|(known)",refused]
|> 4
|> ---------
|---End Quote---
|I tried that on several regular expression testers, and got results of 0, 1,
|or 4, depending on the RE emulation desired.
|
|So -- what are you trying to do, and what language syntax are you using?

I use Perl syntax. Your return value doesn't seem to depend on how
many matches are found. Are you returning region.num_regs? That's always
the number of parenthesized groups in the regex, plus 1. That's what it
looks like (see below). You have to loop to get all the matches.

Code:
v:\> echo %@regex[(a)|(b)|(c),cat]
4

v:\> echo %@regex[(a)|(b)|(c),ccaat]
4

v:\> echo %@regex[(a)|(b)|(c),cccaaat]
4

v:\> echo %@regex[(a)|(b)|(c)|(d),cccaaat]
5

v:\> echo %@regex[(a)|(b)|(c)|(d),ccaat]
5

v:\> echo %@regex[(a)|(b)|(c)|(d),cat]
5
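
The pattern in those results (4 for three groups, 5 for four, whatever the subject string) is what you get from any count that is fixed when the regex is compiled. As a rough analogue, assuming POSIX regex.h in place of Oniguruma, the re_nsub field plays the part of num_regs minus 1. The name region_count is made up for this sketch, and whether @REGEX really returns num_regs is still only a guess.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: number of capture regions a pattern defines, i.e.
   one per parenthesized group plus one for the whole match.
   Returns -1 if the pattern does not compile. */
int region_count(const char *pattern)
{
    regex_t re;
    int n;

    assert(pattern != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /* re_nsub is set by regcomp; no subject string is involved at all */
    n = (int)re.re_nsub + 1;

    regfree(&re);
    return n;
}
```

Since no subject string is ever passed in, a value computed this way cannot say anything about how many matches a particular string contains.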
 
On Sun, 11 Jul 2010 22:25:42 -0400, rconn wrote:

|So -- what are you trying to do

I was just pointing out that, contrary to the help, @REGEX[] doesn't
return the number of matching groups. The code I posted (and the
complete version I emailed you) simply always returns the number of
matches. As far as counting matches is concerned, groups are not
significant; there are 3 matches here [a|b|c,cab] as well as here
[(a)|(b)|(c),cab] ... also here [(a|b|c),cab]. I'm not even sure
whether there's any point in using groups in a simple "find_a_match"
or "count_the_matches" function.
 

Here's a simpler, faster, and much more intuitive way to count matches than the code I posted earlier.

Code:
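    UChar    *at = (UChar*) pString,
            *mend = (UChar*) pString + lstrlen(pString) * sizeof(WCHAR);
    INT        matches = 0,
            matchlen;

    while ( at < mend )
    {
        // does a match start exactly at "at"? (length in bytes, or negative if not)
        matchlen = onig_match(regex, (UChar*) pString, mend, at, NULL, option);
        if ( matchlen >= 0 )
        {
            matches += 1;
            // skip the match; step one character after an empty match
            // so the loop cannot stall
            at += ( matchlen > 0 ) ? matchlen : sizeof(WCHAR);
        }
        else
        {
            at += sizeof(WCHAR);    // no match here; try the next character
        }
    }

    Sprintf(psz, L"%d", matches);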
If you want to count matches, you must plow through the string looking for subsequent ones. The onig_match function is a bit odd: it checks whether a match starts at "at". The parameter indicating the beginning of the whole string (pString, above) appears irrelevant; the function works even if that parameter is NULL or greater than "at", so it seems not to be used at all.
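
The "does a match start here?" walk can also be imitated with POSIX regex.h, which has no direct counterpart to onig_match. In the sketch below, a search from the current position counts only when it reports a match at offset 0. The names match_at and count_by_walking are made up for this example, and because each test is a full search of the remainder it is slower than a true anchored match; treat it as an illustration of the idea, not a replacement for the Oniguruma code.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper: length of the match that starts exactly at s,
   or -1 if no match starts there. */
static int match_at(const regex_t *re, const char *s, int at_line_start)
{
    regmatch_t m;
    int flags = at_line_start ? 0 : REG_NOTBOL;

    /* regexec reports the leftmost match, so rm_so == 0 means
       a match begins right at s */
    if (regexec(re, s, 1, &m, flags) != 0 || m.rm_so != 0)
        return -1;
    return (int)m.rm_eo;
}

/* Walk the subject one position at a time, counting matches that start
   at the current position. Returns -1 if the pattern does not compile. */
int count_by_walking(const char *pattern, const char *subject)
{
    regex_t re;
    const char *at = subject;
    const char *end;
    int matches = 0;

    assert(pattern != NULL && subject != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    end = subject + strlen(subject);
    while (at < end) {
        int len = match_at(&re, at, at == subject);
        if (len >= 0) {
            matches += 1;
            at += (len > 0) ? len : 1;  /* never stall on an empty match */
        } else {
            at += 1;                    /* nothing starts here; move on */
        }
    }

    regfree(&re);
    return matches;
}
```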
 