@REGEXSUB issue

May 31, 2008
382
2
It seems that @REGEXSUB handles regex alternatives incorrectly.
Code:
C:\> for %i in (a b c) echo %i in (a+)^|(b+): %@REGEXINDEX["(a+)|(b+)",x%i] (%@REGEXSUB[1,"(a+)|(b+)",x%i])

a in (a+)|(b+): 1 (a)
b in (a+)|(b+): 1 ()          <=== should match "b"
c in (a+)|(b+): -1 ()

C:\> ver
TCC  9.02.152   Windows XP [Version 5.1.2600]
For comparison, @REGEXINDEX matches the second capture in the second line, (b+), while @REGEXSUB doesn't.
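The behavior reported above can be reproduced outside TCC. The sketch below is in C and assumes a POSIX system with <regex.h>. POSIX extended regular expressions are a stand-in here, not the Oniguruma engine TCC uses, and capture_group is a hypothetical helper, not a TCC or Oniguruma API. It shows the usual numbering rule: groups are numbered by the position of their opening parenthesis, so when "(a+)|(b+)" matches "xb", group 1 simply does not take part and only group 2 holds "b".

```c
#include <regex.h>
#include <string.h>

/* Copy capture group `g` of the first match of `pattern` in `subject`
 * into `out` (outsz must be at least 1).
 * Returns  1 if group g took part in the match,
 *          0 if the match succeeded but group g did not take part
 *            (POSIX sets rm_so to -1 for such groups),
 *         -1 if there was no match, g is out of range (>= 10),
 *            or the pattern does not compile. */
static int capture_group(const char *pattern, const char *subject, size_t g,
                         char *out, size_t outsz)
{
    regex_t re;
    regmatch_t m[10];
    int rc = -1;

    out[0] = '\0';
    if (g >= 10 || regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, subject, 10, m, 0) == 0) {
        if (m[g].rm_so == -1) {
            rc = 0;                           /* group did not participate */
        } else {
            size_t len = (size_t)(m[g].rm_eo - m[g].rm_so);
            if (len >= outsz)
                len = outsz - 1;              /* truncate to fit the buffer */
            memcpy(out, subject + m[g].rm_so, len);
            out[len] = '\0';
            rc = 1;
        }
    }
    regfree(&re);
    return rc;
}
```

For example, capture_group("(a+)|(b+)", "xb", 1, buf, sizeof buf) returns 0 and leaves buf empty, while the same call for group 2 returns 1 with "b" in buf. That appears to mirror the empty @REGEXSUB[1,...] result on the second output line.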
 
May 20, 2008
11,530
102
Syracuse, NY, USA
On Mon, 23 Mar 2009 17:04:59 -0500, Stefano Piccardi <>
wrote:

|No answer? Then does @REGEXSUB work correctly in version 10?

It would seem it's not working correctly. In writing @XREPLACE (4UTILS) I did
not go out of my way to accommodate this particular scenario. Alternatives
(Perl syntax) seem to be OK as far as Oniguruma is concerned:

v:\> echo %@xreplace[(a+)|(b+),z,xaax]
xzx

v:\> echo %@xreplace[(a+)|(b+),z,xbbx]
xzx

v:\> echo %@xreplace[(a+)|(b+),z,xccx]
xccx
--
- Vince
 
Thank you for confirming this issue in @REGEXSUB.
The regex is given to me as part of a configuration file, so I can't rewrite it to work around @REGEXSUB.
I need to isolate the content of the capture.
I suppose I could do it with @XREPLACE + @WORD, but I noticed a similar issue in @XREPLACE:

Run in a batch file:

for %i in (aa bb cc) echo %i in (a+)^|(b+): (%@xreplace["(a+)|(b+)",z\1z,x%i])

Output:
aa in (a+)|(b+): (xzaaz)
bb in (a+)|(b+): (xzz)
cc in (a+)|(b+): (xcc)

The second line should be (xzbbz).

BTW, if I don't quote the regex I get a different, still incorrect, output:

aa in (a+)|(b+): (xaa)
bb in (a+)|(b+): (xbb)
cc in (a+)|(b+): (xcc)
 
On Tue, 24 Mar 2009 09:44:59 -0500, Stefano Piccardi <>
wrote:

|for %i in (aa bb cc) echo %i in (a+)^|(b+): (%@xreplace["(a+)|(b+)",z\1z,x%i])
|BTW, if I don't quote the regex I get a different, still incorrect, output:
|
|aa in (a+)|(b+): (xaa)
|bb in (a+)|(b+): (xbb)
|cc in (a+)|(b+): (xcc)

These should be:

v:\> for %i in (aa bb cc) echo %i in (a+)^|(b+): (%@xreplace["(a+)|(b+)",z\1z,x%i])
aa in (a+)|(b+): (xzaaz)
bb in (a+)|(b+): (xzz) [\1 not found]
cc in (a+)|(b+): (xcc)

v:\> for %i in (aa bb cc) echo %i in (a+)^|(b+): (%@xreplace["(a+)|(b+)",z\2z,x%i])
aa in (a+)|(b+): (xzz) [\2 not found]
bb in (a+)|(b+): (xzbbz)
cc in (a+)|(b+): (xcc)
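The two outputs above follow from numbering groups left to right and expanding a group that did not participate as an empty string. Below is a minimal C sketch of that substitution rule. It assumes POSIX <regex.h> instead of Oniguruma and replaces only the first match (each example input contains only one); subst_first and append are hypothetical names, and this is not the 4UTILS @XREPLACE code.

```c
#include <regex.h>
#include <string.h>

/* Append at most `n` bytes of `src` to the string in `out`,
 * never writing past `outsz` bytes in total. */
static void append(char *out, size_t outsz, const char *src, size_t n)
{
    size_t used = strlen(out);
    if (used + 1 >= outsz)
        return;
    if (n > outsz - used - 1)
        n = outsz - used - 1;
    memcpy(out + used, src, n);
    out[used + n] = '\0';
}

/* Replace the first match of `pattern` in `subject` with `tmpl`, where
 * \1..\9 in `tmpl` expand to the corresponding capture group. A group that
 * did not participate in the match expands to the empty string. With no
 * match, `subject` is copied unchanged. Returns 0, or -1 if the pattern
 * does not compile. */
static int subst_first(const char *pattern, const char *subject,
                       const char *tmpl, char *out, size_t outsz)
{
    regex_t re;
    regmatch_t m[10];
    const char *t;

    out[0] = '\0';
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, subject, 10, m, 0) != 0) {
        append(out, outsz, subject, strlen(subject));   /* no match */
        regfree(&re);
        return 0;
    }
    append(out, outsz, subject, (size_t)m[0].rm_so);    /* text before match */
    for (t = tmpl; *t; t++) {
        if (t[0] == '\\' && t[1] >= '1' && t[1] <= '9') {
            int g = t[1] - '0';
            if (m[g].rm_so != -1)                       /* unset group -> "" */
                append(out, outsz, subject + m[g].rm_so,
                       (size_t)(m[g].rm_eo - m[g].rm_so));
            t++;                                        /* skip the digit */
        } else {
            append(out, outsz, t, 1);                   /* literal character */
        }
    }
    append(out, outsz, subject + m[0].rm_eo,            /* text after match */
           strlen(subject + m[0].rm_eo));
    regfree(&re);
    return 0;
}
```

With the pattern "(a+)|(b+)" and the template "z\\1z" (the C spelling of z\1z), it turns "xaa" into "xzaaz" and "xbb" into "xzz", and it leaves "xcc" unchanged. The template "z\\2z" turns "xbb" into "xzbbz".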

I found that bug last night while experimenting after reading your post. It
resulted from a change (quite a while back) from wcscpyn() to lstrncpy() which
behaves a little differently. There's a new one in the VC9 plugin directory on
lucky.syr.edu.

Please check out TYPEX (like TYPE /X) and UNTYPEX. If you redirect TYPEX to a
file UNTYPEX will re-construct the original file from the hex values. Don't
overwrite anything important. UNTYPEX needs the exact format that TYPEX outputs.
If you edit TYPEX's output, do it carefully. These are experimental. Example:

v:\> typex fleas.txt
00000000 4D 79 20 64 6F 67 20 68 61 73 20 66 6C 65 61 73 My dog has fleas
00000010 21 0D 0A !..

v:\> typex fleas.txt > fleas.hex

v:\> edit fleas.hex & rem change 61 to 41

v:\> untypex fleas.hex fleas2.txt

v:\> typex fleas2.txt
00000000 4D 79 20 64 6F 67 20 68 41 73 20 66 6C 65 41 73 My dog hAs fleAs
00000010 21 0D 0A !..
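A round trip like TYPEX/UNTYPEX can be sketched in C. This is a hypothetical illustration, not the 4UTILS code. The names dump_line and undump_line are invented, and the fixed-width layout is an assumption: hex columns are blank-padded on a short last line, so the ASCII column always starts at the same position. The forum rendering above may have collapsed the real spacing. In this sketch, reading the hex bytes from fixed columns is what lets the reverse step ignore the ASCII column, so an edit that shifts the columns would be misread.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BYTES_PER_LINE 16

/* Format up to 16 bytes starting at `offset` as one dump line: 8 hex digits
 * of offset, 16 " XX" columns (blank-padded on a short last line), a space,
 * then the bytes as ASCII with non-printable bytes shown as '.'.
 * `line` must hold at least 8 + 3*16 + 1 + 16 + 1 = 74 bytes. */
static void dump_line(unsigned long offset, const unsigned char *p, size_t n,
                      char *line)
{
    size_t i;
    char *q = line + sprintf(line, "%08lX", offset);

    for (i = 0; i < BYTES_PER_LINE; i++)
        q += (i < n) ? sprintf(q, " %02X", (unsigned)p[i]) : sprintf(q, "   ");
    *q++ = ' ';
    for (i = 0; i < n; i++)
        *q++ = (p[i] >= 0x20 && p[i] < 0x7F) ? (char)p[i] : '.';
    *q = '\0';
}

/* Parse a line produced by dump_line back into bytes. The offset field is
 * skipped, not checked. Returns the number of bytes written to `out`
 * (at most 16), or -1 if the line is too short to hold an offset. */
static int undump_line(const char *line, unsigned char *out)
{
    size_t i, len = strlen(line);

    if (len < 8)
        return -1;
    for (i = 0; i < BYTES_PER_LINE; i++) {
        size_t pos = 8 + 3 * i;               /* " XX" field for byte i */
        char hex[3];
        if (pos + 3 > len || line[pos] != ' ' ||
            !isxdigit((unsigned char)line[pos + 1]) ||
            !isxdigit((unsigned char)line[pos + 2]))
            break;                            /* blank padding: no more data */
        hex[0] = line[pos + 1];
        hex[1] = line[pos + 2];
        hex[2] = '\0';
        out[i] = (unsigned char)strtoul(hex, NULL, 16);
    }
    return (int)i;
}
```

Dumping "My dog has fleas!\r\n" this way yields the same first line as the TYPEX example. The second line carries the bytes 21 0D 0A, followed by padding and then "!..".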
--
- Vince
 
|These should be:
|
|v:\> for %i in (aa bb cc) echo %i in (a+)^|(b+): (%@xreplace["(a+)|(b+)",z\1z,x%i])
|aa in (a+)|(b+): (xzaaz)
|bb in (a+)|(b+): (xzz) [\1 not found]
|cc in (a+)|(b+): (xcc)
|
|v:\> for %i in (aa bb cc) echo %i in (a+)^|(b+): (%@xreplace["(a+)|(b+)",z\2z,x%i])
|aa in (a+)|(b+): (xzz) [\2 not found]
|bb in (a+)|(b+): (xzbbz)
|cc in (a+)|(b+): (xcc)

I weakly disagree. IMO in an alternation each capture should count as \1, so the output should be
(xzaaz)
(xzbbz)
(xcc)
But I hesitate to make a stronger point because this level of detail is left open to interpretation in the regex documentation.

However, if in an alternation the first capture is sometimes called \1 and other times \2, then we're out of luck. Consider:

prefix A(\d+)|prefix B(\w+)

This should capture \1 as a number when it's prefixed by "prefix A", or as a word when it's prefixed by "prefix B". It can't be rewritten as

prefix A|prefix B(\d+|\w+)

to lock the alternative captures into the same group of parentheses. The second regex is not equivalent to the first one: since | has the lowest precedence, it means "prefix A" or "prefix B(\d+|\w+)", so nothing at all is captured after "prefix A".

IMO the cardinal number of the capture should be assigned as the match is being evaluated, not as it is being compiled from left to right.
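The non-equivalence claimed above can be checked with a small C sketch, assuming POSIX <regex.h> instead of Oniguruma. POSIX extended regexes have no \d or \w, so [0-9] and [[:alnum:]_] stand in for them. participating_groups is a hypothetical helper that returns a bitmask of the groups that took part in the first match. Against "prefix A123" the first pattern sets group 1, while the second matches only "prefix A" and sets no group at all.

```c
#include <regex.h>

/* Returns a bitmask with bit g set for every capture group g (1..9) that
 * took part in the first match of `pattern` in `subject`. Returns 0 if the
 * match succeeded but no group took part, and -1 if there is no match or
 * the pattern does not compile. */
static int participating_groups(const char *pattern, const char *subject)
{
    regex_t re;
    regmatch_t m[10];
    int g, mask = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, subject, 10, m, 0) == 0) {
        mask = 0;
        for (g = 1; g < 10; g++)
            if (m[g].rm_so != -1)          /* -1 marks a non-participating group */
                mask |= 1 << g;
    }
    regfree(&re);
    return mask;
}
```

With p1 = "prefix A([0-9]+)|prefix B([[:alnum:]_]+)" and p2 = "prefix A|prefix B([0-9]+|[[:alnum:]_]+)", the inputs "prefix A123" and "prefix Bfoo" give group 1 and group 2 for p1, but no group and group 1 for p2.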
 
On Tue, 24 Mar 2009 13:13:31 -0500, Stefano Piccardi <>
wrote:

|v:\> for %i in (aa bb cc) echo %i in (a+)^|(b+): (%@xreplace["(a+)|(b+)",z\2z,x%i])
|aa in (a+)|(b+): (xzz) [\2 not found]
|bb in (a+)|(b+): (xzbbz)
|cc in (a+)|(b+): (xcc)

|I weakly disagree. IMO in an alternate each capture should count as \1, so the output should be
|(xzaaz)
|(xzbbz)
|(xcc)
|But I hesitate to make a stronger point because this level of detail is left open to interpretation in the regex documentation.

It's always been clear to me. \1, \2, ... refer to the parenthesized
expressions in order. To have it your way, do this:

for %i in (aa bb cc) echo %i in (a+)^|(b+): (%@xreplace["(a+|b+)",z\1z,x%i])
aa in (a+)|(b+): (xzaaz)
bb in (a+)|(b+): (xzbbz)
cc in (a+)|(b+): (xcc)
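This rewrite works because "(a+|b+)" contains a single group, so whichever branch matches lands in group 1. Below is a C sketch that assumes POSIX <regex.h> rather than Oniguruma. first_group is a hypothetical helper that also reports re_nsub, the number of groups in the compiled pattern: 2 for "(a+)|(b+)" and 1 for "(a+|b+)".

```c
#include <regex.h>
#include <string.h>

/* Compile `pattern`, store its number of capture groups in *ngroups, and
 * copy group 1 of the first match in `subject` into `out` (left empty if
 * there is no match or group 1 did not take part; outsz must be >= 1).
 * Returns 0, or -1 if the pattern does not compile. */
static int first_group(const char *pattern, const char *subject,
                       size_t *ngroups, char *out, size_t outsz)
{
    regex_t re;
    regmatch_t m[2];                       /* whole match + group 1 */

    out[0] = '\0';
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    *ngroups = re.re_nsub;
    if (regexec(&re, subject, 2, m, 0) == 0 && m[1].rm_so != -1) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len >= outsz)
            len = outsz - 1;               /* truncate to fit the buffer */
        memcpy(out, subject + m[1].rm_so, len);
        out[len] = '\0';
    }
    regfree(&re);
    return 0;
}
```

Against "xbb", "(a+)|(b+)" leaves group 1 empty, while "(a+|b+)" puts "bb" in it.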
--
- Vince
 
|It's always been clear to me. \1, \2, ... refer to the parenthesized
|expressions in order. To have it your way, do this:
|
|for %i in (aa bb cc) echo %i in (a+)^|(b+): (%@xreplace["(a+|b+)",z\1z,x%i])
|aa in (a+)|(b+): (xzaaz)
|bb in (a+)|(b+): (xzbbz)
|cc in (a+)|(b+): (xcc)
|--
|- Vince

However, if in an alternation the first capture is sometimes called \1 and other times \2, then we're out of luck. Consider:

prefix A(\d+)|prefix B(\w+)

This should capture \1 as a number when it's prefixed by "prefix A", or as a word when it's prefixed by "prefix B". It can't be rewritten as

prefix A|prefix B(\d+|\w+)

to lock the alternative captures into the same group of parentheses. The second regex is not equivalent to the first one: since | has the lowest precedence, it means "prefix A" or "prefix B(\d+|\w+)", so nothing at all is captured after "prefix A".

IMO the cardinal number of the capture should be assigned as the match is being evaluated, not as it is being compiled from left to right.
 
On Tue, 24 Mar 2009 13:54:48 -0500, Stefano Piccardi <>
wrote:

|However, if in an alternate the first capture is sometimes called \1 and other times it's called \2 then we're out of luck. Consider:
|prefix A(\d+)|prefix B(\w+) should capture \1 as
|a number when it's prefixed by prefix A or as a word when it's prefixed by prefix B. It can't be rewritten as
|prefix A|prefix B(\d+|\w+)
|to lock the alternate capture into the same group of parentheses. The second regex is not equivalent to the first one.

What about (A\d+|B\w+)?

|IMO the cardinal number of the capture should be assigned as the match is being evaluated, not as it is being compiled from left to right.

Then you wouldn't know what was matched.
--
- Vince
 

rconn

Administrator
Staff member
May 14, 2008
12,404
152
Stefano Piccardi wrote:

> It seems that @REGEXSUB handles regex alternatives incorrectly.
>
> Code:
> ---------
> C:\> for %i in (a b c) echo %i in (a+)^|(b+): %@REGEXINDEX["(a+)|(b+)",x%i] (%@REGEXSUB[1,"(a+)|(b+)",x%i])
>
> a in (a+)|(b+): 1 (a)
> b in (a+)|(b+): 1 () <=== should match "b"
> c in (a+)|(b+): -1 ()
>
> C:\> ver
> TCC 9.02.152 Windows XP [Version 5.1.2600]
> ---------
> For comparison, @REGEXINDEX matches the second capture in the second line, (b+), while @REGEXSUB doesn't.

I'll pass it on to the Oniguruma developers (though in every "bug"
reported in regular expressions for the past couple of years Oniguruma
has been correct).

Rex Conn
JP Software
 
On Tue, 24 Mar 2009 21:52:54 -0500, rconn <> wrote:

|Stefano Piccardi wrote:
|
|
|---Quote---
|> It seems that @REGEXSUB handles regex alternatives incorrectly.
|>
|> Code:
|> ---------
|> C:\> for %i in (a b c) echo %i in (a+)^|(b+): %@REGEXINDEX["(a+)|(b+)",x%i] (%@REGEXSUB[1,"(a+)|(b+)",x%i])
|>
|> a in (a+)|(b+): 1 (a)
|> b in (a+)|(b+): 1 () <=== should match "b"
|> c in (a+)|(b+): -1 ()
|>
|> C:\> ver
|> TCC 9.02.152 Windows XP [Version 5.1.2600]
|> ---------
|> For comparison, @REGEXINDEX matches the second capture in the second line, (b+), while @REGEXSUB doesn't.
|---End Quote---
|I'll pass it on to the Oniguruma developers (though in every "bug"
|reported in regular expressions for the past couple of years Oniguruma
|has been correct).

I don't think it's Onig (or usage). If I do

while ( onig_search(regex, mstart, mend, mstart, mend, region, 0) >= 0 )

with regex pointing to the (unquoted) "(a+)|(b+)", and mstart/mend delimiting
the string "xxbxx", it finds a match (and knows where it is):

v:\> echo %@xreplace["(a+)|(b+)",**\2**,xxbxx]
xx**b**xx

In Stefano's faulty case, @REGEXINDEX is finding the pattern while @REGEXSUB is
not.
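The loop quoted above walks a string match by match with Oniguruma's onig_search. A comparable loop using POSIX regexec instead of Oniguruma (an assumption for portability) is sketched below; match_offsets is a hypothetical helper. It resumes each search at the end of the previous match and passes REG_NOTBOL after the first search, so ^ cannot match in the middle of the subject. For "(a+)|(b+)" against "xxbxx" it finds one match starting at offset 2, which agrees with the @xreplace output above.

```c
#include <regex.h>
#include <string.h>

/* Find successive non-overlapping matches of `pattern` in `subject` and
 * store each match's start offset in `starts` (at most `max` entries).
 * Returns the number of matches found, or -1 if the pattern does not
 * compile. An empty match advances the search by one extra character so
 * the loop always makes progress. */
static int match_offsets(const char *pattern, const char *subject,
                         int *starts, int max)
{
    regex_t re;
    regmatch_t m[1];
    const char *p = subject;
    int count = 0, eflags = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    while (count < max && regexec(&re, p, 1, m, eflags) == 0) {
        starts[count++] = (int)(p - subject) + (int)m[0].rm_so;
        if (m[0].rm_eo == m[0].rm_so) {       /* empty match */
            if (p[m[0].rm_eo] == '\0')
                break;
            p += m[0].rm_eo + 1;
        } else {
            p += m[0].rm_eo;                  /* resume after this match */
        }
        eflags = REG_NOTBOL;                  /* p is no longer line start */
    }
    regfree(&re);
    return count;
}
```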
--
- Vince
 
On Tue, 24 Mar 2009 21:52:54 -0500, rconn <> wrote:

|---Quote---
|> It seems that @REGEXSUB handles regex alternatives incorrectly.
|>
|> Code:
|> ---------
|> C:\> for %i in (a b c) echo %i in (a+)^|(b+): %@REGEXINDEX["(a+)|(b+)",x%i] (%@REGEXSUB[1,"(a+)|(b+)",x%i])
|>
|> a in (a+)|(b+): 1 (a)
|> b in (a+)|(b+): 1 () <=== should match "b"
|> c in (a+)|(b+): -1 ()
|>
|> C:\> ver
|> TCC 9.02.152 Windows XP [Version 5.1.2600]
|> ---------
|> For comparison, @REGEXINDEX matches the second capture in the second line, (b+), while @REGEXSUB doesn't.
|---End Quote---
|I'll pass it on to the Oniguruma developers (though in every "bug"
|reported in regular expressions for the past couple of years Oniguruma
|has been correct).

Well, you know, it does (sort of) work, but not in a way that's very useful:

v:\> echo %@REGEXSUB[1,"(a+)|(b+)",cbbc]
ECHO is OFF

v:\> echo %@REGEXSUB[2,"(a+)|(b+)",cbbc]
bb

The help says, of @REGEXSUB, "returns the nth matching group in the string". So
the discrepancy is between two notions of "the nth matching group". Above,
there actually was a **first** match, but it matched the second parenthesized
pattern; there certainly wasn't a 2nd match. IMO better behavior would be:

v:\> echo %@REGEXSUB[1,"(a+)|(b+)",cbbc]
bb [a first match]

v:\> echo %@REGEXSUB[2,"(a+)|(b+)",cbbc]
ECHO is OFF [no second match]
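The behavior proposed above amounts to counting only the groups that took part in the match. Below is a minimal C sketch of that rule, assuming POSIX <regex.h>. nth_participating is a hypothetical helper, not TCC's @REGEXSUB. Against "cbbc" with "(a+)|(b+)", n = 1 yields "bb" and n = 2 finds nothing. One trade-off, echoing the objection earlier in the thread, is that the result alone no longer tells you which alternative matched unless the group index is reported as well.

```c
#include <regex.h>
#include <string.h>

/* Return the text of the nth capture group that took part in the first
 * match (n counts from 1), skipping groups that did not participate.
 * Nested groups that participate each count. Returns 1 and fills `out` if
 * such a group exists, 0 otherwise (out is left empty), and -1 if the
 * pattern does not compile. outsz must be >= 1. */
static int nth_participating(const char *pattern, const char *subject, int n,
                             char *out, size_t outsz)
{
    regex_t re;
    regmatch_t m[10];
    int g, seen = 0, rc = 0;

    out[0] = '\0';
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, subject, 10, m, 0) == 0) {
        for (g = 1; g < 10; g++) {
            if (m[g].rm_so == -1)
                continue;                     /* group did not participate */
            if (++seen == n) {
                size_t len = (size_t)(m[g].rm_eo - m[g].rm_so);
                if (len >= outsz)
                    len = outsz - 1;          /* truncate to fit the buffer */
                memcpy(out, subject + m[g].rm_so, len);
                out[len] = '\0';
                rc = 1;
                break;
            }
        }
    }
    regfree(&re);
    return rc;
}
```

Under this rule, the prefix A/prefix B pattern from earlier in the thread also yields its single capture as the first group, whichever branch matched.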
--
- Vince
 