Regex problem

Apr 13, 2010
307
7
61
The Hague
I would expect the following regex to return "long". It doesn't, and I don't understand why not. Of course I've also tried every combination of quotes and parentheses I could think of.

Code:
echo %@regexsub[1,([/-]{1}([A-Za-z]{1})|--([A-Za-z]+)),--long:3]

Could someone explain to me what I'm doing wrong?

DJ
 
Jan 19, 2011
604
14
Norman, OK
Code:
echo %@regexsub[1,([/-]{1}([A-Za-z]{1})|--([A-Za-z]+)),--long:3]
                  (                                  )  first match group
                   ********************|*************   left or right
                                        ^^^^^^^^^^^^^   this matches
So it returns "--long", which is "two dashes followed by one or more upper- or lower-case letters".
 
May 20, 2008
11,400
99
Syracuse, NY, USA
I get "--long" and I'm not too surprised.

"--long" matches "--([A-Za-z]+)". Since it matches one of the two alternatives, it matches the whole expression inside the outer parentheses, and that is group 1.

The expression ([A-Za-z]+) is group 3. If you ask for group 3, you'll get "long".

Code:
v:\> echo %@regexsub[3,([/-]{1}([A-Za-z]{1})|--([A-Za-z]+)),--long:3]
long
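To check the group numbering outside TCC, here is a minimal Python sketch using the standard `re` module. The helper `regexsub` is hypothetical: it assumes @REGEXSUB returns group n of the leftmost match, and an empty string when that group did not take part in the match. That assumption is consistent with the outputs posted in this thread, but it is not taken from TCC's documentation.

```python
import re

def regexsub(n, pattern, text):
    """Hypothetical stand-in for TCC's %@regexsub[n,pattern,text].

    Assumption: find the leftmost match and return capture group n,
    or "" if nothing matched or group n did not participate.
    """
    m = re.search(pattern, text)
    if m is None:
        return ""
    return m.group(n) or ""

# Groups are numbered by the position of their opening parenthesis:
#   group 1: the whole alternation in the outer parentheses
#   group 2: ([A-Za-z]{1}) inside the first alternative
#   group 3: ([A-Za-z]+)   inside the second alternative
PATTERN = r"([/-]{1}([A-Za-z]{1})|--([A-Za-z]+))"

for n in (1, 2, 3):
    print(n, repr(regexsub(n, PATTERN, "--long:3")))
# prints: 1 '--long'   then   2 ''   then   3 'long'
```

Group 2 comes back empty because the first alternative never succeeded at the position where the match was found.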
 
Jan 19, 2011
604
14
Norman, OK
Now that I think about it, it could have just as easily returned "-l" since that is "single slash or dash followed by a single upper or lower case letter".

Is there a reason it returned the second alternation instead of the first? Does it consider the longer string to be "more correct"?
 
May 20, 2008
11,400
99
Syracuse, NY, USA
That is odd. I'd expect the first of the two disjuncts, as here:
Code:
v:\> echo %@regexsub[1,((ab)|(cde)),abcde]
ab
Undefined behavior?
 
Dec 7, 2009
238
2
Left Coast, USA
This slightly simplified version returns long:

echo %@regexsub[2,([/-][A-Za-z])|--([A-Za-z]+),--long:3]

It does not use the nested parens, which I'm not getting the purpose of. But then I must not be understanding the problem fully. I don't get why the above returns the desired string only with group *2*. I assume that in TCC '|' means 'or', and because of the 'or' I have what amounts to only one (group) in the regular expression. Why would TCC figure this is two groups? Or is this just a convention of %@regexsub[] itself?

Regarding the original, meaning %@regexsub[1,([/-]{1}([A-Za-z]{1})|--([A-Za-z]+)),--long:3]

I assume that in TCC the '{1}' means what it would mean in Perl, namely exactly one of the preceding character or expression. Since in the absence of some other operator:

[/-] by itself means exactly one of either '/' or '-'

and

[A-Za-z] by itself means exactly one alphabetic character, upper- or lower-case...

... why include '{1}' in those two situations? As I recall the best-practice advice for scripting, at least in Perl, is: if you can avoid backtracking, do avoid it.

(When I tried the original "echo" command and omitted the two occurrences of '{1}' there was no change in the output.)
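The same observation is easy to confirm outside TCC by comparing the two patterns on a handful of inputs. This Python sketch uses `re.fullmatch`; the sample strings are made up for illustration, and it assumes TCC's engine treats `{1}` the way Python's `re` does.

```python
import re

# Both patterns: one slash or dash, then one ASCII letter.
with_quant = re.compile(r"[/-]{1}[A-Za-z]{1}")
without_quant = re.compile(r"[/-][A-Za-z]")

# Made-up sample strings covering matches and non-matches.
samples = ["-l", "/x", "--", "-1", "x-", "-ab", ""]

for s in samples:
    a = with_quant.fullmatch(s) is not None
    b = without_quant.fullmatch(s) is not None
    # {1} means "exactly once", which is already what a single atom means.
    assert a == b, s
    print(f"{s!r:6} -> {a}")
```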
 
May 20, 2008
11,400
99
Syracuse, NY, USA

Simply, there are two groups; a matching pair of () is a group. You asked for the second one. However, if, with all else the same, I ask for the first group (which I think should be "-l"), I get nothing.
Code:
v:\> echo %@regexsub[2,([/-][A-Za-z])|--([A-Za-z]+),--long:3]
long

v:\> echo %@regexsub[1,([/-][A-Za-z])|--([A-Za-z]+),--long:3]
ECHO is OFF
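Python's `re` shows the same thing: every pair of capturing parentheses counts as a group, whether or not it sits inside an alternation, and a group belonging to an alternative that did not match comes back as `None`. TCC apparently turns that into an empty result, and `echo` with nothing to print reports "ECHO is OFF". A minimal sketch, assuming TCC numbers groups the same way:

```python
import re

pattern = re.compile(r"([/-][A-Za-z])|--([A-Za-z]+)")

# Two pairs of parentheses means two groups, even though only one alternative can match.
print(pattern.groups)   # 2

m = pattern.search("--long:3")
print(m.group(0))       # --long  (the whole match)
print(m.group(1))       # None    (the first alternative did not participate)
print(m.group(2))       # long
```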
 
Dec 7, 2009
238
2
Left Coast, USA
Yes, in a single regular expression, the first expression in parens is group 1. The second is group 2. Etc. (With some complications if they're nested.) However, since we're talking EITHER/OR here (the "|" character), at least in the more simplified example I provided, that's only one group. Not two.

In any case, for this variable function's purpose, clearly that doesn't matter.
 
May 20, 2008
11,400
99
Syracuse, NY, USA
Well, very simple examples show that something fishy is going on. I can't explain the difference here:
Code:
v:\> echo %@regexsub[1,(1|123),1234]
1

v:\> echo %@regexsub[1,(2|123),1234]
123
or here
Code:
v:\> echo %@regexsub[1,(1)|(123),1234]
1

v:\> echo %@regexsub[1,(2)|(123),1234]
ECHO is OFF
 
Dec 7, 2009
238
2
Left Coast, USA
In the following example, it seems to return the match that it encounters first:

Code:
v:\> echo %@regexsub[1,(1|123),1234]
1

In the next example, I don't know why it doesn't do much the same thing and return '2', since that's the first match it might encounter. Or does this function assume that the expressions are at the start of the target string unless indicated otherwise?

Code:
v:\> echo %@regexsub[1,(2|123),1234]
123

Your next example:

Code:
v:\> echo %@regexsub[1,(1)|(123),1234]
1

If you reverse the order of the expressions:

Code:
v:\> echo %@regexsub[1,(123)|(1),1234]

... it returns '123'.

But:

Code:
v:\> echo %@regexsub[1,(234)|(1),1234]

In that case it displays "ECHO is OFF". Why not '234'? Are we in 'Only at the START of the target string' territory again?

As for this one:

Code:
v:\> echo %@regexsub[1,(2)|(123),1234]
ECHO is OFF

If you do this instead -- change the group you're asking it to match:

Code:
v:\> echo %@regexsub[2,(2)|(123),1234]

...the function returns '123'.

Two more for the road:

Code:
v:\> echo %@regexsub[1,(234)|(1),1234]
ECHO is OFF

v:\> echo %@regexsub[2,(234)|(1),1234]
1

I dunno. Is '|' the culprit here?
 

ben

Jan 3, 2012
44
6
UK
Everything is working as it should be. There are no culprits.

The recogniser uses these rules:

1. Capturing. Number each parenthesised subexpression by the position of its first open parenthesis symbol, counting from 1 from the left, ignoring any nesting.

2. Leftmost match. For the expression as a whole and for each subexpression, consider only leftmost matches.

3. Alternation. Of several alternatives that match, always choose either (a) the leftmost (first-listed) alternative or (b) the longest. Perl 5's and TCC's recognisers both choose (a).
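Rule 3 can be made concrete with a small Python sketch. Python's `re`, like Perl 5, takes the first alternative that matches at the leftmost possible position. The hypothetical `longest_alternative` helper imitates policy (b) instead, by trying every alternative at that position and keeping the longest. It only handles a flat list of alternatives, which is enough for the examples in this thread.

```python
import re

def first_alternative(alternatives, text):
    """Policy (a), leftmost-first: what re.search does with a|b|c."""
    m = re.search("|".join(f"(?:{a})" for a in alternatives), text)
    return m.group(0) if m else None

def longest_alternative(alternatives, text):
    """Policy (b), leftmost-longest, simulated for a flat alternation.

    Hypothetical helper: scan start positions left to right; at the first
    position where any alternative matches, return the longest such match.
    """
    compiled = [re.compile(a) for a in alternatives]
    for pos in range(len(text) + 1):
        hits = []
        for c in compiled:
            m = c.match(text, pos)
            if m:
                hits.append(m.group(0))
        if hits:
            return max(hits, key=len)
    return None

print(first_alternative(["1", "123"], "1234"))    # 1
print(longest_alternative(["1", "123"], "1234"))  # 123
print(first_alternative(["2", "123"], "1234"))    # 123
```

Under policy (b), %@regexsub[1,(1|123),1234] would have returned 123; the output of 1 reported in this thread is what policy (a) predicts.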

%@regexsub[1,(1|123),1234]

Number the whole parenthesised expression 1.
Consider only the leftmost matches, that is, both of the alternatives.
Of the matching alternatives, choose the leftmost: 1.
Return what subexpression 1 matched: 1.

%@regexsub[1,(2|123),1234]

Number the whole parenthesised expression 1.
Consider only the leftmost match, that is, the second alternative.
Choose the only matching alternative: 123.
Return what subexpression 1 matched: 123.

%@regexsub[1,(1)|(123),1234]

Number the first alternative 1, the second alternative 2.
Consider only the leftmost matches, that is, both of the alternatives.
Of the matching alternatives, choose the leftmost: (1).
Return what subexpression 1 matched: 1.

%@regexsub[1,(123)|(1),1234]

Number the first alternative 1, the second alternative 2.
Consider only the leftmost matches, that is, both of the alternatives.
Of the matching alternatives, choose the leftmost: (123).
Return what subexpression 1 matched: 123.

%@regexsub[1,(234)|(1),1234]

Number the first alternative 1, the second alternative 2.
Consider only the leftmost match, that is, the second alternative.
Choose the only matching alternative: (1).
Return what subexpression 1 matched: [nothing].

%@regexsub[1,(2)|(123),1234]

Number the first alternative 1, the second alternative 2.
Consider only the leftmost match, that is, the second alternative.
Choose the only matching alternative: (123).
Return what subexpression 1 matched: [nothing].

%@regexsub[2,(2)|(123),1234]

Number the first alternative 1, the second alternative 2.
Consider only the leftmost match, that is, the second alternative.
Choose the only matching alternative: (123).
Return what subexpression 2 matched: 123.

%@regexsub[2,(234)|(1),1234]

Number the first alternative 1, the second alternative 2.
Consider only the leftmost match, that is, the second alternative.
Choose the only matching alternative: (1).
Return what subexpression 2 matched: 1.
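As a cross-check on the walkthrough above, this Python sketch replays each example with a hypothetical `regexsub` helper (leftmost match, return group n, empty string if that group did not participate). It assumes TCC's engine follows the same leftmost-first rules as Python's `re`. The expected values are the outputs reported in this thread, with "" standing for "ECHO is OFF".

```python
import re

def regexsub(n, pattern, text):
    """Hypothetical stand-in for %@regexsub[n,pattern,text]."""
    m = re.search(pattern, text)
    return (m.group(n) or "") if m else ""

# (group, pattern, text, output reported in the thread; "" = "ECHO is OFF")
cases = [
    (1, r"(1|123)", "1234", "1"),
    (1, r"(2|123)", "1234", "123"),
    (1, r"(1)|(123)", "1234", "1"),
    (1, r"(123)|(1)", "1234", "123"),
    (1, r"(234)|(1)", "1234", ""),
    (1, r"(2)|(123)", "1234", ""),
    (2, r"(2)|(123)", "1234", "123"),
    (2, r"(234)|(1)", "1234", "1"),
]

for n, pattern, text, expected in cases:
    got = regexsub(n, pattern, text)
    status = "ok" if got == expected else "MISMATCH"
    print(f"{n} {pattern:12} {got!r:7} {status}")
```

Under these assumptions every case prints ok. Note (234)|(1): "234" could match at index 1, but "1" already matches at index 0, and the leftmost position wins.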
 
Jan 19, 2011
604
14
Norman, OK
ben,

Outstanding explanation!

The leftmost match answers why "--long" was chosen over "-l".
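The point is easy to see by searching for the short alternative on its own. In this Python sketch (assuming TCC scans the subject string the same way), the first alternative by itself finds "-l" starting at index 1, but the combined pattern already succeeds at index 0 through the second alternative, so the scan never reaches "-l".

```python
import re

text = "--long:3"
short_alt = re.compile(r"[/-][A-Za-z]")
combined = re.compile(r"([/-][A-Za-z]|--([A-Za-z]+))")

s = short_alt.search(text)
c = combined.search(text)

print(s.group(0), s.start())  # -l 1
print(c.group(0), c.start())  # --long 0
```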
 
Apr 13, 2010
307
7
61
The Hague
The way I understand this is that every parenthesised expression becomes a uniquely numbered group. That means that with @REGEXSUB and a regex containing an alternation, you can never capture whichever alternative happened to match under a single group number, because @REGEXSUB's first parameter determines which group is returned (and it is returned only if that group took part in the match).

Thanks, everyone.
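In general-purpose regex APIs the usual workaround is to ask the match object which group took part. Here is a minimal Python sketch of that idea. It is not something @REGEXSUB offers as far as this thread shows, and the helper name `matched_alternative` and the option pattern are illustrative only. It assumes a flat pattern with one capture group per alternative; with nested groups an outer group would be reported first.

```python
import re

def matched_alternative(pattern, text):
    """Return (group number, captured text) for the first group that
    took part in the leftmost match, or None if nothing matched."""
    m = re.search(pattern, text)
    if m is None:
        return None
    for i, g in enumerate(m.groups(), start=1):
        if g is not None:
            return i, g
    return None  # matched, but no capture group participated

# Hypothetical option pattern: group 1 = short name, group 2 = long name.
OPTION = r"[/-]([A-Za-z])|--([A-Za-z]+)"

print(matched_alternative(OPTION, "-l"))        # (1, 'l')
print(matched_alternative(OPTION, "--long:3"))  # (2, 'long')
```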
 

ben

Jan 3, 2012
44
6
UK
I believe that is the case.

But if you want to extract the short or long option name without its prefix hyphen(s) or slash, try

echo %@regexsub[1,((?<=--)[A-Za-z]+|(?<=[/-])[A-Za-z]),%option]

Note the order of the alternation.
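The lookbehind pattern can be tried out in Python, whose `re` module also supports fixed-width lookbehind. This sketch assumes TCC's engine handles `(?<=...)` the same way; the option strings are made up, and `option_name` is a hypothetical helper that returns group 1 of the leftmost match. The second pattern swaps the two alternatives to show why the order matters.

```python
import re

def option_name(pattern, option):
    """Return group 1 of the leftmost match, or "" (hypothetical @REGEXSUB stand-in)."""
    m = re.search(pattern, option)
    return (m.group(1) or "") if m else ""

GOOD = r"((?<=--)[A-Za-z]+|(?<=[/-])[A-Za-z])"     # long form tried first
SWAPPED = r"((?<=[/-])[A-Za-z]|(?<=--)[A-Za-z]+)"  # short form tried first

for opt in ["--long", "-l", "/x"]:
    print(opt, option_name(GOOD, opt), option_name(SWAPPED, opt))
```

With "--long", both patterns first succeed at index 2 (the "l"). There GOOD tries the long form first and returns "long", while SWAPPED accepts the single letter "l" because the short form is listed first.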
 
Apr 13, 2010
307
7
61
The Hague
You've solved it using a positive lookbehind!
That is what I call an excellent answer.
I'm grateful. Thank you.

DJ
 