Regular expressions?

#1
I suppose "::..\..." means (at least) two characters, then a '.' and then (at least) two characters. But that regular expression never works!
Code:
v:\zips> *dir
 
Volume in drive V is DATA          Serial number is c007:d3e4
Directory of  V:\zips\*
 
2012-11-26  23:28        <DIR>    .
2012-11-26  23:28        <DIR>    ..
2012-11-26  23:28        <DIR>    Save
2012-11-26  23:28        <DIR>    shralias_ascii_save_files
2012-11-26  23:28        <DIR>    X64
2012-11-25  15:40          38,933  4console.zip
2012-11-26  15:43          86,596  4threads.zip
2012-11-26  15:30          61,867  4utils.zip
2012-11-06  05:41          56,097  sysutils.zip
          243,493 bytes in 4 files and 5 dirs    253,952 bytes allocated
    6,785,912,832 bytes free
 
v:\zips> *dir "::..\..."
 
Volume in drive V is DATA          Serial number is c007:d3e4
TCC: (Sys) The system cannot find the file specified.
"V:\zips\::..\..\.."
                0 bytes in 0 files and 0 dirs
The error message above gives a clue; in it, my regular expression was changed!

Oddly, "::.\.." (char, dot, char) and "::...\...." (3 chars, dot, 3 chars) work.
Code:
v:\zips> *dir "::.\.."
 
Volume in drive V is DATA          Serial number is c007:d3e4
Directory of  V:\zips\::.\..
 
2012-11-25  15:40          38,933  4console.zip
2012-11-26  15:43          86,596  4threads.zip
2012-11-26  15:30          61,867  4utils.zip
2012-11-06  05:41          56,097  sysutils.zip
          243,493 bytes in 4 files and 0 dirs    253,952 bytes allocated
    6,785,912,832 bytes free
 
v:\zips> *dir "::...\...."
 
Volume in drive V is DATA          Serial number is c007:d3e4
Directory of  V:\zips\::...\....
 
2012-11-25  15:40          38,933  4console.zip
2012-11-26  15:43          86,596  4threads.zip
2012-11-26  15:30          61,867  4utils.zip
2012-11-06  05:41          56,097  sysutils.zip
          243,493 bytes in 4 files and 0 dirs    253,952 bytes allocated
    6,785,912,832 bytes free
 

rconn

Administrator
Staff member
May 14, 2008
10,627
97
#2
I suppose "::..\..." means (at least) two characters, then a '.' and then (at least) two characters. But that regular expression never works!
WAD (working as designed) - runs of more than two consecutive periods embedded in a filename are expanded into "extended parent directory names" (and have been for the last 20 years). See the help for details.
 
#3
But a regex is not a filename... The very useful syntax of extending multiple consecutive periods into extended parent directory names ought not to apply INSIDE a regular expression, where alternate syntax is used... though undoubtedly it would be a tough job for the parser.
 

rconn

#4
But a regex is not a filename... The very useful syntax of extending multiple consecutive periods into extended parent directory names ought not to apply INSIDE a regular expression, where alternate syntax is used... though undoubtedly it would be a tough job for the parser.
In this case, a regex definitely *is* (at least part of) the filename -- and the extended parent directory name expansion is done before the wildcard and/or regular expression parsing.

Changing that would require rewriting much of the command line parser (several months' work at least), and would definitely result in breaking a few million existing batch files and aliases.
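The parsing order described above can be sketched in Python. This is only a model of the rule as stated in this thread (a run of three or more periods is rewritten into ".." segments before the "::" regex is ever looked at), not TCC's actual parser, and `expand_parent_dots` is a hypothetical name. It reproduces the rewritten strings shown in the error messages in this thread, but it does not account for the cases where a pattern starting with three or more dots was reported as left unchanged.

```python
import re

def expand_parent_dots(arg: str) -> str:
    """Rewrite each run of n >= 3 dots as n-1 '..' segments joined by backslashes."""
    def repl(m: re.Match) -> str:
        n = len(m.group(0))
        return "\\".join([".."] * (n - 1))
    return re.sub(r"\.{3,}", repl, arg)

# The expansion runs first, so the regex engine never sees the original text.
print(expand_parent_dots(r"::..\..."))   # ::..\..\..
print(expand_parent_dots(r"::.\...."))   # ::.\..\..\..
print(expand_parent_dots(r"::.\.."))     # ::.\..  (no run of three dots, unchanged)
```

Under this model, any regex containing a literal-dot escape followed by three or more "any character" dots is altered before it is compiled.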
 
#5
In this case, a regex definitely *is* (at least part of) the filename -- and the extended parent directory name expansion is done before the wildcard and/or regular expression parsing.
Yes, I guessed that.
Changing that would require rewriting much of the command line parser (several months' work at least), and would definitely result in breaking a few million existing batch files and aliases.
The effort required is not surprising. I would quibble about the number of programs the change would affect, but it is irrelevant - the benefits are certainly not worth the effort. However, there ought to be a way to specify that the user wants to select all files with a name of at least 2 characters and an extension of at least 1 character. Well, there is!

The file match string "*[?][?].[?]*" matches files (or in the appropriate context also directories) with a name of at least 2 characters and an extension of at least 1 character. It does not require using a regex, so it ought to operate faster, too.
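To make the intent of that match string concrete, here is a rough Python equivalent. It assumes that in a TCC wildcard `[?]` stands for exactly one character and `*` for any run of characters, as the post above describes; the regex translation and the `matches` helper are illustrative only and say nothing about how TCC evaluates wildcards internally.

```python
import re

# Assumed translation of *[?][?].[?]* :
#   *   -> .*      (any run of characters)
#   [?] -> .       (exactly one character)
#   .   -> \.      (a literal period)
PATTERN = re.compile(r".*..\...*")

def matches(name: str) -> bool:
    """True if the whole name fits the assumed translation of the wildcard."""
    return PATTERN.fullmatch(name) is not None

for name in ["ab.c", "4utils.zip", "a.b", "ab"]:
    print(name, matches(name))
```

Names such as "a.b" (one-character name) and "ab" (no extension) are rejected, while "ab.c" and "4utils.zip" are accepted.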
 
#6
In this case, a regex definitely *is* (at least part of) the filename -- and the extended parent directory name expansion is done before the wildcard and/or regular expression parsing.

Changing that would require rewriting much of the command line parser (several months' work at least), and would definitely result in breaking a few million existing batch files and aliases.
How dare you look inside my regular expression! Backslashes and dots are pretty common in regular expressions. IMHO, anything following "::" should be treated as a regular expression. If the user intends a path specification, let him, for example, use "..\subdir\::regex".
 

rconn

#7
How dare you look inside my regular expression! Backslashes and dots are pretty common in regular expressions. IMHO, anything following "::" should be treated as a regular expression. If the user intends a path specification, let him, for example, use "..\subdir\::regex".
Sure. Just requires a few months for a parser rewrite and introducing a few zillion new incompatibilities. :banghead:

I really don't think that three or more consecutive dots (backslashes are irrelevant) are especially common in RE's. You somehow managed to avoid using them for the past few years ...
 
#8
I really don't think that three or more consecutive dots (backslashes are irrelevant) are especially common in RE's.
Three or more doesn't seem to be a problem. One or two is a problem. Please explain what's happening below, where three or more dots (before the "\.") give the desired behavior (and one or two don't).
Code:
v:\test> *dir

 Volume in drive V is DATA           Serial number is c007:d3e4
 Directory of  V:\test\*

2012-12-26  22:53         <DIR>    .
2012-12-26  22:53         <DIR>    ..
2012-12-26  22:53               0  x.txt
2012-12-26  22:53               0  xx.txt
2012-12-26  22:53               0  xxx.txt
2012-12-26  22:53               0  xxxx.txt
2012-12-26  22:53               0  xxxxx.txt
2012-12-26  22:53               0  xxxxxx.txt
2012-12-26  22:53               0  xxxxxxx.txt
2012-12-26  22:53               0  xxxxxxxx.txt
2012-12-26  22:53               0  xxxxxxxxx.txt
2012-12-26  22:53               0  xxxxxxxxxx.txt
                 0 bytes in 10 files and 2 dirs
     6,785,961,984 bytes free

v:\test> *dir "::.\...."

 Volume in drive V is DATA           Serial number is c007:d3e4
TCC: (Sys) The system cannot find the file specified.
 "V:\test\::.\..\..\.."
                 0 bytes in 0 files and 0 dirs
     6,785,961,984 bytes free

v:\test> *dir "::..\...."

 Volume in drive V is DATA           Serial number is c007:d3e4
TCC: (Sys) The system cannot find the file specified.
 "V:\test\::..\..\..\.."
                 0 bytes in 0 files and 0 dirs
     6,785,961,984 bytes free

v:\test> *dir "::...\...."

 Volume in drive V is DATA           Serial number is c007:d3e4
 Directory of  V:\test\::...\....

2012-12-26  22:53               0  xxx.txt
2012-12-26  22:53               0  xxxx.txt
2012-12-26  22:53               0  xxxxx.txt
2012-12-26  22:53               0  xxxxxx.txt
2012-12-26  22:53               0  xxxxxxx.txt
2012-12-26  22:53               0  xxxxxxxx.txt
2012-12-26  22:53               0  xxxxxxxxx.txt
2012-12-26  22:53               0  xxxxxxxxxx.txt
                 0 bytes in 8 files and 0 dirs
     6,785,961,984 bytes free

v:\test> *dir "::....\...."

 Volume in drive V is DATA           Serial number is c007:d3e4
 Directory of  V:\test\::....\....

2012-12-26  22:53               0  xxxx.txt
2012-12-26  22:53               0  xxxxx.txt
2012-12-26  22:53               0  xxxxxx.txt
2012-12-26  22:53               0  xxxxxxx.txt
2012-12-26  22:53               0  xxxxxxxx.txt
2012-12-26  22:53               0  xxxxxxxxx.txt
2012-12-26  22:53               0  xxxxxxxxxx.txt
                 0 bytes in 7 files and 0 dirs
     6,785,961,984 bytes free

v:\test> *dir "::.....\...."

 Volume in drive V is DATA           Serial number is c007:d3e4
 Directory of  V:\test\::.....\....

2012-12-26  22:53               0  xxxxx.txt
2012-12-26  22:53               0  xxxxxx.txt
2012-12-26  22:53               0  xxxxxxx.txt
2012-12-26  22:53               0  xxxxxxxx.txt
2012-12-26  22:53               0  xxxxxxxxx.txt
2012-12-26  22:53               0  xxxxxxxxxx.txt
                 0 bytes in 6 files and 0 dirs
     6,785,961,984 bytes free

v:\test> *dir "::......\...."

 Volume in drive V is DATA           Serial number is c007:d3e4
 Directory of  V:\test\::......\....

2012-12-26  22:53               0  xxxxxx.txt
2012-12-26  22:53               0  xxxxxxx.txt
2012-12-26  22:53               0  xxxxxxxx.txt
2012-12-26  22:53               0  xxxxxxxxx.txt
2012-12-26  22:53               0  xxxxxxxxxx.txt
                 0 bytes in 5 files and 0 dirs
     6,785,961,984 bytes free
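For comparison, here is a sketch of what the same patterns would match if they reached the regex engine unmodified. It uses Python's `re` rather than Oniguruma and assumes an unanchored search against each file name; for patterns this simple the two engines should agree, but treat that as an assumption. The file names mirror the test directory above.

```python
import re

# x.txt, xx.txt, ... up to ten x's, as in the directory listing
names = ["x" * n + ".txt" for n in range(1, 11)]

def count_matches(pattern: str) -> int:
    """Count the file names in which the pattern is found anywhere."""
    return sum(1 for name in names if re.search(pattern, name))

for pattern in [r".\....", r"..\....", r"...\....", r"....\...."]:
    print(pattern, count_matches(pattern))
```

This prints counts of 10, 9, 8 and 7. The last two agree with the 8-file and 7-file listings above; the first two are what the failing commands would be expected to return if the pattern were passed through untouched.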
 
Jan 19, 2011
581
10
Norman, OK
#10
Then it's designed incorrectly. You can't say "this is a regex and it follows the rules in this Oniguruma document" and then turn around and say "well, it doesn't really follow the rules because it's expanding parent paths even though they are in a specifically designated regex."

You somehow managed to avoid using them for the past few years ...
Really? Blaming the user again?
 

rconn

#11
Then it's designed incorrectly. You can't say "this is a regex and it follows the rules in this Oniguruma document" and then turn around and say "well, it doesn't really follow the rules because it's expanding parent paths even though they are in a specifically designated regex."
The extended parent directory names preceded the regular expression support by about 15 years. There's been a grand total of one reported problem with the parent directory + regular expression syntax thus far; reversing the parsing order would cause thousands of problems. (And extended parent names are used far more often than RE's.)

Really? Blaming the user again?
Hardly. The original problem was invented & not particularly realistic, and there are existing workarounds that are much more practical than spending a few months rewriting the parser (and breaking everybody *else's* syntax).
 
#12
The extended parent directory names preceded the regular expression support by about 15 years. There's been a grand total of one reported problem with the parent directory + regular expression syntax thus far; reversing the parsing order would cause thousands of problems. (And extended parent names are used far more often than RE's.)
I totally understand this. I love extended parent directory names. I find myself in CMD typing something like "cd ..." and then cursing and looking for my jump stick and my portable TC.

Hardly. The original problem was invented & not particularly realistic, and there are existing workarounds...
It's a defined regex block, there shouldn't have to be a workaround.

... that are much more practical than spending a few months rewriting the parser (and breaking everybody *else's* syntax).
I would like to offer a request. PLEASE!!! Think about rewriting the parser so that when it sees the "::" syntax it will: 1) stop whatever it's doing, 2) do a lookahead to find where the regex ends, 3) interpret the regex, 4) return the result to the parser so it can continue.

Basically, "::this_is_a_regex" so the regular command line parser should keep its grubby mitts off. The parser should look at it and say, "I'm not allowed to touch anything inside of that construct." The regex should have an entirely separate handler. Perhaps it could process the text or list of files or whatever else and then return an array of results back to the main parser for continued processing.
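The request can be sketched in Python as follows. This illustrates the proposal only, not anything TCC does: both function names are hypothetical, the dot-expansion rule is the simple one described earlier in the thread, and the sketch assumes the regex runs from "::" to the end of the argument, which the proposal leaves open.

```python
import re

def expand_parent_dots(text: str) -> str:
    """Rewrite each run of n >= 3 dots as n-1 '..' segments joined by backslashes."""
    return re.sub(
        r"\.{3,}",
        lambda m: "\\".join([".."] * (len(m.group(0)) - 1)),
        text,
    )

def expand_outside_regex(arg: str) -> str:
    """Expand dot runs only in the path part before '::'; pass the regex through untouched."""
    path, marker, regex = arg.partition("::")
    return expand_parent_dots(path) + marker + regex

print(expand_outside_regex(r"...\subdir\::..\..."))  # ..\..\subdir\::..\...
print(expand_outside_regex(r"::..\..."))             # ::..\...
```

The path prefix still gets the extended parent directory treatment, while everything after "::" is handed to the regex handler verbatim.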
 

rconn

#13
I would like to offer a request. PLEASE!!! Think about rewriting the parser so that when it sees the "::" syntax it will: 1) stop whatever it's doing, 2) do a lookahead to find where the regex ends, 3) interpret the regex, 4) return the result to the parser so it can continue.
I can do that, but it's going to require a major parser rewrite (> 30K lines of code, and several weeks at a minimum). Major parser changes like this tend to have unfortunate side effects of breaking lots of existing code (and CMD compatibility), so I try to avoid them unless there's a compelling reason. IMHO this issue hasn't yet risen to the level that would warrant the effort & pain involved.