Regular Expressions Do Not Work in DIR Command

Well, playing with regular expressions has certainly led me to discover a lot of bugs. Here's another one.

The help entry called "Wildcards and Regular Expressions" shows how to use regular expressions for file name matching, such as with the DIR command. The discussion includes the following examples.

dir ::ca[td]
dir "::^\w{1,8}\.btm$"

The second example illustrates what to do if the regular expression contains any special characters, such as whitespace, redirection characters, or escape characters.

Unfortunately, it turns out that regular expressions actually do not work properly with the DIR command (and PDIR and maybe elsewhere). This post will illustrate the problems.

The examples use a DIR alias to keep the display simple.

TCC(30.00.18): C:\temp\sandbox>alias dirx
*dir /h /k /m /b

And I created a sandbox area with a few files.

TCC(30.00.18): C:\temp\sandbox>dirx
1
2
A
a b
B

Conversion to Lower Case


It appears that with the DIR command, the entire command tail is converted to lower case. As a result, the regular expressions that use uppercase letters do not work. For example, \d is supposed to select digit characters and \D is supposed to select the opposite, non-digit characters. However, both act like the lowercase version.

TCC(30.00.18): C:\temp\sandbox>dirx ::\d
1
2

TCC(30.00.18): C:\temp\sandbox>dirx ::\D
1
2

This is not a problem with the regular-expression engine itself, as illustrated by a similar command that uses @REGEX.

TCC(30.00.18): C:\temp\sandbox>for %file in (*) do if %@regex[\D,%file] EQ 1 echo Matching file: %file
Matching file: A
Matching file: a b
Matching file: B

If the command line is converted to lower case, we see the same erroneous result as with the DIR command.

TCC(30.00.18): C:\temp\sandbox>for %file in (*) do if %@regex[\d,%file] eq 1 echo Matching file: %file
Matching file: 1
Matching file: 2
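To make the effect concrete outside TCC, here is a minimal sketch in Python. It is only an illustration: Python's `re` module is not the engine TCC uses, and the helper name `matching` is made up for this example. It applies a pattern to the sandbox file names as typed and again after a blanket lower-casing of the pattern text, which is what DIR appears to do to its command tail.

```python
import re

# The sandbox file names from this post.
files = ["1", "2", "A", "a b", "B"]

def matching(pattern, names):
    """Return the names in which the pattern matches anywhere (hypothetical helper)."""
    rx = re.compile(pattern)
    return [name for name in names if rx.search(name)]

as_typed = r"\D"            # "non-digit", as entered on the command line
lowered = as_typed.lower()  # the same text after blanket lower-casing: \d

non_digit = matching(as_typed, files)
after_lowering = matching(lowered, files)

print(non_digit)       # names containing at least one non-digit character
print(after_lowering)  # names containing at least one digit
```

The two lists are disjoint here, mirroring the @REGEX output above: lower-casing the pattern text changes its meaning, not merely its case sensitivity.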

Failure to Recognize POSIX Bracket Syntax


Regular expressions support the following character classes, among others:

[:digit:]
[:upper:]
[:lower:]

And they work with @REGEX.

TCC(30.00.18): C:\temp\sandbox>for %file in (*) do if %@regex["[[:lower:]]",%file] EQ 1 echo Matching file: %file
Matching file: a b

TCC(30.00.18): C:\temp\sandbox>for %file in (*) do if %@regex["[[:upper:]]",%file] EQ 1 echo Matching file: %file
Matching file: A
Matching file: B

TCC(30.00.18): C:\temp\sandbox>for %file in (*) do if %@regex["[[:digit:]]",%file] EQ 1 echo Matching file: %file
Matching file: 1
Matching file: 2

But they do not work with DIR.

TCC(30.00.18): C:\temp\sandbox>dir "::[[:lower:]]"

Volume in drive C is Windows Serial number is a879:820d
TCC: (Sys) The system cannot find the file specified.
"C:\temp\sandbox\::[[:lower:]]"
0 bytes in 0 files and 0 dirs
 
This is more complicated than a simple "regexes don't work".

When I first implemented regular expressions for filename matches in TCC, I got hundreds of "bug" reports because users couldn't quite grasp the concept that while filename matches were normally case-insensitive in Windows, they were case-sensitive with regular expressions. After I changed the regex match to case-insensitive, I've only gotten one bug report about that in the last 10 years or so (yours).

I don't want to go back to defaulting to case-sensitive because I'm sure it will break thousands of existing batch files and aliases. I'm going to have to write something to pre-parse the regular expression and decide what to do with the comparison.
 
I don't care about the case-sensitivity of filename matching. But, for example, the regexes "\d" and "\D" have different (in fact, opposite) meanings. DIR apparently changes "\D" into "\d".
 
As Vince wrote, I was not concerned about case-insensitive matching. It was the failure to process the uppercase regex terms that have meanings opposite to those of the lowercase versions.

I also pointed out that the POSIX character classes don't work at all.

The best thing, I think, would be to continue converting all file names to lower case before comparisons and noting that clearly in the help. The help should point out that character groups such as [a-zA-Z], [a-z], and [A-Z] will be equivalent and that [:upper:] will never match.

In the rare cases in which we do need case-sensitive results, we can use the FOR/REGEX construct that I used to illustrate the problems.
 
I am not a regexpert. But don't many regex syntaxes support a case-insensitive option? Perhaps TCC could automatically add such an option at the start of the regexp, so the user can override it with a case-sensitive option if desired?

Or maybe I'm just talking out of my sphincter.
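For what it's worth, the prepend-an-option idea can be sketched in Python. This is only an illustration under stated assumptions: it uses Python's `re` syntax (`(?i)` to turn case-insensitivity on, `(?-i:...)` to turn it off for one group), which may differ from what TCC's engine accepts, and the helper `select` and the names `Test.txt` and `test.txt` are hypothetical.

```python
import re

def select(pattern, names):
    """Hypothetical helper: match case-insensitively by default,
    leaving the pattern text itself untouched."""
    rx = re.compile("(?i)" + pattern)  # prepend the option; never lower-case the pattern
    return [name for name in names if rx.search(name)]

sandbox = ["1", "2", "A", "a b", "B"]

# \D keeps its "non-digit" meaning because the pattern text is not altered.
non_digit = select(r"\D", sandbox)

# Literal letters still match regardless of case.
just_b = select(r"^b$", sandbox)

# The user overrides the default for part of the pattern with a scoped option.
starts_with_capital_t = select(r"^(?-i:T)", ["Test.txt", "test.txt"])

print(non_digit, just_b, starts_with_capital_t)
```

The point of the sketch is that case-insensitive matching and lower-casing the pattern are separable: the first keeps old batch files working, while the second is what breaks `\D`.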
 
I had a similar thought, but more general: provide a switch for the DIR/PDIR commands to suppress the case conversion. Then one could run the following command to find all files that start with an uppercase 'T' (/nc is my new switch for "no case conversion"):

dir /nc T*

However, I have my doubts that it would be worth Rex's time to code that option. After all, we really hardly ever care about the case of file names.

As a side note to this, I had been annoyed that I could not use the RENAME command to do nothing more than change the case of a file name. Thus

rename testcase.txt TestCase.txt

did not work. However, I just discovered that this was my own fault! I aliased the rename commands to my own BTM script, and it was the script that caused the failure. I just updated it. However, I noticed a slight peculiarity of the RENAME command (the rename report line in the output below).

TCC(30.00.18): C:\temp\sandbox>dirx & *rename "A B" "a b" & dirx
1
2
A
a b
B
C:\temp\sandbox\a b -> C:\temp\sandbox\a b
1 file renamed

1
2
A
a b
B

The RENAME command, when asked to rename "A B", matches the file "a b" and renames it to itself. Being compulsive, if I were writing the TCC code, I might try to fix this, though I'm not sure exactly what I would report, and it would be a waste of time, code, and execution speed.
 