Coming from the world of CP/M and DOS rather than Unix, I almost never used regular expressions; the extended wildcards in TCC generally did everything that I needed. Lately, however, after seeing how others use regular expressions, I've started to experiment with them.
And that has led to my noticing something that I find very peculiar.
The help text goes to great lengths to describe two ways to pass strings that have special characters.
Double Quotes
A string can be enclosed in double-quote characters. Most special characters are treated literally, but environment variables are expanded. Thus, if we have defined the variable char as follows
set char=A
then we get the following:
C:\>echo "<%char>"
"<A>"
The variable is expanded, but the "redirection" characters are treated literally. And the quotation characters are retained.
Back Quotes
A string can also be enclosed in back quotes. In that case, the entire string is to be taken literally, variables are not expanded, and the quotation characters are not passed along. Thus
C:\>echo `<%char>`
<%char>
The Peculiar Case of the Caret Character
The one character that defies the rules is the caret character. When it appears in a string enclosed in double-quote characters, it is treated literally. Its special behavior is lost.
C:\>echo "a^sb"
"a^sb"
Ordinarily, ^s would be converted to the space character, but in double-quoted strings it is not.
Counter-intuitively, inside back quotes, the caret character, unlike all other characters, maintains its special meaning. Thus
C:\>echo `a^sb`
a b
The string ^s is converted to a space character.
The Problem
The big problem with this is that the caret character is very important in regular expressions. It can indicate the beginning of the string or, inside brackets, negate a character class (as in [^a]). I expected that I could pass that character in a regular expression by enclosing the entire expression in back quotes. But that fails to work! I would have expected the following command to find all files whose name starts with the letter A.
dir ::`^a`
However, the ^a becomes just a, and all files that contain the letter A anywhere in their name are displayed.
Using double-quotes works.
dir ::"^a"
Fortunately, the regular-expression interpreter does not mind the double-quote characters that are not removed from the string.
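To see why losing the caret changes the result so much, here is a minimal sketch in Python rather than TCC. It assumes that the regular expression is searched for anywhere in the file name and that case is ignored, which is consistent with what I observed but is not taken from the help text. The file names and the helper function matching are made up for illustration.

```python
import re

# Hypothetical file names, chosen only for illustration.
names = ["apple.txt", "banana.txt", "Alpha.doc", "notes.log"]

def matching(pattern, names):
    """Return the names in which the pattern is found, ignoring case."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [n for n in names if rx.search(n)]

# Caret intact: the match is anchored at the start of the name.
print(matching(r"^a", names))

# Caret stripped: an "a" anywhere in the name is enough.
print(matching(r"a", names))
```

With the caret, only apple.txt and Alpha.doc are selected; without it, banana.txt is selected as well.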
A Bug?
This certainly looks like a bug to me. Maybe Rex will provide a reason for this behavior, but why do we need to retain the special behavior of the caret character in a back-quoted string? We don't need it. We can pass special characters without using a caret.
C:\>echo `a b` & echo `a^sb`
a b
a b
But we can't pass a caret without doing something extra!
C:\>echo `a^b` & echo `a^^b`
a
a^b