How to? Regex match when there shouldn't be (?)

In Perl:

$amount = 1.5; # and now let's try:
$amount =~ /^\.\d$/

The regex specifies start of input, single '.', single digit, then end of input. That pattern clearly doesn't match "$amount" and such a comparison of course returns FALSE.

Now in TCC:

set amount=1.5
if %amount =~ ^\.\d$ echo TRUE

... and TRUE is echoed. It's telling me I have a match even though the regular expression doesn't have the metacharacter for a digit at the start of input (preceding the '.').

This is the first version of the command processor I've used that supports the regex comparison operator, and I'm sure I must be missing something simple and obvious. So...what am I missing? I would not have expected this to produce a match.

TCC 15.01.52 x64 Windows 7 [Version 6.1.7601]
 
Let me see ...
echo ^\.A$
\.A$
As far as I know, the echo command prints the whole line it receives unaltered, with all the quotes and other special characters, so the line parser must have consumed the ^ character before passing the string to the echo command.

I guess it is just a problem of quoting the regular expression to prevent this.
Unfortunately I am not expert enough to give you an immediate solution ...
 
I've tried working with the 'match' operator several ways: 1) nothing quoted; 2) string to be matched (typically in an environment variable) quoted; 3) regular expression only quoted; 4) both quoted.

The quotes don't appear to work out so well.

The "^" character does not appear to be treated as an "escape" character when it appears in this kind of command line -- as far as I can tell.

No "echo" involved, though. It's more like this -- in a .BTM file:

iff %somevariable =~ the-reg-expression-here then
something happens here
endiff
 
Try this:
set amount=1.5
if %amount =~ ^^\.\d$ echo TRUE

The double hat works, I do not know why.
The hat should have some special meaning for the TakeCommand parser.
 
> This is the first version of the command processor I've used that supports the regex comparison operator, and I'm sure I must be missing something simple and obvious. So...what am I missing? I would not have expected this to produce a match.

^ is the TCC (and CMD) default escape character. You need to either escape the escape character (^^) or use single back quotes around the regex. (Or you could change your escape character to something else, but then third-party batch files would fail.)
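
To see why the single-hat version matches, here is a small Python sketch of the effect. It is only a toy model: `strip_escape` is a hypothetical helper, not TCC's actual parser, and Python's `re` module stands in for TCC's regex engine, which may differ in details. The assumption is that the escape pass removes each `^` and keeps the following character literally.

```python
import re


def strip_escape(text, escape="^"):
    """Toy model of an escape pass (an assumption, not TCC's real parser):
    drop each escape character and keep the character after it literally.
    A trailing escape with nothing after it is kept as-is."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == escape and i + 1 < len(text):
            out.append(text[i + 1])  # keep the escaped character
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


amount = "1.5"

# Single hat: the anchor is eaten, leaving the unanchored pattern \.\d$
print(bool(re.search(strip_escape(r"^\.\d$"), amount)))     # True

# Double hat: one hat survives, so the regex engine sees ^\.\d$
print(bool(re.search(strip_escape(r"^^\.\d$"), amount)))    # False

# Double hat with the leading digit: the engine sees ^\d\.\d$
print(bool(re.search(strip_escape(r"^^\d\.\d$"), amount)))  # True
```

With the start anchor gone, `\.\d$` is free to match the trailing ".5" of "1.5", which would explain the unexpected TRUE.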
 
> ^ is the TCC (and CMD) default escape character. You need to either escape the escape character (^^) or use single back quotes around the regex. (Or you could change your escape character to something else, but then third-party batch files would fail.)
Indeed! These get it right. After "set amount=1.5" ...
Code:
v:\> if %amount =~ ^^\d\.\d$ (echo yes) else (echo no)
yes

v:\> if %amount =~ ^^\.\d$ (echo yes) else (echo no)
no

If the literal string is quoted, the quotes are retained, so the regex must contain quotes.
Code:
v:\> if "%amount" =~ ^^\"\d\.\d\"$ (echo yes) else (echo no)
yes

v:\> if "%amount" =~ ^^\"\.\d\"$ (echo yes) else (echo no)
no
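
The same point in a Python sketch, for anyone who wants to check the patterns outside TCC. The assumption here is that the string being tested really does include the surrounding double quotes, and Python's `re` is only a stand-in for TCC's regex engine.

```python
import re

# Assumed value seen by the regex engine when the left side is written "%amount":
# the quotes are retained as part of the string.
quoted_amount = '"1.5"'

# Pattern that accounts for the retained quotes: matches.
print(bool(re.search(r'^\"\d\.\d\"$', quoted_amount)))  # True

# Pattern without the quotes: the leading quote blocks the ^ anchor.
print(bool(re.search(r'^\d\.\d$', quoted_amount)))      # False

# Quotes present but no digit before the dot: no match.
print(bool(re.search(r'^\"\.\d\"$', quoted_amount)))    # False
```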
 
Then I asked myself: what if the target string needed to be quoted because it contained a space, but also contained ONE double-quote (or any odd number of them)? I figured there would be trouble. It's not too bad if you follow the rules. Below, I had to tell TCC that the quote inside the string literal is not a TCC grouping quote, by escaping it with '^' (TCC removes the '^' and the quote becomes a literal quote in the regex). TCC sees the other two quotes as ordinary, so they protect the space. (I think I got that right!)
Code:
v:\> set zz=a "b

v:\> echo %zz
a "b

v:\> if "%zz" =~ ^^\"a \^"b\"$ (echo yes) else (echo no)
yes

v:\> if "%zz" =~ ^^\"a \^"c\"$ (echo yes) else (echo no)
no
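
A Python sketch of what the regex engine presumably ends up with in that last example. The pattern below is my assumption of the result after TCC removes the escape characters (each `^X` pair becoming `X`); it is not captured from TCC itself, and Python's `re` is only a stand-in.

```python
import re

# Value of zz with the surrounding quotes retained, as in the "%zz" test.
subject = '"' + 'a "b' + '"'

# Assumed result of escape removal on  ^^\"a \^"b\"$
pattern = r'^\"a \"b\"$'

print(bool(re.search(pattern, subject)))           # True
print(bool(re.search(r'^\"a \"c\"$', subject)))    # False
```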
 
> ^ is the TCC (and CMD) default escape character. You need to either escape the escape character (^^) or use single back quotes around the regex. (Or you could change your escape character to something else, but then third-party batch files would fail.)
Rex, I couldn't make your second suggestion work (back-quoting the regex). Are you sure it works? If so, how?
Code:
v:\> set zz=1.5

v:\> if %zz =~ `^1\.5$` (echo yes) else (echo no)
no
It would seem that, in general, back-quotes are not removed.
Code:
v:\> if a =~ `a` (echo yes) else (echo no)
no

v:\> if a == `a` (echo yes) else (echo no)
no
 
> ^ is the TCC (and CMD) default escape character. You need to either escape the escape character (^^) or use single back quotes around the regex. (Or you could change your escape character to something else, but then third-party batch files would fail.)
Why do you nowadays deprecate your own invention, the symbolic representation of escape: %= (and likewise the symbolic representation of the command separator, %+)?
 
> Why do you nowadays deprecate your own invention, the symbolic representation of escape: %= (and likewise the symbolic representation of the command separator, %+)?
Deprecate? I don't think they were ever meant for general use. I don't recall if I ever needed one. They're twice as hard to type as the characters they represent. One might ask ... why do you use them so much?
 
They were meant for general use. When they were introduced (with an early version of 4NT) they were promoted as the way to let anyone use your programs without change, whether they used 4DOS defaults or 4NT defaults or anything else. Of course, it did not apply to the ParameterChar, which never had a symbolic form, though I often wished it did, for like reasons.

I use them regularly because depending on what I do, I actually change CommandSep and EscapeChar (and also ParameterChar) so I could use them in my programs as data without escaping them.

Yes, it is an extra keystroke for the symbolic form, but it is unambiguous. Charles' SafeChars plugin is a great help, but the underlying problem is that, due to its ancestry and CMD compatibility requirements, TCC does not have the strict dichotomy of code and data that almost all high-level languages maintain. It's only commands, with part of the command sometimes used as data (but still interpreted first).
 
> They were meant for general use. When they were introduced (with an early version of 4NT) they were promoted as the way to let anyone use your programs without change, whether they used 4DOS defaults or 4NT defaults or anything else. Of course, it did not apply to the ParameterChar, which never had a symbolic form, though I often wished it did, for like reasons.

They were only intended for short-term use, for people who were transitioning their COMMAND.COM / 4DOS batch files over to 4NT. That was 15-20 years ago, and there's no reason at all to use them now other than (1) to break CMD batch files, and (2) to make it harder to write TCC batch files.
 
I began using those characters (%+ and %=) when I finally got it through my thick skull that it no longer made sense to keep using, for example, the ancient 4DOS separator when nobody else was using "^" for that. Ok, so I'd best start using "&" and "^" as TCC intends.

Rex: I also couldn't get the regular expression routine to work when I'd back-quoted it. That is (in a .btm file):

set amount=1.5
if %amount =~ `^\d\.\d$` echo TRUE

. . . does not echo TRUE, while this does:

if %amount =~ ^^\d\.\d$ echo TRUE
 
> They were only intended for short-term use, for people who were transitioning their COMMAND.COM / 4DOS batch files over to 4NT. That was 15-20 years ago, and there's no reason at all to use them now ...
Rex, you may have intended them as short-term aids, but there were others working for JP Software, Inc. (no names necessary) who promoted them, esp. for things published for others to use, along with disaliasing every command with the asterisk (*) prefix. I find the symbolic forms easier to notice in aliases and batch files than the single-character forms, whatever they happen to be, exactly for the reason Vince does not like them: two-character sequences stand out more than single characters, esp. characters often used for other purposes.

> ...(1) to break CMD batch files
I don't have any CMD batch files, nor do I expect ever to make the giant leap backward to the last century and create any.

> (2) to make it harder to write TCC batch files.
Why is it harder to write TCC batch with any specific representation of CommandSep and EscapeChar than any other? This is like saying the semicolon as used in C and C++ is a better statement terminator than the end-of-line as used in Fortran, Basic, and most scripting languages.
 
> I don't have any CMD batch files, nor do I expect ever to make the giant leap backward to the last century and create any.

This is a bit off-topic, I know, but: I didn't expect to, either. Then I ended up working for a place that absolutely refused to discuss anything other than cmd.exe (see my "TCC Evangelism" thread, from a few days ago) -- and I was stuck with .cmd files for a while. Oh, the torture.
 
> Try this:
> set amount=1.5
> if %amount =~ ^^\.\d$ echo TRUE
>
> The double hat works, I do not know why.
> The hat should have some special meaning for the TakeCommand parser.

Ugo -- I can confirm that given this suggestion, all of the regular expression routines now work as expected. I had been preparing to do what I needed via a "hack" in which I'd run a Perl script at the required time during the execution of the .btm script -- but no hack is needed now. Thanks again.
 
> I don't have any CMD batch files, nor do I expect ever to make the giant leap backward to the last century and create any.

> This is a bit off-topic, I know, but: I didn't expect to, either. Then I ended up working for a place that absolutely refused to discuss anything other than cmd.exe (see my "TCC Evangelism" thread, from a few days ago) -- and I was stuck with .cmd files for a while. Oh, the torture.
I cannot expect to work for any company at my age, so nobody can force me to abuse my time using CMD or any of the command processors POSIX users refer to as "shells". But the proper evangelism would have been a challenge: a competition to see how much time it takes to develop a specific useful program in TCC vs. CMD (without external programs).
 
I don't want to cause topic drift in this thread, so if you don't mind I'll reply in that recent 'evangelism' thread...
 
