
How to? Regex match when there shouldn't be (?)

Discussion in 'Support' started by mikea, Aug 23, 2013.

  1. mikea

    Joined:
    Dec 7, 2009
    Messages:
    210
    Likes Received:
    2
    In Perl:

    $amount = 1.5; # and now let's try:
    $amount =~ /^\.\d$/

    The regex specifies start of input, single '.', single digit, then end of input. That pattern clearly doesn't match "$amount" and such a comparison of course returns FALSE.

    Now in TCC:

    set amount=1.5
    if %amount =~ ^\.\d$ echo TRUE

    ... and TRUE is echoed. It's telling me I have a match even though the regular expression doesn't have the metacharacter for a digit at the start of input (preceding the '.').

    This is the first version of the command processor I've used that supports the regex comparison operator, and I'm sure I must be missing something simple and obvious. So...what am I missing? I would not have expected this to produce a match.

    TCC 15.01.52 x64 Windows 7 [Version 6.1.7601]
     
  2. Ugo

    Joined:
    Aug 22, 2013
    Messages:
    10
    Likes Received:
    0
    Let me see ...
    echo ^\.A$
    \.A$
    As far as I know, the echo command prints the whole line it receives unaltered, with all the quotes and other special characters, so the line parser must have consumed the ^ character before passing the string to echo.

    I guess it is just a problem of quoting the regular expression to prevent this.
    Unfortunately I am not expert enough to give you an immediate solution ...
     
  3. mikea

    Joined:
    Dec 7, 2009
    Messages:
    210
    Likes Received:
    2
    I've tried working with the 'match' operator several ways: 1) nothing quoted; 2) string to be matched (typically in an environment variable) quoted; 3) regular expression only quoted; 4) both quoted.

    The quotes don't appear to work out so well.

    The "^" character does not appear to be treated as an "escape" character when it appears in this kind of command line -- as far as I can tell.

    No "echo" involved, though. It's more like this -- in a .BTM file:

    iff %somevariable =~ the-reg-expression-here then
    something happens here
    endiff
     
  4. Ugo

    Joined:
    Aug 22, 2013
    Messages:
    10
    Likes Received:
    0
    Try this:
    set amount=1.5
    if %amount =~ ^^\.\d$ echo TRUE

    The double hat works, though I do not know why.
    The hat must have some special meaning for the TakeCommand parser.
     
  5. rconn

    rconn Administrator
    Staff Member

    Joined:
    May 14, 2008
    Messages:
    9,804
    Likes Received:
    82
    ^ is the TCC (and CMD) default escape character. You need to either escape the escape character (^^) or use single back quotes around the regex. (Or you could change your escape character to something else, but then third-party batch files would fail.)
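
A rough way to picture the behavior described above, written in Python because it is easy to run. This is only a sketch under stated assumptions: `strip_escapes` and `tcc_like_match` are hypothetical helpers, not anything TCC exposes; the model simply drops each escape character and keeps the character after it, ignoring any special escape sequences TCC may define; and it assumes the `=~` test behaves like an unanchored search, which is consistent with the TRUE reported in the first post.

```python
import re

def strip_escapes(text, escape="^"):
    """Simplified model of escape processing: drop each escape
    character and keep the following character literally."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == escape and i + 1 < len(text):
            out.append(text[i + 1])  # keep the escaped character as-is
            i += 2
        elif ch == escape:
            i += 1                   # trailing escape with nothing after it
        else:
            out.append(ch)
            i += 1
    return "".join(out)

def tcc_like_match(value, typed_regex):
    """Apply the escape model first, then do an unanchored regex search."""
    effective = strip_escapes(typed_regex)
    return re.search(effective, value) is not None

print(strip_escapes(r"^\.\d$"))              # \.\d$   (anchor lost)
print(strip_escapes(r"^^\.\d$"))             # ^\.\d$  (anchor kept)
print(tcc_like_match("1.5", r"^\.\d$"))      # True
print(tcc_like_match("1.5", r"^^\.\d$"))     # False
print(tcc_like_match("1.5", r"^^\d\.\d$"))   # True
```

Under this model the single caret never reaches the regex engine, so the pattern is no longer anchored at the start and ".5" at the end of "1.5" is enough to match.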
     
  6. vefatica

    Joined:
    May 20, 2008
    Messages:
    7,883
    Likes Received:
    29
    Indeed! These get it right. After "set amount=1.5" ...
    Code:
    v:\> if %amount =~ ^^\d\.\d$ (echo yes) else (echo no)
    yes
    
    v:\> if %amount =~ ^^\.\d$ (echo yes) else (echo no)
    no
    If the literal string is quoted, the quotes are retained, so the regex must contain quotes.
    Code:
    v:\> if "%amount" =~ ^^\"\d\.\d\"$ (echo yes) else (echo no)
    yes
    
    v:\> if "%amount" =~ ^^\"\.\d\"$ (echo yes) else (echo no)
    no
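
The point about retained quotes can be illustrated outside TCC. The following is a minimal Python sketch, assuming the left-hand side reaches the regex engine with its double quotes still attached and that the pattern has already had `^^` reduced to `^`; the variable names are made up for illustration.

```python
import re

# Assumed form of the left-hand side once "%amount" is expanded, quotes kept.
quoted_value = '"1.5"'

# Patterns as the regex engine would see them after escape processing.
with_quotes = r'^\"\d\.\d\"$'   # accounts for the surrounding quotes
without_quotes = r'^\d\.\d$'    # ignores them

print(re.search(with_quotes, quoted_value) is not None)     # True
print(re.search(without_quotes, quoted_value) is not None)  # False
```

The anchored pattern without quotes fails because the first character of the string is `"`, not a digit.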
     
  7. vefatica

    Joined:
    May 20, 2008
    Messages:
    7,883
    Likes Received:
    29
    Then I asked myself ... what if the target string needed to be quoted because it contained a space but also contained ONE (or any odd number of) double-quotes? I figured there would be trouble. It's not too bad if you follow the rules. Below, I had to tell TCC that the string-literal quote is not a TCC-grouping quote (so TCC removes the '^' and it becomes a regex-literal quote). TCC sees the other two quotes as ordinary, so they protect the space. (I think I got that right!)
    Code:
    v:\> set zz=a "b
    
    v:\> echo %zz
    a "b
    
    v:\> if "%zz" =~ ^^\"a \^"b\"$ (echo yes) else (echo no)
    yes
    
    v:\> if "%zz" =~ ^^\"a \^"c\"$ (echo yes) else (echo no)
    no
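
As a cross-check of the reasoning above, here is a small Python sketch. It assumes the regex engine ends up seeing the pattern with `^^` reduced to `^` and `^"` reduced to `"`, and that the string being tested is the value of `zz` wrapped in the retained outer quotes. It does not model TCC's quote grouping at all; the names are illustrative only.

```python
import re

# Assumed left-hand side: the value  a "b  with the outer quotes retained.
target = '"a "b"'

# Assumed patterns after TCC has removed its escape characters.
pattern_b = r'^\"a \"b\"$'
pattern_c = r'^\"a \"c\"$'

print(re.search(pattern_b, target) is not None)  # True
print(re.search(pattern_c, target) is not None)  # False
```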
     
  8. vefatica

    Joined:
    May 20, 2008
    Messages:
    7,883
    Likes Received:
    29
    Rex, I couldn't make your second suggestion work (back-quoting the regex). Are you sure it works? If so, how?
    Code:
    v:\> set zz=1.5
    
    v:\> if %zz =~ `^1\.5$` (echo yes) else (echo no)
    no
    It would seem that, in general, back-quotes are not removed.
    Code:
    v:\> if a =~ `a` (echo yes) else (echo no)
    no
    
    v:\> if a == `a` (echo yes) else (echo no)
    no
     
  9. Steve Fabian

    Joined:
    May 20, 2008
    Messages:
    3,523
    Likes Received:
    4
    Why do you nowadays deprecate your own invention, the symbolic representation of escape: %= (and likewise the symbolic representation of the command separator, %+)?
     
  10. vefatica

    Joined:
    May 20, 2008
    Messages:
    7,883
    Likes Received:
    29
    Deprecate? I don't think they were ever meant for general use. I don't recall if I ever needed one. They're twice as hard to type as the characters they represent. One might ask ... why do you use them so much?
     
  11. Steve Fabian

    Joined:
    May 20, 2008
    Messages:
    3,523
    Likes Received:
    4
    They were meant for general use. When they were introduced (with an early version of 4NT) they were promoted as the way to let anyone use your programs without change, whether they used 4DOS defaults or 4NT defaults or anything else. Of course, this did not apply to the ParameterChar, which never had a symbolic form, though I often wished it did, for similar reasons.

    I use them regularly because depending on what I do, I actually change CommandSep and EscapeChar (and also ParameterChar) so I could use them in my programs as data without escaping them.

    Yes, it is an extra keystroke for the symbolic form, but it is unambiguous. Charles' SafeChars plugin is a great help, but the underlying problem is that, due to its ancestry and CMD compatibility requirements, TCC does not have the strict dichotomy of code and data that almost all high-level languages maintain. It's only commands, with part of the command sometimes used as data (but still interpreted first).
     
  12. rconn

    rconn Administrator
    Staff Member

    Joined:
    May 14, 2008
    Messages:
    9,804
    Likes Received:
    82
    They were only intended for short-term use, for people who were transitioning their COMMAND.COM / 4DOS batch files over to 4NT. That was 15-20 years ago, and there's no reason at all to use them now other than (1) to break CMD batch files, and (2) to make it harder to write TCC batch files.
     
  13. mikea

    Joined:
    Dec 7, 2009
    Messages:
    210
    Likes Received:
    2
    I began using those characters (%+ and %=) when I finally got it through my thick skull that it no longer made sense to keep using, for example, the ancient 4DOS separator when nobody else was using "^" for that. Ok, so I'd best start using "&" and "^" as TCC intends.

    Rex: I also couldn't get the regular expression routine to work when I'd back-quoted it. That is (in a .btm file):

    set amount=1.5
    if %amount =~ `^\d\.\d$` echo TRUE

    . . . does not echo TRUE, while this does:

    if %amount =~ ^^\d\.\d$ echo TRUE
     
  14. mikea

    Joined:
    Dec 7, 2009
    Messages:
    210
    Likes Received:
    2
    Yes, this works -- thank you. I just knew I'd been overlooking some simple thing . . .
     
  15. Steve Fabian

    Joined:
    May 20, 2008
    Messages:
    3,523
    Likes Received:
    4
    Rex, you may have intended them as short-term aids, but there were others working for JP Software, Inc. (no names necessary) who promoted them, especially for things published for others to use, along with disaliasing every command with the asterisk (*) prefix. I find the symbolic forms easier to notice in aliases and batch files than the single-character forms, whatever they happen to be, for exactly the reason Vince does not like them: two-character sequences stand out more than single characters, especially characters often used for other purposes.

    I don't have any CMD batch files, nor do I expect ever to make the giant leap backward to the last century and create any.

    Why is it harder to write TCC batch with any specific representation of CommandSep and EscapeChar than any other? This is like saying the semicolon as used in C and C++ is a better statement terminator than the end-of-line as used in Fortran, Basic, and most scripting languages.
     
  16. mikea

    Joined:
    Dec 7, 2009
    Messages:
    210
    Likes Received:
    2
    > I don't have any CMD batch files, nor do I expect ever to make the giant leap backward to the last century and create any.

    This is a bit off-topic, I know, but: I didn't expect to, either. Then I ended up working for a place that absolutely refused to discuss anything other than cmd.exe (see my "TCC Evangelism" thread, from a few days ago) -- and I was stuck with .cmd files for a while. Oh, the torture.
     
  17. mikea

    Joined:
    Dec 7, 2009
    Messages:
    210
    Likes Received:
    2
    Ugo -- I can confirm that given this suggestion, all of the regular expression routines now work as expected. I had been preparing to do what I needed via a "hack" in which I'd run a Perl script at the required time during the execution of the .btm script -- but no hack is needed now. Thanks again.
     
  18. Steve Fabian

    Joined:
    May 20, 2008
    Messages:
    3,523
    Likes Received:
    4
    I cannot expect to work for any company at my age, so nobody can force me to waste my time using CMD or any of the command processors POSIX users refer to as "shells". But the proper evangelism would have been a challenge: a competition to see how much time it takes to develop a specific useful program in TCC vs. CMD (without external programs).
     
  19. mikea

    Joined:
    Dec 7, 2009
    Messages:
    210
    Likes Received:
    2
    I don't want to cause topic drift in this thread, so if you don't mind I'll reply in that recent 'evangelism' thread...
     
