WAD SWITCH statement with regexes or wildcards as CASE expressions

May 20, 2017
22
0
Hi!

So I am writing, of all things, a batch file that creates a multipart MIME message body. Part of that is looking at a file name and determining its MIME type. The file name is given like so: pathandfile[:mime/type]. A few examples of valid filenames, some with MIME types, some without:
Code:
thatdir\userdata.yaml:text/yaml
thatdir\userdata.yaml
c:\cloudconfig\otherdir\configsamba.sh:text/x-shellscript
c:\cloudconfig\otherdir\configsumtin.sh
To extract the MIME type from the filename+type in w, I use
Code:
set mimetype=%@regexsub[2,^^(.*):([^^:/]*/[^^:/]*)$,%@unquotes[%[w]]]
If the MIME type is not specified or empty, the subroutine doing the parsing tries to infer the type, SWITCHing on the file extension:
Code:
switch "%@ext[%[getfilename_filename]]"
    case "yaml" .or. "yml"
        set getfilename_mimetype=text/cloud-config
    case "(b|ba|c|k|z)?sh"
        set getfilename_mimetype=text/x-shellscript
    case /"(b|ba|c|k|z)?sh"/
        set getfilename_mimetype=text/x-shellscript
    case "::("?)(b|ba|c|k|z)?sh\1"
        set getfilename_mimetype=text/x-shellscript
    default
        echo Cannot infer MIME type for file %[getfilename_filename]
        set getfilename_rc=2
        set getfilename_mimetype=text/plain
endswitch
Please note the attempts at regular expressions. I realise that I could just write CASE "sh" .or. "bsh" .or. "csh" .or. …, but the SWITCH documentation for TCC (27.01.23) expressly states:
https://jpsoft.com/help/switch.htm said:
CASE statements can include wildcards and regular expressions.
and I confess to being curious.

If any of you know whether CASE accepts regexes and wildcards, and if so, how to bring that about, I would be very grateful. Thanks!
 
May 20, 2008
11,185
94
Syracuse, NY, USA
In a very simple test, regexes seem to work.

Code:
v:\> type switchtest.btm
setlocal
switch %1
    case ::a.*
        echo it begins with a
    case ::b.*
        echo it begins with b
    default
        echo it doesn't begin with a or b
endswitch

v:\> switchtest.btm abc
it begins with a

v:\> switchtest.btm bcd
it begins with b

v:\> switchtest.btm cde
it doesn't begin with a or b

I didn't test it but "::" may belong outside any quotes.
 
May 20, 2008
11,185
94
Syracuse, NY, USA
There's a little bug in that BTM. It should have said "it contains a", "it contains b", "it doesn't contain a or b".
 
May 20, 2008
11,185
94
Syracuse, NY, USA
Similarly,

Code:
v:\> type switchtest.btm
setlocal
switch %1
    case ::^^a.*
        echo it begins with a
    case ::^^b.*
        echo it begins with b
    default
        echo it doesn't begin with a or b
endswitch
v:\> switchtest.btm abc
it begins with a

v:\> switchtest.btm bcd
it begins with b

v:\> switchtest.btm cba
it doesn't begin with a or b
 
May 20, 2017
22
0
Thanks (once again)! If only I could figure out why quoted strings work for REN and so on. In my example, I quoted both the string under test and the regex; apparently that was enough to spoil the match. Yet, with REName:

ren "::^^123(.*)$" "::456\1"

will rename 123felix to 456felix; in particular, it will recognise the quoted regex and the quoted regex-replacement string.

For the nonce, I am calling this a bug.

Thank you for your help, which I am grateful for!
 
May 20, 2008
11,185
94
Syracuse, NY, USA
I suspect that REN, expecting file names, gets rid of the quotes. SWITCH doesn't know it's dealing with file names.
 
May 20, 2017
22
0
That is a plausible explanation. In the absence of documentation, I would still expect regex matching to behave the same everywhere. In particular, if a regex with special characters (<|> etc.) must be quoted, then the "::" indicating the presence of the regex should be consistently inside the quotes, or consistently outside, or both placements should work.

The quotes are indeed considered part of the string; matching the switch expression "sh" against CASE ::^^"sh"$ works, but matching against CASE ::"^^sh$" ("^" doubled because it's my escape character) does not. Clearly, your explanation is correct (thanks!).

There are regex-matching functions (@REGEX, @REGEXINDEX, @REGEXSUB) that, according to the docs, require the regex to be "enclosed in double quotes if it contains any separator characters (space, comma, or tab)." I would be in favour of always unquoting regexes; internal double quotes can be handled via the usual escaping mechanisms. Unquoted regexes should be the exception, permitted for backwards compatibility only.
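
For comparison, a minimal sketch of the same extension test written with @REGEX instead of a CASE regex. It assumes that @REGEX returns 1 on a match, that the quotes around the regex are stripped as the docs quoted above imply, and that "^" is the escape character (hence the doubling). The variable names are the ones from the first post; the snippet has not been run.

```
rem Sketch only: quote the regex so that "|" is not taken as a pipe.
rem "^" is doubled because it is the escape character in this setup.
iff %@regex["^^(b|ba|c|k|z)?sh$",%@ext[%[getfilename_filename]]] == 1 then
    set getfilename_mimetype=text/x-shellscript
endiff
```

Here the quotes belong to the function argument rather than to the pattern, which is the behaviour argued for above.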
 
May 20, 2017
22
0
[…] I would be in favour of always unquoting regexes; […]
I apologise. I wanted to say:

I would be in favour of always unquoting regexes internally in TCC, before using them; that way, users could always quote them, including the double colon ::, and have one fewer thing to worry about.
 
May 20, 2017
22
0
SWITCH does not remove quotes in the CASE argument, so it's looking for a leading :: to specify a regex.
Thanks for the confirmation! What threw me was that the more prominent use cases of regexes, say in REN and other commands that deal with file names, tolerate the whole regex being quoted.

A note in the SWITCH help would be nice, perhaps with a recommendation for what to do when (a) a CASE regex must include special characters and (b) it is not known whether the SWITCH expression is quoted or not:

  • Change the regex to something like ::^("?)whatever\1$, with doubled '^' if it serves as the escape character; or
  • use a SWITCH value like "%@unquotes[%switchexpression]" to always force quotes, and then account for the quotes in the regex.
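
A minimal sketch of the second option, applied to the extension test from the first post. It assumes that "^" is the escape character (hence "^^"), keeps the "::" outside the quotes, and relies on the behaviour reported earlier in this thread that quotes after the "::" are matched literally. The combined pattern with the alternation is an untested extrapolation from the simple ::^^"sh"$ case.

```
rem Sketch only: the SWITCH expression is always quoted, so the CASE
rem regex has to match the literal quotes around the extension.
switch "%@ext[%[getfilename_filename]]"
    case "yaml" .or. "yml"
        set getfilename_mimetype=text/cloud-config
    case ::^^"(b|ba|c|k|z)?sh"$
        set getfilename_mimetype=text/x-shellscript
    default
        echo Cannot infer MIME type for file %[getfilename_filename]
        set getfilename_rc=2
        set getfilename_mimetype=text/plain
endswitch
```

The quotes inside the regex are matched against the quoted SWITCH expression, and they should also keep the "|" away from the command parser.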

Thanks for all the work you put into TCC and its relatives; I am a 4DOS, then 4NT, user from way back, and still happy as a clam.

Felix.