WAD function return values with parentheses prevent evaluation of additional functions

When a function returns a string starting with an open parenthesis, additional functions on the line are not evaluated properly.
For instance, consider the following two examples, whose filenames include an open parenthesis (a legal character in Windows filenames):
[R:\temp]echo %@NAME["(aaa.pdf"].%@NAME["bbb.pdf"]
(aaa.%@NAME["bbb.pdf"]
Similarly:
[R:\temp]echo %@EXT["aaa.(pdf"].%@EXT["bbb.pdf"]
(pdf.%@EXT["bbb.pdf"]

Is this a SAFECHARS issue, or might it be a bug in the function processing?
 
A '(' is a legal filename character -- but it's also a really dumb filename character. (Like the similarly legal but even dumber %.)

The problem is that the ( is also the character that begins a command group, and variable expansion is not performed inside command groups.
 
Well, what can I do? I've got thousands of files containing parentheses in their filenames (I didn't decide on the filenames).
Is there any way around this? Can I escape the parenthesis somehow? Can I temporarily disable command grouping for the command?

 
Well, what can I do? I've got thousands of files containing parentheses in their filenames (I didn't decide on the filenames).
Is there any way around this? Can I escape the parenthesis somehow? Can I temporarily disable command grouping for the command?

Try SETDOS /X-4.
 
This is not a common issue -- it requires that you both have a filename that begins with a (, and that also contains no ), and that you're trying to do multiple variable substitution where the first filename is the one with the leading (. There's not much I can do about it other than to remove support for command groups, and I suspect that would generate more than one complaint in 10 years (yours is the first).

You can also escape the leading ( with two ^^'s.
 
Rex, the fact that no one has complained about this in 10 years is actually quite alarming, I think. It occurs to me that users may have used time-tested scripts over large batches of files without even realizing that certain files (those that begin with a parenthesis) were being damaged by their trusty script.
That's exactly what (almost) happened to me. I wrote a one-line BTM which takes a filename as a parameter and creates a shortcut to that file, adding .lnk at the end. (In order to do this I have to use two parameter substitutions on the same line, of course: one for the filename and one for the shortcut name).
I tested this BTM on a bunch of files, and then, satisfied by the results, I ran it across a very large list of some 6000 files. Only by chance did I happen to notice that a few of the resulting shortcut names were mangled. I easily could have missed these, since the rest of the files were fine, and I could have gone on using this script for years without realizing that here and there filenames were being mangled.
Over the past two months I've become more and more enthralled by your wonderful product, but now, for the very first time, this phenomenon has shaken my confidence in TCC, because you've indicated that the product is not intended to work with all legal filenames, but rather there are some characters that the product simply is not tested for and will not work with.
As dumb as it may be to use a parenthesis or a percent sign in a filename, they are legal characters, and on my system I've got 15 years worth of legacy files, over 2 million of them, and they make use of the full range of legal characters. I find it very hard to believe that my system is unique in this regard.
I've considered that every time I run a BTM file perhaps I should run a script that first inspects the files to see if the first character is a parenthesis, and that prefixes a character as necessary. But that makes the whole BTM infrastructure much more cumbersome.
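For what it's worth, the pre-check described above can be sketched outside TCC, so that the filenames never pass through its parser. This is only a minimal Python sketch: the character set is an assumption drawn from this thread (open parenthesis, percent sign, backtick, caret), not an official list, and `risky_names` is a made-up helper name.

```python
import os

# Characters this thread identifies as troublesome for the TCC parser.
# Assumption: this set comes from the discussion, not from TCC documentation.
RISKY_CHARS = set("(%`^")

def risky_names(root):
    """Yield (path, chars) for each file under root whose name contains a risky character."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found = sorted(RISKY_CHARS.intersection(name))
            if found:
                yield os.path.join(dirpath, name), found
```

Calling `list(risky_names(r"R:\temp"))` before running a BTM over that directory would at least show which files need special handling.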
In terms of possible solutions:
(a) Is it really necessary to support command groups *inside quotations*? I quote all my filenames on both sides (because they often contain spaces). Would it be possible to add an option to disable command groups inside quotation marks?
(b) Alternatively, would it be possible to add an option to disable command groups altogether? Of course nobody wants to dismiss command groups completely, but if they could be disabled with an OPTION command, then my BTM could first disable the command groups, then create the shortcut, and then re-enable the command groups.
With much respect for your fine product,
Avi

 
OK, SETDOS /X-4 does solve the problem here. So I can have my BTM files each do a SETDOS /X-4 on the first line.
But what about filenames that contain a percent sign?
Here again, although you may call them "badly formed", they are legal. In fact, in a quick search, I found that I have over 600 files on my computer which contain percent signs within them. Among the items: directories and files belonging to the "Alcohol 120%" product; downloaded files in which all non-standard chars have been converted to URL Encoding (e.g. every space is represented by %20); logfiles from the Windows event viewer, which contain percent signs as a standard part of the filename; and many more. I think this is sufficient to demonstrate that the odds are high that on any given system there are at least a few filenames with percent signs (I would venture that there are many such files on your own system as well!)
So, what can we do about these filenames? Here SETDOS /X-4 does not do the trick.
My .BTM file contains the following line:
shortcut "%1$" "" "" "" "%1$.lnk" 1
But when I run it on a filename which contains a percent sign, all characters after the percent sign are left out (until the period). Worse, if the chars after the percent sign happen to match an environment variable, the variable is expanded fully as part of the filename. This happens even with SETDOS /X-4.
I understand that I can double the percent sign to escape the character. However, if I am running the file across a large number of files, I won't be able to double it manually. Is there any easy solution to this problem?
(By the way - other than parenthesis and percent characters - are there any other additional characters that you would consider "badly formed", even though they are legal filename characters?)
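One way to avoid doubling the percent signs by hand would be to generate the command lines with a small script. The following is a rough Python sketch, not a tested TCC recipe: `double_percents` and `command_lines` are hypothetical helpers, `makeshortcut` stands for the batch file containing the `shortcut` line above, and whether a single doubling is enough depends on how many times the line passes through the parser.

```python
def double_percents(name):
    """Return name with every % doubled, so one parser pass leaves a literal %."""
    return name.replace("%", "%%")

def command_lines(names, command="makeshortcut"):
    """Build one quoted command line per filename (the command name is an assumption)."""
    return ['{} "{}"'.format(command, double_percents(n)) for n in names]

if __name__ == "__main__":
    for line in command_lines(["Alcohol 120%.txt", "my%20download.pdf"]):
        print(line)
```

The generated lines could then be saved to a BTM file and run from there.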





There's already a SETDOS /X option (/X-4) that will work with your example and your (sorry, but badly-formed) filenames.
 
OK, I think I found the solution. On the "Parameter Quoting" page, I noticed that back quotes prevent all variable expansion within quoted text.

So, in my case, where I have a batch file (makeshortcut.btm) that does this:
shortcut "%1$" "" "" "" "%1$.lnk" 1
I can't run:
makeshortcut "filewith%sign"
But I can successfully run:
makeshortcut `filewith%sign`

Similarly, this also solves the problem with filenames which begin with an open parenthesis. That is, even with SETDOS set to X+4, I can run:
makeshortcut `(filewithinitialparenthesis`

A few questions to all you knowledgeable users (and to Rex):
1] It seems to me that it would be best to *always* use backquotes when specifying filenames on the command line, because doing so takes care of the special cases noted here, while regular quotes do not. Is there any downside to using backquotes across the board whenever passing filenames as parameters?
2] With regard to the specific case of files with an initial open-parenthesis, is there any advantage to using SETDOS /X-4 rather than simply using back quotes?
3] Overall, it might make sense for TCC's command line autocompletion to automatically add back-quotes around files which contain percent signs or parentheses, just as it currently puts regular quotes around filenames containing spaces?






 
Wait, there's a problem. Filenames can also contain a backtick; that's also a legal character in Windows. And if my filename has a backtick, I can't put backticks around it ("TCC: no closing quote" error).
So, is there any way to pass a filename as a parameter such that TCC will be able to process all legal filenames?

 
The basic issue is that TCC inherited its basic syntax from COMMAND.COM (and CMD.EXE).
1/ All the special characters used in TCC for syntactic notation were originally ILLEGAL in filenames. Now they are legal, causing havoc.
2/ The language does not distinguish between constant strings and strings that are the values of variables or expressions. PL/1 was a language where context allowed using language keywords as variables or routine labels without confusion. Not so here. The value of a variable or a function can be a command. In fact I use that feature to automatically load global command and directory history, aliases and functions from the files in which the last previous termination of SHRALIAS.EXE saved them:
for %x in (*.sav) ( %@name[%x] /r %x %+ echo Loaded %x >> usage.log )
For these reasons it is impossible to maintain even a semblance of backward compatibility at the same time as creating a language that can handle all characters legal in NTFS file names (or arbitrary text strings, for that matter).

If your filenames use only ASCII characters (even though the NTFS directory catalogs them in Unicode), check out Charles Dye's SafeChars utility. It replaces the characters that are syntactically significant in TCC with alternate Unicode characters that have the same glyph as the original, and these are converted back to the originals when command output is displayed in ASCII. It MAY do the job you want, but there is no guarantee.

BTW, when you quote something you respond to, if the quotation is after the response, people read the response first, even before seeing whatever you quoted, so it may look out of context until they get to the quotation. AFAIK you are the only one here who postquotes.
 
1] It seems to me that it would be best to *always* use backquotes when specifying filenames on the command line, because doing so takes care of the special cases noted here, while regular quotes do not. Is there any downside to using backquotes across the board whenever passing filenames as parameters?

That wouldn't be too popular, as it would make it impossible to pass a variable as a parameter.
 
Wait, there's a problem. Filenames can also contain a backtick; that's also a legal character in Windows. And if my filename has a backtick, I can't put backticks around it ("TCC: no closing quote" error). So, is there any way to pass a filename as a parameter such that TCC will be able to process all legal filenames?

First, some of those characters are NOT legal; it's just that Windows doesn't actually enforce the documented rules against them. That doesn't make them "legal", and it certainly doesn't make it smart to use them.

Second, you're going to have the same problems with CMD (or worse, since CMD lacks the workarounds that TCC has).

Third, the only way to allow every imaginable idiotic filename would be to disallow all of the potentially overloadable characters, which means no variable expansion, command grouping, character escaping, control characters (I've seen filenames with ASCII characters < 32 in them, which *really* gets interesting when you try to access them), multiple commands, compound commands, redirection, yada yada yada. That doesn't seem like a worthwhile tradeoff in order to let a few dimwitted users generate a few malformed filenames (particularly since the naming convention serves no useful purpose).

You're working with a command interpreter, which has to guess which of potentially multiple meanings it should apply to a character when it encounters it. Even the DWIM parser can't handle your desired syntax; you're going to have to wait for the DWIM ESP parser, because the parser has to already know at an early stage what it's going to be doing 20 steps later.

In the meantime, try Charles's SafeChars plugin.
 
OK, SETDOS /X-4 does solve the problem here. So I can have my BTM files each do a SETDOS /X-4 on the first line.
But what about filenames that contain a percent sign?
Here again, although you may call them "badly formed", they are legal. In fact, in a quick search, I found that I have over 600 files on my computer which contain percent signs within them. Among the items: directories and files belonging to the "Alcohol 120%" product; downloaded files in which all non-standard chars have been converted to URL Encoding (e.g. every space is represented by %20); logfiles from the Windows event viewer, which contain percent signs as a standard part of the filename; and many more. I think this is sufficient to demonstrate that the odds are high that on any given system there are at least a few filenames with percent signs (I would venture that there are many such files on your own system as well!)
So, what can we do about these filenames? Here SETDOS /X-4 does not do the trick.

I have 0 of those filenames & directories on my system, because I immediately rename (or remove) them when they appear. And if I can find the nitwit who created them, I shoot him in the head with my Nerf cannon.

"SETDOS /X-3" will disable all variable expansion.
 
In the meantime, try Charles's SafeChars plugin.

FixNames is more likely to be helpful for files containing backticks, percent signs, carets, and so on. It renames the offending files without passing their names through the parser.
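To illustrate the general idea of renaming without going through a command parser, here is a minimal Python sketch. It is not how FixNames is implemented; the character list, the underscore substitute, and the names `safe_name` and `rename_risky` are all assumptions made for the example.

```python
import os

# Assumption: the characters named in this post; adjust to taste.
RISKY_CHARS = "%`^"

def safe_name(name, substitute="_"):
    """Return name with each risky character replaced by substitute."""
    return "".join(substitute if ch in RISKY_CHARS else ch for ch in name)

def rename_risky(root, dry_run=True):
    """Rename offending files under root and return the (old, new) path pairs.

    With dry_run=True nothing is renamed; the pairs are only reported.
    """
    renamed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            new = safe_name(name)
            if new == name:
                continue
            old_path = os.path.join(dirpath, name)
            new_path = os.path.join(dirpath, new)
            if os.path.exists(new_path):
                continue  # never overwrite an existing file
            if not dry_run:
                os.rename(old_path, new_path)
            renamed.append((old_path, new_path))
    return renamed
```

Because the names are handed straight to the operating system, no percent sign or backtick is ever interpreted along the way. A dry run first is a sensible precaution.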

(Rex, do you have any brilliant ideas for renaming files containing flat-out illegal characters like / : ? * < > and so on? I had what I thought was a great approach and got it 99% working -- only to discover that using NtSetInformationFile() to rename a file doesn't frakkin' work with files opened as FILE_OPEN_BY_FILE_ID, a completely undocumented limitation. Damn Microsoft!)
 
I find that the only simple solution is "format". Barring that, I usually boot to a rescue disk and use a sector editor. I've never found a way to do it programmatically in Windows.
 
Rex, thinking out loud here, I'm wondering if the parser could handle a new syntax for raw strings à la Python, where, e.g., r"%t" yields quote,per-cent,t,quote while "%t" yields quote,contents of variable t,quote (OK, maybe not r" but something else that won't potentially break existing scripts). The advantage of this over SETDOS is that raw and ordinary strings could be mixed in the same command line, plus this syntax is easier to remember and use. One disadvantage is that surrounding quotes always get included in the string, so maybe special parsing would be needed for %@UNQUOTES[r"%t"] to yield per-cent,t. Moreover, I think that SET x=r"%t" should be modified internally to mark the value of x as being a raw string, so that, e.g., ECHO %x outputs quote,per-cent,t,quote instead of the quoted value of variable t. Possibly a few more loopholes remain to be discovered...
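To make the proposal concrete, here is a toy model of it in Python. It is only a sketch of the suggested behaviour, not TCC's parser: it knows nothing about functions, escapes, or command groups, the `expand` helper is invented for the illustration, and undefined variables simply expand to nothing.

```python
import re

# A raw string is r"..." where the r does not end a longer word (so bar"x" is not raw).
RAW_STRING = re.compile(r'(?<!\w)r"[^"]*"')
VARIABLE = re.compile(r"%(\w+)")

def _expand_vars(text, variables):
    """Replace each %name with its value (empty if undefined)."""
    return VARIABLE.sub(lambda m: variables.get(m.group(1), ""), text)

def expand(line, variables):
    """Expand variables everywhere except inside r"..." raw strings."""
    out = []
    pos = 0
    for m in RAW_STRING.finditer(line):
        out.append(_expand_vars(line[pos:m.start()], variables))
        out.append(m.group()[1:])  # drop the r prefix, keep quotes and contents
        pos = m.end()
    out.append(_expand_vars(line[pos:], variables))
    return "".join(out)
```

With `t` set to `hello`, `expand('echo r"%t" "%t"', {"t": "hello"})` gives `echo "%t" "hello"`, so raw and ordinary strings can sit on the same line.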
 
That's an interesting idea. I can see two other potential problems:

1) Finding a character that Avi doesn't already have in one of his misshapen filenames. (Probably have to pick an unlikely Unicode character and require everyone to use Unicode fonts.)

2) Determining exactly when the special character would get removed. (As we see when trying to use %'s or escapes when a command line is passed repeatedly through the parser, such as FOR or DO.)

And it wouldn't solve Avi's original problem, which was due to nested variable expansion, not a literal string containing troublesome characters. But I think it's still worth exploring to help solve the problems with literals.
 
Finding a character that Avi doesn't already have in one of his misshapen filenames. (Probably have to pick an unlikely Unicode character and require everyone to use Unicode fonts.)

Well, there are the chars that Windows does enforce (<>:|?/\*"). So we could choose one of those (say, the colon), and then say that anytime a colon precedes a quoted string the string should be treated as literal:
SET x=:"%t"
I think this would be better than choosing an unlikely Unicode character, because, as unlikely as the Unicode character may be, it is still possible that one day we'll have to process such a file. On the other hand, the restriction on the colon is fairly well-enforced.

Determining exactly when the special character would get removed. (As we see when trying to use %'s or escapes when a command line is passed repeatedly through the parser, such as FOR or DO.)

I would suggest that it work exactly the way backticks work. Currently, I have found that when I specify a filename with backticks, my BTM files can then work excellently with the specified filename, passing it from parameter to parameter and from function to function. The only problem occurs, as noted, when files actually have backticks in them. So, I'd say that if we can get Stefano's suggestion working with the same handling as backticks, all will be good, since it will cover *all* Windows filenames, including those with backticks.

And it wouldn't solve Avi's original problem, which was due to nested variable expansion, not a literal string containing troublesome characters. But I think it's still worth exploring to help solve the problems with literals.

Actually, I think it would solve the problem. My main problem is being able to specify a filename on the command line such that it can be referenced in a BTM file as %1 etc., and such that it would work with all filenames that Windows allows. If I can specify the filenames with a prefixed colon and then quotation marks, and if, in such cases, the colon+quotes have the same effect that backticks currently have, then the problem would be solved, and we would be able to confidently run batch files across all existing Windows files.
 
The raw string switch should be something that 1) uniquely identifies a raw string and can't get confused for another construct; 2) is pretty (so people will think it's cool instead of ugly); 3) is quick (we're all lazy, right?); 4) applies to internal commands only, in any context they get used (not just filenames).

When Avi proposed the colon, as in :"string", at first I was concerned that existing command option syntax, such as some INTERNAL_COMMAND /O:"string", would prevent interpreting :"string" as a raw string. However, from a cursory review of TCC's help file, it seems to me that TCC consistently uses the option syntax /O:letters when just letters can follow the colon character, and /T"word word" without a colon when multiple words are involved. This strengthens the case for choosing the colon.

But maybe a single character isn't the right answer, let's think about it some more.
 
Well, there are the chars that Windows does enforce (<>:|?/\*"). So we could choose one of those (say, the colon), and then say that anytime a colon precedes a quoted string the string should be treated as literal:
SET x=:"%t"

I don't think that's a good idea -- a lot of people use things like c:"%var" to add a path/filename onto a drive specifier.
 
I don't think that's a good idea -- a lot of people use things like c:"%var" to add a path/filename onto a drive specifier.

True - but nobody *starts* a parameter or filename path with a colon. The proposal relates only to the use of a colon+quote sequence to start a parameter. Any colons within the sequence would be treated as part of the literal.
 
Avi, are you thinking that a blank space before a colon marks the *start* of a parameter? Not always true: consider %@len[:"string"]. How would you want to interpret that? There's no blank in front of that colon.
 
Well, you don't seem to be doing anything with guillemets yet....

... but of course, every new character that you assign a special meaning to is one more character that causes problems when it appears in a filename.
 
You could do something like you currently do for regular expressions - e.g. double up the reserved chars. Or use that weird \\?\ kind of syntax MS uses.
 
but of course, every new character that you assign a special meaning to is one more character that causes problems when it appears in a filename.
Yes, of course; that's why I suggested using one of those few characters that Windows actually enforces as being illegal chars in filenames, such as the colon.
 
Yes, of course; that's why I suggested using one of those few characters that Windows actually enforces as being illegal chars in filenames, such as the colon.

Sure; but I'm guessing that literal strings would be useful for a lot of other things besides filenames.
 
How about a new SETDOS /X option (B?) to turn off doing anything inside double quotes?
 