How to? How do I "un-" SafeChar???

Discussion in 'Plugins' started by mathewsdw, Jan 18, 2012.

  1. mathewsdw

    Joined:
    May 24, 2010
    Messages:
    855
    Likes Received:
    0
    Here's the problem (and I really don't understand why I've never run into this before). Both the files and directories I create myself and the files I regularly download from a particular web site contain many "unsafe" characters, particularly open and close parentheses and the ampersand. (I use these characters in my own names because they make the names more "descriptive", which I really need because of my poor memory.) Generally speaking, when I am executing code that processes file and directory names, I use one of the "@Safe..." functions to get around what could be a significant problem, and everything works fine. However, when I try to use one of these "safe" names against the files and directories as they actually exist in the file system, the "safe" name no longer matches the actual "unsafe" name that the file has, so things like "@FileSize[...]" no longer work. (@FileSize does work for files whose names contain ampersands, without using a "@Safe..." function, if the file name is enclosed in double quotes.) Following is a simple example to illustrate exactly what I mean:

    First, an existing file:
    Code:
    [Z:\]dir /K /M *paren*
    1/18/2012  20:24              0  This File Has a (Parenthetical) Name
    
    Now, a very simple batch file named "SafeFileNameMatchTest.btm" that completely illustrates the problem:
    Code:
    @Echo "%@ExecStr[PDir /(fn) Z:\*paren*]"
    @Echo %@FileSize["%@ExecStr[PDir /(fn) Z:\*paren*]"]
    @Echo "%@SafeExp[@ExecStr[PDir /(fn) Z:\*paren*]]"
    @Echo %@FileSize["%@SafeExp[@ExecStr[PDir /(fn) Z:\*paren*]]"]
    
    And, finally, the (not really surprising) results of executing the above batch file:
    Code:
    [Z:\]SafeFileNameMatchTest
    "This File Has a (Parenthetical) Name"
    0
    "This File Has a (Parenthetical) Name"
    -1
    
    As you can easily see, and not at all surprisingly, the "Safe'd" name no longer matches the real file name so that the "@FileSize" function fails. I haven't really deeply thought about this as of yet, but it strikes me that some "@UnSafe" functions would be the best (if not only) way to work around the problem.

    Or is there already a way to accomplish this (actually relatively simple in principle) task?

    I will add here that I did already try to "Unsafe /D:" the open and close parenthesis and, for reasons I really didn't understand at the time and have now forgotten (bad memory as usual), that didn't work, and I really didn't "investigate" that very much because that "solution" wouldn't work, as far as I know, for the ampersand anyway.
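    The mismatch described in this post can be modelled outside TCC. The following is a minimal Python sketch, not the plugin's code: the character mapping in `SAFE_MAP` is a made-up stand-in (this thread does not say which replacement characters the "@Safe..." functions use), and `make_safe`, `make_unsafe`, and the `files` dictionary are hypothetical names. It shows why a lookup with the remapped name fails and why an inverse mapping would restore the match.

```python
# Hypothetical stand-in for a "safe" character mapping; the real replacement
# characters used by the plugin are not given in this thread.
SAFE_MAP = {"(": "\uff08", ")": "\uff09", "&": "\uff06"}
UNSAFE_MAP = {safe: unsafe for unsafe, safe in SAFE_MAP.items()}

TO_SAFE = str.maketrans(SAFE_MAP)
TO_UNSAFE = str.maketrans(UNSAFE_MAP)


def make_safe(text: str) -> str:
    """Replace each problematic character with its stand-in."""
    return text.translate(TO_SAFE)


def make_unsafe(text: str) -> str:
    """Reverse make_safe, restoring the original characters."""
    return text.translate(TO_UNSAFE)


# A pretend file system: real name -> size in bytes.
files = {"This File Has a (Parenthetical) Name": 0}

real_name = "This File Has a (Parenthetical) Name"
safe_name = make_safe(real_name)

# Looking up the remapped name fails; the restored name is found again.
size_by_safe_name = files.get(safe_name, -1)
size_by_restored_name = files.get(make_unsafe(safe_name), -1)
```

    In this model the "safe" name looks almost identical on screen but is a different string, which is the same effect as the -1 from "@FileSize" above.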
     
  2. Charles Dye

    Charles Dye Super Moderator
    Staff Member

    Joined:
    May 20, 2008
    Messages:
    3,287
    Likes Received:
    39
    I don't have any simple solution for you -- if you replace characters in a filename, you're going to have a different name. But I don't really see why you'd need to do that anyway, at least not for ampersands and parentheses. Neither should cause any problem, so long as the filename is quoted. (Percent signs, on the other hand....)
     
  3. thedave

    Joined:
    Nov 13, 2008
    Messages:
    254
    Likes Received:
    2
    I wonder if adding a @SAFEOUT might make sense. This would take a previously processed "safe" string and convert it back to the potentially dangerous characters, escaping each potentially dangerous character with a %=

    So for example:
    Code:
    >set x=hello^&dir
    >echo %@safeexp[x]
    hello&dir
     
    >echo %@safeout[%@safeexp[x]]
    hello%=&dir
    So basically you'd get one single use of the original/dangerous string (but you couldn't safely store it in an environment variable and use it from there)

    Or maybe there's a better way?
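    A rough Python model of this idea, with every name hypothetical: `DANGEROUS` is an assumed set of characters, and `escape_dangerous` only mimics the text transformation shown above (putting `%=` in front of each risky character). It says nothing about how TCC would then parse the result.

```python
# Assumed set of characters that a command parser treats specially.
DANGEROUS = set("&|<>")
ESCAPE_PREFIX = "%="  # the prefix used in the example above


def escape_dangerous(text: str) -> str:
    """Prefix every character in DANGEROUS with ESCAPE_PREFIX."""
    return "".join(
        ESCAPE_PREFIX + ch if ch in DANGEROUS else ch for ch in text
    )


escaped = escape_dangerous("hello&dir")
```

    Here `escaped` holds the same text as the last line of the example above.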
     
  4. mathewsdw

    Thank you, Charles; you told me more or less what I expected you to tell me. While the other reply makes valid points, I think I've decided that no "@Safe" anything can help me in this situation, so I'm moving the "discussion" to the "Support" forum, where you will see the specific issues I'm trying to address in a basically "real world" example.

    But for a somewhat "simplified" and "made up" example of the problem I'm trying to "get around", suppose I have two files:
    Code:
    [Z:\]dir *.test /K /M
    1/19/2012  0:10              0  A & B & C.test
    1/19/2012  0:11              0  No Ampersands.test
    
    Now, suppose I have a very simple batch file that looks like this:
    Code:
    @Echo Off
    SetLocal
    SetArray List[10]
    Echo >NUL: %@ExecArray[List,Dir /K /M *.test]
    Set I=0
    Do While %I LT %_ExecArray
      @Echo %List[%I]
      Set /A I+=1
    EndDo
    UnSetArray List
    EndLocal
    Quit 0
    
    Running this batch file as-is produces (not at all surprisingly):
    Code:
    [Z:\]SafeCharFileNameTests
    1/19/2012  0:10              0  A
    TCC: Z:\SafeCharFileNameTests.btm [7]  Unknown command "B"
    TCC: Z:\SafeCharFileNameTests.btm [7]  Unknown command "C.test"
    1/19/2012  0:11              0  No Ampersands.test
    
    Well, there is a simple and obvious solution: change the line @Echo %List[%I] to @Echo "%List[%I]", and this works just fine producing:
    Code:
    " 1/19/2012  0:10              0  A & B & C.test"
    " 1/19/2012  0:11              0  No Ampersands.test"
    
    As would obviously be expected, wrapping the lines in double quotes completely eliminates the problem, but it has one very unfortunate consequence: if the line already contains double quotes, it simply doesn't work in all cases. To bring this down to the simplest possible examples, suppose we have a two-line file "Demo.btm":
    Code:
    @Set Line=%@Line[con,0]
    @Echo %Line
    
    If we then do this:
    Code:
    [Z:\]Demo
    abc def
    
    We get
    Code:
    abc def
    
    Just what we would expect.
    However, if we do this:
    Code:
    [Z:\]Demo
    abc & def
    
    we get:
    Code:
    TCC: Z:\demo.btm [1]  Unknown command "def"
    abc
    
    Also as we would unfortunately expect.

    So if we change the Demo.btm file to:
    Code:
    @Set Line="%@Line[con,0]"
    @Echo %Line
    
    and run it in the first case, we get:
    Code:
    [Z:\]Demo
    abc def
    "abc def"
    
    And the second case:
    Code:
    [Z:\]Demo
    abc & def
    "abc & def"
    
    Looks good, until we do this:
    Code:
    [Z:\]Demo
    This is a quoted string containing an ampersand: "An & just like this" results.
    TCC: Z:\demo.btm [1]  Unknown command "just"
    "And supply a quoted string containing an ampersand: "An
    
    So you have a choice: a technique (no added quotes) that properly handles ampersands already enclosed in double quotes but can not handle ampersands outside them, or a technique (wrapping the whole line in double quotes) that handles the unquoted ampersands but can not handle ampersands that were already enclosed in double quotes. However, if you are reading data that contains both situations, the only possible solution that I can see is something like:
    Code:
    @Set Line=%@Replace[^&,_anampersand_,%@Line[con,0]]
    @Echo %@Replace[_anampersand_,^&,%Line]
    
    which, when run like this:
    Code:
    [Z:\]Demo
    This is a quoted string containing an ampersand: "An & just like this" results.
    
    produces:
    Code:
    This is a quoted string containing an ampersand: "An & just like this" results.
    
    which, while rather ugly and inconvenient, appears to work. But what if you have data that also contains "or" bars ("|") as well as ampersands? For example, the above batch file produces:
    Code:
    TCC: Z:\demo.btm [1]  Unknown command "bar"
    This is an unquoted
    
    if run like this:
    Code:
    [Z:\]Demo
    This is an unquoted | bar and quoted: "An & just like this" results.
    
    (I honestly don't know what happened to the data after the ampersand, and I'm really too lazy to think about it that much.)

    So if, in an attempt to "fix" this, we change the batch file to:
    Code:
    @Set Line=%@Replace[^|,_anorbar_,%@Replace[^&,_anampersand_,%@Line[con,0]]]
    @Echo %@Replace[_anampersand_,^&,%@Replace[_anorbar_,^|,%Line]]
    
    we get this:
    Code:
    This is an unquoted | bar and quoted: "An & just like this" results.
    TCC: Z:\demo.btm [2]  Unknown command "bar"
    
    not really all that surprising if you think about it.
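    Why some "&" and "|" characters are protected and others are exposed in the runs earlier in this post can be modelled with a very small scanner. This is a Python sketch under one loud assumption: that the command parser simply flips between "inside quotes" and "outside quotes" at every double quote. That is consistent with the results shown here, but it is a model, not TCC's actual parser, and `bare_separators` is a hypothetical helper.

```python
def bare_separators(line: str, separators: str = "&|") -> list:
    """Return the positions of separator characters outside double quotes."""
    positions = []
    in_quotes = False
    for index, ch in enumerate(line):
        if ch == '"':
            in_quotes = not in_quotes  # each quote flips the state
        elif ch in separators and not in_quotes:
            positions.append(index)
    return positions


plain = "abc & def"
mixed = 'An ampersand in quotes: "An & just like this" results.'
piped = 'This is an unquoted | bar and quoted: "An & just like this" results.'

unwrapped_plain = bare_separators(plain)       # the & is exposed
wrapped_plain = bare_separators(f'"{plain}"')  # the & is protected
unwrapped_mixed = bare_separators(mixed)       # the & is already protected
wrapped_mixed = bare_separators(f'"{mixed}"')  # the & is exposed again
bare_in_piped = bare_separators(piped)         # only the | is exposed
```

    Under this model, adding an outer pair of quotes inverts the state of everything inside it, which is why neither form of Demo.btm copes with data that mixes both cases.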

    And:
    Code:
    @Set Line=%@Replace[^&,_anampersad_,%@Replace[^,_anorbar_,%@Line[con,0]]]
    @Echo %@Replace[_anorbar_,^|,%@Replace[_anampersand_,^&,%Line]]
    
    produces:
    Code:
    [Z:\]Demo
    This is an unquoted | bar and quoted: "An & just like this" results.
    TCC: (Sys) Z:\demo.btm [1]  The parameter is incorrect.
    "%@Replace[,_anorbar_]"
    TCC: Z:\demo.btm [2]  Unknown command "bar"
    
    And there is no "ordering" of the "@Replace" functions that "solves" the problem (believe me, I've tried every possibility, but that was the result I expected even before I tried.)
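    For comparison, here is the same placeholder idea as a Python sketch. It is only an illustration: Python strings are never re-parsed as commands, so the round trip is lossless there, which suggests (as an assumption, not a diagnosis) that the failures above come from the restored "&" and "|" being re-interpreted by the command parser rather than from the substitution itself. The token names are taken from the batch file; `protect` and `restore` are hypothetical helpers.

```python
# Placeholder tokens, as in the batch file above.
PLACEHOLDERS = {"&": "_anampersand_", "|": "_anorbar_"}


def protect(line: str) -> str:
    """Swap each special character for its placeholder token."""
    for char, token in PLACEHOLDERS.items():
        line = line.replace(char, token)
    return line


def restore(line: str) -> str:
    """Swap each placeholder token back to its special character."""
    for char, token in PLACEHOLDERS.items():
        line = line.replace(token, char)
    return line


original = 'This is an unquoted | bar and quoted: "An & just like this" results.'
protected = protect(original)
round_trip = restore(protected)
```

    One limitation of any placeholder scheme: if the data already contains the literal text of a token, `restore` will turn it into a special character that was never there.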

    I might have to write an "@UnSafe" series of routines myself (a C++ program) if I want to solve this problem myself (I was hoping you would be willing to do it because you've already written all of the "ground-level" code), because the only other solution I can see is to do what I'm ultimately trying to do in a C++ program from the "ground up". But I really don't want to do that either because the general "parsing" capabilities of TCC far exceed those that are part of the "standard" C++ libraries (TCC's "parsing" capabilities are probably one of its greatest "strengths".)

    - Dan

    P. S. I will add here that I'm trying to "automate" a process that I absolutely must do and that would take me literally days to do if I have to do it by hand; and that's ignoring the high probability that I will make significant mistakes along the way if I try to do it "manually"; and it is that fact, more than anything else, that causes me to write programs of one kind or another in the first place. "Automating" something may actually take longer than doing it by hand, but once "automated" I can be confident that it will be done "accurately", which can not at all be said if I try to do it "manually".
     
  5. Charles Dye

    Well, adding an @UNSAFE function to return an un-remapped version of a string would be trivial. What would be difficult is figuring out how to use it, since it could by definition return problematic characters. I would have added such a function ages ago, except I can't figure out how it would be useful. I'll add it if you really really want it, but I can't tell you how to use it!

    (Perhaps instead a command which writes an un-safed string into an environment variable? But then I don't see how you'd use that without falling back on SETDOS /X. And avoiding the need for SETDOS is pretty much the entire goal of the plugin....)

    I'm still not clear whether you want this for filenames, or for arbitrary text. SafeChars is useful for the latter, but probably not the former. If you need to iterate over problematic filenames, I wouldn't recommend a SafeChars type approach at all. Instead, I'd use @FindFirst / @FindNext / @FindClose, always quoting the filenames as you receive them (i.e. putting double quotes around the @FindFirst and @FindNext functions.) From there you can write them into an array using SET if you like; though that limits the number of filenames you can handle to the size of the array.
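    The pattern recommended here (ask the file system for each name and quote it the moment it is received) looks like this in a Python sketch. This is an analogy, not TCC code: `Path.glob` stands in for @FindFirst / @FindNext, and `quoted_names` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def quoted_names(directory: Path, pattern: str) -> list:
    """Return matching file names, each wrapped in double quotes."""
    return sorted(f'"{path.name}"' for path in directory.glob(pattern))


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in ("A & B & C.test", "No Ampersands.test"):
        (folder / name).touch()
    names = quoted_names(folder, "*.test")
```

    The names never pass through a step that could split them at an ampersand, because they are quoted as soon as they are produced.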
     
  6. mathewsdw

    Charles, I want to do it for file names that are arbitrary text because the file names are being read from a file so that "@FindFirst" and "@FindNext" are not at all applicable. However, I do want them to "end up" as actual, valid, file names because they are ultimately going to be used to process files that exist in the file system.

    However, the good news is that it is likely that I can do what I want to do without getting involved with the "@Safe..." functions at all. I'm still in the process of "experimenting" with this, and while the ultimate syntax of the "Set" statement is very strange regarding the use of double quotes, things seem to be working up to this point.

    Specifically:
    Code:
    [Z:\]Set Line1=File "Me & Kathy Lee" not found in
    [Z:\]Set Line2= "G:\This is a directory name with an & in it"
    [Z:\]Set Command=%@Left[-1,%@Right[-1,%Line2]]\%@Right[-6,%@Left[-14,%Line1]]"
    [Z:\]Echo %Command
    "G:\This is a directory name with an & in it\Me & Kathy Lee"
    
    Now I just need another expression whose result has the "G:" replaced by a "D:" (because the ultimate goal is to "produce" @ExecStr[Copy "D:\directory name\file name" "G:\directory name\file name"]). I tend to think at this moment that the "solution" to that will be rather trivial, and that it will be able to handle any character contained in the double quotes without any limitations whatsoever, which would be the ultimate goal. But thank you very much.
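    The same construction, including the drive swap, can be sketched in Python for comparison. This is not the TCC solution: `first_quoted` is a hypothetical helper that finds the quoted part by its quotes rather than by fixed character counts, and the exact wording of the two input lines is copied from the example above.

```python
import re
from pathlib import PureWindowsPath

line1 = 'File "Me & Kathy Lee" not found in'
line2 = ' "G:\\This is a directory name with an & in it"'


def first_quoted(text: str) -> str:
    """Return the text between the first pair of double quotes."""
    match = re.search(r'"([^"]*)"', text)
    if match is None:
        raise ValueError(f"no quoted string in {text!r}")
    return match.group(1)


# Directory from line2 plus file name from line1, on the G: drive.
target = PureWindowsPath(first_quoted(line2)) / first_quoted(line1)
# The same path with the drive root replaced by D:.
source = PureWindowsPath("D:\\", *target.parts[1:])

command = f'Copy "{source}" "{target}"'
```

    Locating the name by its quotes means the surrounding message text can change length without breaking the extraction, which the fixed offsets in @Left and @Right would not tolerate.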

    - Dan
     
  7. mathewsdw

    Charles, this is an update to my previous comment, because I rather strongly disagree with one of the contentions in your last posting in this thread: that an "@UnSafeExp" function, for example, would be totally useless. First off, I fully acknowledge that you are correct in probably the vast majority of situations. But that "vast majority" does not include the one case, possibly a rather common one, where this would be useful (and where I needed it in the first place): "UnSafe-ing" an expression inside a pair of double quotes. "Restoring" a file name to its real-world form, so that it "matches" the original file name it was derived from, or "generating" a "new" file name based on an existing one (adding "version" information, for example), would be useful in possibly many situations. I want to repeat: I did avoid the need for the whole "@Safe" "system" entirely (again, a very good system on the whole), but I wouldn't exactly call what I had to do to avoid that need completely trivial. And that fact makes this not a real high priority even for me. But, on the other hand, if it's relatively easy to do (which I tend to believe would be the case), why not do it?

    - Dan

    P. S. When I was a mainframe assembly-language programmer many years ago, the (32-bit) IBM mainframes that I used at the time had a machine instruction named "TR" ("Translate") and a companion named "TRT" ("Translate and Test"). TR went through every byte in the first operand, used its value as an offset into a table supplied by the second operand, and replaced that byte with whatever the table held at that offset. TRT scanned the first operand through the same kind of table without changing it, and set the condition code as soon as it found a byte whose table entry was not zero. Now, that was for a 256-character set, meaning a 256-byte "table"; in today's world it would be a 131,072-byte table (256 times more characters at 2 bytes per table entry vs. one byte per entry in the 8-bit situation), a vastly larger (512 times) table. However, machines today have gigabytes of RAM, so a 128K table is quite trivial. (This rather low-end machine has about 4.3 gigabytes of RAM, and it's a 32-bit processor. I have enough RAM for a 384M-byte RAM disk with absolutely no obvious "performance" penalty, and since it's possibly 1000's of times faster than a physical disk drive, it has virtually no "disadvantages" at all. On system "shutdown" it saves itself to a physical hard disk, always in exactly the same file, so "fragmentation" is a total non-issue; that file is basically the same size as the RAM disk, and it reloads itself from the same file on boot-up.)

    And, in terms of populating a table of that size, I'm quite sure an algorithm could easily fill it with "default" values. Another very simple algorithm would handle the entries that you don't want to use the "default" values for: go through one list of characters (2 bytes at a time), treating each as an offset into the table, while concurrently "picking up" the (2-byte) value at the same position in a second list and placing that value in the table at that offset. And if you don't expect to have to use the "full" 65,536 characters of the 16-bit character set, there are a number of fairly trivial ways you could reduce the size of the actual table, but I really tend to doubt that doing that would be at all useful. (You would need two tables to "translate" in both "directions", for a total of 256K bytes of "table", again a not very large number (about 0.006% of available memory for this machine by my calculations) in a processor with 4.3G bytes of memory.) Again, while I no longer have a great need for this because I've figured out how to do it "on my own", I still think having such a function would be useful.
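    The table scheme described in this P. S. can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how the plugin works: the replacement characters in `pairs` are invented for the example, `build_table` and `translate` are hypothetical names, and every character is assumed to fit in 16 bits.

```python
from array import array

TABLE_ENTRIES = 65_536  # one entry per 16-bit character value


def build_table(pairs: dict) -> array:
    """Build a 16-bit translate table: identity by default, then overrides."""
    # Default values: every character maps to itself.
    table = array("H", range(TABLE_ENTRIES))
    # Overrides: the key is the offset, the value is what gets stored there.
    for source, target in pairs.items():
        table[ord(source)] = ord(target)
    return table


def translate(text: str, table: array) -> str:
    """Replace every character with the table entry at its offset."""
    return "".join(chr(table[ord(ch)]) for ch in text)


# Hypothetical replacement characters, chosen only for this example.
pairs = {"&": "\uff06", "|": "\uff5c"}
forward = build_table(pairs)
backward = build_table({target: source for source, target in pairs.items()})

# Size of one table in bytes (2 bytes per entry on typical platforms).
table_bytes = len(forward) * forward.itemsize
safe = translate("A & B | C", forward)
restored = translate(safe, backward)
```

    With 2-byte entries, `table_bytes` comes to 131,072, and the two tables together take 262,144 bytes, matching the 128K and 256K figures above.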

    And in terms of what that (single!) function would be, simply "@UnSafeExp" would probably be an appropriate name and unlike in the case for the "@SafeExp" function a percent sign on the parameter could be required if it is, in fact, a variable. Supplying the actual value (rather than the name) of the variable would not at all be a problem and could certainly be useful in some circumstances. However, not supplying the percent sign might be desirable in terms of maintaining "consistency" with the existing "@Safe" functions, although I, personally somewhat prefer the first option because it would allow more "flexibility" in terms of "constructing" the desired end result. I will add, however, that I don't think making the percent sign "optional" would be at all a good idea.

    As I said, not really required but it would be nice.

    P. P. S. I wrote assembly-language code for a large number of different processors. These were, in "historical" order: the IBM 1130 (a minicomputer then, not even a microcomputer by today's standards); a Control Data Corporation 6600 (a so-called "supercomputer" then; I wouldn't be surprised if it was no faster, or even slower, than my laptop); a Digital Equipment Corporation PDP ("Programmed Data Processor") 11 (a 16-bit minicomputer); a "Nixdorf" minicomputer (produced by a German company that I'm rather sure no longer exists; it ran an extreme subset of the IBM 360 instruction set, with no registers or "binary" instructions at all); multiple IBM 370-series (and "XA", I believe they were called) mainframes; the 8-bit 6502 microprocessor produced by the no longer existent company "MOS Technology" and used in the original Apple II; the Motorola 68000 line of microprocessors (used in the Amiga); and the Intel (and AMD) line starting with the 8008 all of the way up to the Pentiums. And, as what I believe is a somewhat humorous aside, my older brother had a computer with an Intel 8008 microprocessor (an essentially 8-bit processor as I remember; this was even before the 8080) that could only be programmed by entering the instructions using levers on the front panel of the computer or by reading a paper tape (after loading a program to read the paper tape using the front-panel "levers"; no read-only memory then). Well, while I was at his house one afternoon I wrote, entered (on the front panel), and tested a program that could read bytes into memory from the paper tape reader and save bytes by punching the paper tape. This program was quite short to start, because what I initially entered was just enough to allow it to read the rest of "itself" from the paper tape, and it worked flawlessly. However, my brother was actually somewhat angry at me because he had wanted some "involvement" with the program, "debugging" and correcting it if nothing else, and I had "delivered" a totally flawless end-"product". (And this was the only "useful" program he could even think of "writing" for the machine at that time.)
     
  8. Charles Dye

    Because if I provided it, people would use it.... Oh, all right. I've uploaded a new build, with one new command and one new function. Time to implement @UNSAFE: about 60 seconds. Time to document @UNSAFE: over ten minutes.

    I too had hours of joy on that one. Wrote my own DOS wedge for my beloved C128, and burned it into EPROM (so it didn't take up RAM and didn't have to be loaded from disk.) I can't claim it ran perfectly on the first try, but after a few iterations it became a pretty useful tool. In fact I have a MOS Technology KIM-1 nailed to the wall over my desk right now, with a pretty white ceramic 6502. No, I've never even tried to program that beast!
     
  9. mathewsdw

    Thank you, Charles! :) And the time you took to accomplish that task is about what I expected it would take you!!! (Whereas it would have taken me a lot longer because I would be starting out from ground zero.)

    - Dan
     
