WAD A bit of strangeness related to Unicode-marked file not being Unicode

I issued the following command with the "Unicode" option checked:
Code:
PDir * /Nj /a-d /s /(dy/m/d th:m:s fpn) >Z-Drive Files.V2012-02-18.txt
Here is what the "List Z-Drive Files.V2012-02-18.txt" command shows in hex ("x" option):
Code:
0000 0000 ff fe 32 30 31 32 2f 30  32 2f 31 37 20 32 33 3a   ■2012/02/17 23:
0000 0010 35 38 3a 32 38 20 5a 3a  5c 56 69 6e 79 6c 20 43  58:28 Z:\Vinyl C
I don't think I have to point out that the "Unicode marker" (the byte-order mark, bytes ff fe) is there, but the remainder of the file is single-byte text, not Unicode.
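
For anyone who wants to check a file for the same mismatch outside of TCC, here is a minimal Python sketch. The function name `bom_without_utf16_body` is made up for illustration, and the check is only a heuristic: it assumes mostly-ASCII text, where genuine UTF-16LE has 0x00 in every second byte.

```python
import codecs


def bom_without_utf16_body(data: bytes) -> bool:
    """Return True if data starts with the UTF-16LE BOM (ff fe)
    but the bytes after it do not look like UTF-16LE text.

    Heuristic (an assumption, not a general test): for mostly-ASCII
    content, the high byte of each 16-bit unit should be 0x00.
    """
    if not data.startswith(codecs.BOM_UTF16_LE):
        return False  # no marker, so nothing to contradict
    body = data[2:]
    if len(body) % 2:
        return True  # odd length cannot be whole 16-bit units
    high_bytes = body[1::2]
    # Fewer than half the high bytes being zero suggests single-byte text.
    return high_bytes.count(0) < len(high_bytes) / 2


# First bytes of the odd file from the hex dump above.
odd = b"\xff\xfe2012/02/17 23:58:28 Z:\\Vinyl C"
# The same text written as genuine UTF-16LE with a marker.
good = codecs.BOM_UTF16_LE + "2012/02/17 23:58:28".encode("utf-16-le")
print(bom_without_utf16_body(odd), bom_without_utf16_body(good))
```

To run it against a real file, read the file in binary mode and pass the first few hundred bytes.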

And the output is "perfectly normal" when issuing the above command when the "Unicode Output" option is not selected:
Code:
0000 0000 32 30 31 32 2f 30 32 2f  31 37 20 32 33 3a 35 38  2012/02/17 23:58
0000 0010 3a 32 38 20 5a 3a 5c 56  69 6e 79 6c 20 43 61 66  :28 Z:\Vinyl Caf
I'm not absolutely sure at this point that the "Unicode Output" option was checked when I issued the first "PDir" command above, although I am about 90% sure that it was. Given the below, I don't think it really matters all that much.

Strangely enough, the command "Type Z-Drive Files.V2012-02-18.txt | List" displays output that is "valid" Unicode (although "List" did not show the ff fe marker) when the "Unicode Output" option is selected:
Code:
0000 0000 32 00 30 00 31 00 32 00  2f 00 30 00 32 00 2f 00  2 0 1 2 / 0 2 /
0000 0010 31 00 37 00 20 00 32 00  33 00 3a 00 35 00 38 00  1 7   2 3 : 5 8
And typing Z-Drive Files.V2012-02-18.txt to a file using redirection produces a file that is completely "valid" Unicode (i.e., with the 0xfffe "marker") when the "Unicode Output" option is selected:
Code:
0000 0000 ff fe 32 00 30 00 31 00  32 00 2f 00 30 00 32 00  2 0 1 2 / 0 2
0000 0010 2f 00 31 00 37 00 20 00  32 00 33 00 3a 00 35 00  / 1 7   2 3 : 5
And typing the (incorrect) file to the "List" command produces:
Code:
0000 0000 32 30 31 32 2f 30 32 2f  31 37 20 32 33 3a 35 38  2012/02/17 23:58
0000 0010 3a 32 38 20 5a 3a 5c 56  69 6e 79 6c 20 43 61 66  :28 Z:\Vinyl Caf
which is only different from the "original" file in that the 0xfffe marker is missing.

And, finally, typing the "incorrect" file to the "List" command with the "Unicode Output" option unchecked produces perfectly valid output, in that the Unicode marker is now missing:
Code:
0000 0000 32 30 31 32 2f 30 32 2f  31 37 20 32 33 3a 35 38  2012/02/17 23:58
0000 0010 3a 32 38 20 5a 3a 5c 56  69 6e 79 6c 20 43 61 66  :28 Z:\Vinyl Caf
And typing the "incorrect" file to another file with the "Unicode Output" option again unchecked produces a perfectly valid file that is identical to the "incorrect" file except that the Unicode "marker" (0xfffe) is gone. That means, of course, that listing it in "hex" mode produces exactly the same results as immediately above.

You might think from the above that it doesn't really matter if the Unicode marker is there on a file that is not really Unicode, because the "Type" command (somewhat strangely, in my opinion) handles it "properly"(?) anyway. Well, while that certainly seems to be true for the "Type" command, it is absolutely not true for the "Find" command, which is how I discovered it. You see, the output of a "Find" command immediately following the PDir command was total garbage.
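
One plausible explanation for the garbage, which nothing in this thread confirms, is that a tool which trusts the marker will read the single-byte text as 16-bit units. The Python sketch below only illustrates that effect; it says nothing about how "Find" is actually implemented, and the helper name `misread_as_utf16le` is hypothetical.

```python
def misread_as_utf16le(raw: bytes) -> str:
    """Decode single-byte text as if it were UTF-16LE.

    Each pair of ASCII bytes is fused into one 16-bit code unit,
    so the result is unrelated characters rather than the original text.
    """
    # Drop a trailing odd byte so the data is whole 16-bit units.
    usable = raw[: len(raw) - (len(raw) % 2)]
    return usable.decode("utf-16-le")


# The single-byte text from the odd file, without its ff fe marker.
raw = b"2012/02/17 23:58:28 Z:\\Vinyl C"
garbled = misread_as_utf16le(raw)

# 30 bytes collapse into 15 characters; "20" (0x32 0x30) becomes U+3032.
print(len(garbled), hex(ord(garbled[0])))
```

None of the resulting characters match the original text, which is consistent with a search over the file finding nothing sensible.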

The truth is that this is not really a major problem in that I've found several ways to "work around" it, but I still think that it's a bug worth investigating.

- Dan
 
I issued the following command with the "Unicode" option checked:
Code:
PDir * /Nj /a-d /s /(dy/m/d th:m:s fpn) >Z-Drive Files.V2012-02-18.txt

I assume that's not really the command you issued, since you're missing the required double quotes and what you'd actually create is a file called "Z-Drive".

I think it's highly unlikely that there is a bug in the Unicode output code -- it's (very) heavily used by nearly everybody and the code itself has no way of writing the header without also writing everything else in Unicode. (And TCC doesn't know anything about redirected output to a file versus redirected output to a pipe; it's all exactly the same code.)

I tried all of your steps here and couldn't reproduce any problems.
 
Rex, I won't argue with you about "reproducing the problem", because I did something just slightly different and got the correct results, and now I'm not sure exactly what I did the first time to get those results. I have the original file as it came out of the PDir command if you want me to send it to you to look at, and I really don't know how I would have made a file like that "manually" without a hex editor (which I have, somewhere, but it's probably been years since I last had a need to use it). But the "embedded white space" in the file name was just a transcription error because I don't see so good. (And do you really think that I made that all up?)

- Dan
 
