Is this in a TCMD tab window or a Windows console window?
That was in a TCC console. Here (below) it is in a TCMD tab. It's the same except for the choice of the unprintable character.
And does Option //UnicodeOutput=Yes work around the issue? Because I suspect this is the same issue as "HEAD" mangles stream encoding.
No. That rather makes a mess of everything!
As I said in Charles Dye's thread about HEAD:
I've mentioned a few dozen times in the past that you will never, ever be satisfied with the results if you convert extended ASCII characters to Unicode and then back to ASCII. (Or even worse, back and forth and back and forth like in your examples.) That's the way Windows works; what you get will depend on your codepage and your font, but it will almost never be what you want. If you're unhappy with the results you should be using UTF-16 or UTF-8 files. Or at the very least, UnicodeOutput or UTF8Output, and/or change your code page to 65001.
I have a 256-byte file (0255.bin) containing the bytes 0x0 through 0xFF. I wrote a test app to read that file into a buffer, print the decimal values of the bytes in the buffer, use MultiByteToWideChar followed by WideCharToMultiByte (with lpDefaultChar equal to NULL) on the buffer, then print the decimal values again.
I did that for the ANSI, OEM, and THREAD code pages.
In all three cases, the before/after decimal values were identical; i.e., the decimals 0 through 255.
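For anyone who wants to repeat that kind of check without building a Win32 app, here is a rough Python sketch of the same round trip. It is not the test app described above: it uses Python's built-in codecs as stand-ins for Windows code pages (an assumption, since Python's tables are not guaranteed to match the Windows ones byte for byte), and the code page list and the round_trip helper are just illustrative names.

```python
# Round-trip every byte value through a code page and back.
# Assumption: Python codec names stand in for Windows code pages here;
# the codec tables may not match the Windows tables exactly.
CODE_PAGES = ("cp437", "cp850", "cp866", "latin-1")


def round_trip(data: bytes, code_page: str) -> bytes:
    """Decode bytes to Unicode text, then encode back with the same code page."""
    text = data.decode(code_page)   # roughly what MultiByteToWideChar does
    return text.encode(code_page)   # roughly what WideCharToMultiByte does


# Same content as the 256-byte test file: the bytes 0x00 through 0xFF.
all_bytes = bytes(range(256))

# True means the bytes came back unchanged for that code page.
results = {cp: round_trip(all_bytes, cp) == all_bytes for cp in CODE_PAGES}

for cp, unchanged in results.items():
    print(f"{cp}: {'identical' if unchanged else 'changed'}")
```

The point is only that decoding and re-encoding with the same code page is lossless when that code page defines all 256 byte values. It says nothing about what happens when the two directions use different code pages.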
Pretty sure this is the same as AnrDaemon's issue. When HEAD/TAIL read from a pipe, they seem to assume that everything is Unicode. If you have 8-bit text, it becomes 8-bit Unicode: bytes zero-extended to words. So your 0xB1 becomes U+00B1, the plus-minus sign. The |! pseudopipe prevents the problem. I think there's a missing MultiByteToWideChar() somewhere.
Yup! Using codepage 866 and piping to HEAD or TAIL clobbers the entire Cyrillic alphabet, uppercase and lowercase. HEAD and TAIL just seem to ignore codepages altogether (I have no idea why). In contrast, piping to TPIPE seems to respect the current codepage.
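To make the zero-extension theory concrete, here is a small Python sketch comparing the two interpretations of the same bytes. It only models the suspected behaviour; it is not taken from HEAD/TAIL or TCC, and the helper names (zero_extend, decode_with_code_page, code_points) are made up for illustration. Python's cp866 codec is assumed to match Windows code page 866.

```python
def zero_extend(data: bytes) -> str:
    """Treat each byte as the code point with the same value (0xB1 -> U+00B1)."""
    return "".join(chr(b) for b in data)


def decode_with_code_page(data: bytes, code_page: str = "cp866") -> str:
    """Convert bytes to text through a code page, as a proper conversion would."""
    return data.decode(code_page)


def code_points(text: str) -> list:
    """List the code points as U+XXXX strings so output is console-safe."""
    return [f"U+{ord(ch):04X}" for ch in text]


# A Cyrillic word as it would arrive on a pipe under code page 866.
raw = "Привет".encode("cp866")

print("zero-extended:", code_points(zero_extend(raw)))
print("via cp866:    ", code_points(decode_with_code_page(raw)))

# The single byte from the example: 0xB1 is a shade block in cp866,
# but zero-extension turns it into the plus-minus sign.
print(code_points(zero_extend(b"\xb1")), code_points(decode_with_code_page(b"\xb1")))
```

With zero-extension every Cyrillic byte lands in the U+0080..U+00FF range instead of the Cyrillic block, which would fit the "clobbers the entire Cyrillic alphabet" report.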