WAD TCC: inconsistent character handling

The file in question is 8-bit (extended ASCII) text. The character in question is 0xB1 (plus/minus). My console font is Consolas, which can handle that character. TCC handles that character rather inconsistently, displaying it in at least 4 different ways.

1580491043682.png
 
Is this in a TCMD tab window or a Windows console window?
That was in a TCC console. Here (below) it is in a TCMD tab. It's the same except for the choice of the unprintable character.

1580505154739.png
 
I've mentioned a few dozen times in the past that you will never, ever be satisfied with the results if you convert extended ASCII characters to Unicode and then back to ASCII. (Or even worse, back and forth and back and forth like in your examples.) That's the way Windows works; what you get will depend on your codepage and your font, but it will almost never be what you want. If you're unhappy with the results you should be using UTF-16 or UTF-8 files. Or at the very least, UnicodeOutput or UTF8Output, and/or change your code page to 65001.

In a TCC console window, Windows handles the character display - all TCC does is pass the character and it's up to Windows how it appears.
 
As I said in Charles Dye's thread about HEAD:

I have a 256-byte file (0255.bin) containing the bytes 0x0 through 0xFF. I wrote a test app to read that file into a buffer, print the decimal values of the bytes in the buffer, use MultiByteToWideChar followed by WideCharToMultiByte (with lpDefaultChar equal to NULL) on the buffer, then print the decimal values again.

I did that for the ANSI, OEM, and THREAD code pages.

In all three cases, the before/after decimal values were identical; i.e., the decimals 0 through 255.

I later did the same (successfully) with CP 866.
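
For readers without a Windows build environment, the same kind of round trip can be approximated in Python. This is only a sketch: it uses Python's built-in `cp437` and `cp866` codecs rather than MultiByteToWideChar/WideCharToMultiByte, so its mapping tables may not match the Windows ones byte for byte, and `round_trip` is a hypothetical helper name. It shows only that a single decode followed by an encode with the same code page is lossless.

```python
def round_trip(data: bytes, codepage: str) -> bytes:
    """Decode bytes to text with a code page, then encode back with the same one."""
    return data.decode(codepage).encode(codepage)


# Same content as the 256-byte test file described above: bytes 0x00 through 0xFF.
all_bytes = bytes(range(256))

for cp in ("cp437", "cp866"):
    # True means the bytes before and after the round trip are identical.
    print(cp, round_trip(all_bytes, cp) == all_bytes)
```

The round trip only holds when both directions use the same code page; decoding with one table and encoding with another is where characters get substituted.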

This thread started when I simply TYPE'd the file (no pipes). Here's what I saw/see.

1580508717692.png
 
Let's start over.

PlusMinusSign is Unicode U+00B1; supported by Consolas; not in my code page (437).
MediumShade is Unicode U+2592; supported by Consolas; it is 0xB1 in code page 437 (see it below).

Rex, please explain why/how these are different.

1580532167486.png
 
Pretty sure this is the same as AnrDaemon's issue. When HEAD/TAIL read from a pipe, they seem to assume that everything is Unicode. If you have 8-bit text, it becomes 8-bit Unicode — bytes zero-extended to words. So your 0xB1 becomes U+00B1, the plus-minus sign.
 
Yup! Using codepage 866 and piping to HEAD or TAIL clobbers the entire Cyrillic alphabet, uppercase and lowercase. HEAD and TAIL just seem to ignore codepages altogether (I have no idea why). In contrast, piping to TPIPE seems to respect the current codepage.
 

Only from a pipe, though. Using a |! pseudopipe prevents the problem. I think there's a missing MultiByteToWideChar() somewhere.
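
The suspected behavior can be illustrated outside TCC. The Python sketch below is not TCC's code; it assumes the pipe path simply widens each byte to a 16-bit value, as guessed above, and compares that with decoding through the code page. `zero_extend` and `describe` are hypothetical helper names, and the codecs are Python's own tables, not the Windows ones.

```python
import unicodedata


def zero_extend(data: bytes) -> str:
    """Widen each byte to a code point of the same value (no code page lookup)."""
    return "".join(chr(b) for b in data)


def describe(ch: str) -> str:
    """Format one character as its code point plus its Unicode name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"


raw = b"\xb1"
print(describe(zero_extend(raw)))     # U+00B1 PLUS-MINUS SIGN
print(describe(raw.decode("cp437")))  # U+2592 MEDIUM SHADE

# A Cyrillic word written with escapes, encoded as code page 866 bytes.
cyrillic = "\u041f\u0440\u0438\u0432\u0435\u0442".encode("cp866")
# Zero-extension does not recover the Cyrillic letters; decoding as cp866 does.
print(zero_extend(cyrillic) == cyrillic.decode("cp866"))  # False
```

Zero-extension gives the same result as decoding the bytes as Latin-1, which would explain why 0xB1 shows up as the plus-minus sign regardless of the active code page.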
 