Fixed Using codepage 65001 (UTF-8) breaks non-ASCII characters

gschizas · Sep 5, 2016

When typing any non-US-English character, an empty character (probably 0x00, NUL) is displayed to the right of the typed character. It doesn't overwrite the underlying character (if any), but the string is displayed as if you had pressed `space` after each character. The weirder part is that this only shows up once the non-US-English characters have been typed on two separate lines (which leads me to believe this is a problem with the output stream).

1. Open TCC 20 (either standalone or under Take Command).
2. Type `chcp 65001` to switch to the UTF-8 "codepage".
3. Use any non-US English keyboard (for my example, I'm using Greek, but I've done this with Russian and German keyboards).
4. Type any character outside the ASCII range 32-127 (my example: τεστ, which means "test" in Greek) and press Enter (or type `echo τεστ` and press Enter, to verify it's not the error stream that has the problem).
5. Repeat step 4.

Actual results:

Expected results:
(This worked with TCC 19)

Notes:

I've done this with empty settings (they were generated from scratch), just to make sure it wasn't some setting that messed things up.

mfarah · Sep 5, 2016

Reproduced here:

gschizas · Sep 5, 2016

BTW, isn't the version supposed to be 10.0.14393? Why does TCC go to the fallback version (6.3)?

rconn · Sep 5, 2016

This would be a Windows issue - TCC doesn't display the text when it's in console mode (that's done by conhost.exe).

gschizas · Sep 5, 2016

If it's a Windows issue, why does it work with TCC 19 and it doesn't work with TCC 20?

EDIT: This happens exactly the same under Take Command (I just wanted to isolate the problem).

rconn · Sep 5, 2016

gschizas said:
If it's a Windows issue, why does it work with TCC 19 and it doesn't work with TCC 20?

I keep telling people that Windows does not actually support UTF-8 (other than in a handful of conversion APIs), but nobody wants to listen ...

It behaves differently in v19 vs. v20 because v20 is using different APIs to fix a different problem with Take Command -- Windows uses different code pages in GUI windows and console windows, and v20 is going to great lengths to try to rationalize those differences.

The reason you're seeing blanks in the output is because TCC is querying Windows for the width of the characters, and Windows is returning "2". I could add a hack to check for codepage 65001 and always assume they're really single-width characters, though that will break Japanese / Korean / Chinese support.
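To make the width point concrete, here is a minimal Python sketch that separates two quantities that are easy to conflate: how many bytes a character takes in UTF-8, and whether it occupies two display columns. This is not TCC's code, and the thread does not say which Windows API is queried or why it reports "2" for Greek text. The helper names `utf8_len` and `is_double_width` are hypothetical, and Python's `unicodedata` tables stand in for whatever Windows actually consults.

```python
import unicodedata


def utf8_len(ch: str) -> int:
    """Number of bytes the character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def is_double_width(ch: str) -> bool:
    """True only for characters Unicode classifies as Wide or Fullwidth."""
    return unicodedata.east_asian_width(ch) in ("W", "F")


# Latin 'a', Greek tau, and a CJK ideograph for comparison.
for ch in "aτ日":
    print(f"U+{ord(ch):04X} utf8_bytes={utf8_len(ch)} "
          f"double_width={is_double_width(ch)}")
```

By this classification a Greek letter is two bytes but one column, while a CJK ideograph is three bytes and two columns, which is why forcing single width under 65001 would misplace Japanese / Korean / Chinese text.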

The question I have for you is -- why are you using 65001? Do you think it will provide some benefit?

nikbackm · Sep 6, 2016

rconn said:
The question I have for you is -- why are you using 65001? Do you think it will provide some benefit?

I use 65001 (but only temporarily, not persistently in a TCC session) when outputting UTF-8 text. Have worked fine so far.
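A temporary switch like this can be sketched in Python, assuming the Win32 calls `GetConsoleOutputCP` and `SetConsoleOutputCP` reached through `ctypes`. The names `console_output_cp` and `write_utf8` are hypothetical, and unlike `chcp` this sketch changes only the output code page, not the input one.

```python
import contextlib
import sys


@contextlib.contextmanager
def console_output_cp(codepage: int = 65001):
    """Temporarily set the console output code page, restoring it on exit."""
    if sys.platform != "win32":
        # Console code pages are a Windows concept; do nothing elsewhere.
        yield
        return
    import ctypes
    kernel32 = ctypes.windll.kernel32
    previous = kernel32.GetConsoleOutputCP()  # returns 0 if there is no console
    kernel32.SetConsoleOutputCP(codepage)
    try:
        yield
    finally:
        if previous:
            kernel32.SetConsoleOutputCP(previous)


def write_utf8(text: str, out) -> None:
    """Write text as raw UTF-8 bytes to a binary stream."""
    out.write(text.encode("utf-8"))
    out.flush()


if __name__ == "__main__":
    with console_output_cp(65001):
        write_utf8("τεστ\n", sys.stdout.buffer)
```

Restoring the previous code page in `finally` keeps the rest of the session on its original setting even if the write fails.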

gschizas · Sep 6, 2016

rconn said:
The question I have for you is -- why are you using 65001? Do you think it will provide some benefit?

Well, yes, it does. It's more-or-less required if you're working with Python on the console.
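A rough illustration of why, assuming a Python build whose `stdout` encoding follows the console code page (common at the time of this thread, though not true of every version). The sketch simulates the console with an in-memory stream instead of touching the real one. The helper name `can_print` is hypothetical, and `cp850` is just one example of a non-Greek OEM code page.

```python
import io


def can_print(text: str, encoding: str) -> bool:
    """Return True if `text` can be written to a stream using `encoding`."""
    stream = io.TextIOWrapper(io.BytesIO(), encoding=encoding, errors="strict")
    try:
        stream.write(text)
        stream.flush()
        return True
    except UnicodeEncodeError:
        # The code page has no mapping for at least one character.
        return False


print(can_print("τεστ", "cp850"))  # Western European OEM code page
print(can_print("τεστ", "utf-8"))  # what code page 65001 corresponds to
```

With a code page that lacks the characters, the write fails outright, whereas UTF-8 can encode any text.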

rconn said:
I keep telling people that Windows does not actually support UTF-8

Well, it seems to be working in PowerShell/CMD.

rconn · Sep 6, 2016

gschizas said:
Well, it seems to be working in PowerShell/CMD.

CMD definitely does *not* support UTF-8 (or, for that matter, Unicode).
