chcp 65001

nickles · Sep 10, 2011

cmd.exe

> chcp 65001
Active code page: 65001

> ääää
'ääää' is not recognized as an internal or external command,
operable program or batch file.

tcc.exe (latest version)

> chcp 65001
Active code page: 65001

> ä_ä_ä_ä
TCC: Unknown command "ääää"

Please note the additional spaces ("_") on *input* for tcc.exe. In both examples I entered "ääää", w/o any spaces. "option //unicodeoutput={yes|no}" makes no difference whatsoever. Retrieving the last input line in tcc.exe with ARUP gives me:

ääää____

Please comment on this (I consider this a bug).

nickles
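
A small check of the numbers in the post above, as a minimal sketch: the four typed characters occupy eight bytes in UTF-8, which happens to equal the length of the recalled line (four letters plus four spaces). This only compares lengths; it assumes nothing about how TCC reads input and is not a diagnosis. The variable names are illustrative.

```python
# Compare character count and UTF-8 byte count for the typed text.
typed = "\u00e4" * 4                   # "ääää", the four characters typed at the prompt
utf8_bytes = typed.encode("utf-8")     # the same text as UTF-8 bytes

# The line as recalled in tcc.exe, with each "_" from the post written as a space.
recalled = typed + " " * 4

print(len(typed))       # characters typed
print(len(utf8_bytes))  # bytes those characters occupy in UTF-8
print(len(recalled))    # characters in the recalled line
```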

rconn · Sep 10, 2011

> > chcp 65001
> Active code page: 65001

TCC does not support UTF-8 I/O (and neither does CMD.EXE).

nickles · Sep 10, 2011

It's not about output, it's about *input*. And, obviously cmd.exe *does* support it (please look at the samples!).

nickles

rconn · Sep 10, 2011

> It's not about output, it's about *input*. And, obviously cmd.exe *does*
> support it (please look at the samples!).

But if you try to actually *run* anything in CMD with UTF-8 set (like batch
files), you'll see that it doesn't support it. (Google "chcp 65001" for
more info on CMD UTF-8 follies.)

TCC supports ASCII and UTF-16 I/O. (CMD only supports ASCII.) If you want
to request UTF-8 support as well (doable, but it'll be reeaallll slooooowwww
because TCC will have to convert every character to/from UTF-16 so that
Windows can understand it) you can request it in the Feedback forum. If you
can convince a few dozen other people to vote for it, there's a good chance
it will be implemented in a future version.
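
The UTF-8 to/from UTF-16 conversion mentioned above can be sketched as follows. This is an illustration in Python, not TCC's code; the helper names `utf8_to_utf16` and `utf16_to_utf8` are hypothetical. On Windows the corresponding API calls would be `MultiByteToWideChar` and `WideCharToMultiByte` with `CP_UTF8`.

```python
def utf8_to_utf16(data: bytes) -> bytes:
    """Decode UTF-8 bytes and re-encode them as little-endian UTF-16."""
    return data.decode("utf-8").encode("utf-16-le")


def utf16_to_utf8(data: bytes) -> bytes:
    """Decode little-endian UTF-16 bytes and re-encode them as UTF-8."""
    return data.decode("utf-16-le").encode("utf-8")


typed = "\u00e4" * 4                 # "ääää"
as_utf8 = typed.encode("utf-8")      # what a UTF-8 console stream would carry
as_utf16 = utf8_to_utf16(as_utf8)    # what a UTF-16 based program works with

print(as_utf8.hex(" "))
print(as_utf16.hex(" "))
print(utf16_to_utf8(as_utf16) == as_utf8)  # the round trip is lossless
```

The sketch says nothing about speed; it only shows that the two encodings carry the same text and convert back and forth without loss.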

nickles · Sep 11, 2011

Thanks for the answer Rex,

however:

1) Codepage "65001" is actually defined as being "UTF-8", not "UTF-16" (whatever Microsoft thinks about it).

2) The fact that the behavior in TCC is different from the behavior of CMD tells me that you "are in between" the API and the console window, so you can influence things for the better (this is not CMD...).

3) I don't think that UTF-8 support would make things slow at all (especially as you would already be able to tell you're in UTF-8 mode after a "chcp 65001"); there are various editors around (e.g. gvim, Notepad++) which have no problem handling *big* UTF-8 files (and thus conversions between the formats).

nickles

Charles Dye · Sep 11, 2011

Rex: Microsoft's documentation for CHCP used to include a list of supported code pages (http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/chcp.mspx?mfr=true); perhaps Take Command's help should also include such a list?

Nickles: I don't understand how UTF-8 can make sense in the context of keyboard input. What are you trying to accomplish?

nickles · Sep 11, 2011

@Charles

I'd like to be in codepage 65001 mode all the time. E.g. I'd be able to cat/type UTF-8 files w/o seeing "garbage"; I'd like to use certain Perl programs (using Win32::Console colored output) to produce readable output, etc.

On the input side I'd like TCC not to produce "garbage" (i.e. "spaced" letters) when typing e.g. Russian or German letters (cmp. my first post), something which CMD (at least) does.

So far I use codepage 1252, which - again at least - lets me do the latter.

nickles

rconn · Sep 11, 2011

> On the input side I'd like TCC not to produce "garbage" (i.e. "spaced"
> letters) when typing e.g. Russian or German letters (cmp. my first
> post), something which CMD (at least) does.

Why don't you use a Unicode (UTF-16) font? Then you can type whatever
letters you want in TCC.

rconn · Sep 11, 2011

> Rex: Microsoft's documentation for CHCP used to include a list of
> supported code pages
> (http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/chcp.mspx?mfr=true);
> perhaps Take Command's help should also include such a list?

Code pages are completely meaningless in Take Command (or any other GUI
app), and only (very) marginally relevant anymore for console apps.

UTF-16 and Unicode fonts have rendered them obsolete.

nickles · Sep 12, 2011

@Rex

Please try the following:

chcp 65001
ääää

The output is:

ä_ä_ä_ä_

It's not that I don't see the correct characters (like 'ä' or 'л' for that matter); it's the spaces that are produced on input (which CMD does not produce).

And, I disagree: "type"ing a UTF-8 file in 65001 renders different output than "type"ing it in 1252.

nickles
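
Why the same UTF-8 file looks different under 65001 and 1252 can be sketched in a few lines: the bytes stay the same, only the code page used to interpret them changes. This is a minimal illustration that assumes the console simply decodes the file's bytes with the active code page; it does not model what TYPE does internally.

```python
# The bytes a UTF-8 file containing "ääää" would hold.
data = ("\u00e4" * 4).encode("utf-8")

# Interpreted as UTF-8 (code page 65001): each two-byte sequence is one letter.
as_utf8 = data.decode("utf-8")

# Interpreted as Windows-1252 (code page 1252): each byte is its own character.
as_cp1252 = data.decode("cp1252")

print(as_utf8)         # four characters
print(as_cp1252)       # eight characters
print(len(as_utf8), len(as_cp1252))
```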

rconn · Sep 12, 2011

> ---Quote---
> chcp 65001
> ääää
> ---End Quote---

This is not a bug -- TCC does not support UTF-8 input, in any way, at all.

If you want to request a feature, suggest it on the Feedback forum.

nickles · Sep 13, 2011

@Rex

thanks for the clarification.

nickles
