chcp 65001

Jun 24, 2008
223
0
Siegen, Germany
cmd.exe

> chcp 65001
Active code page: 65001

> ääää
'ääää' is not recognized as an internal or external command,
operable program or batch file.

tcc.exe (latest version)

> chcp 65001
Active code page: 65001

> ä_ä_ä_ä
TCC: Unknown command "ääää"

Please note the additional spaces (shown as "_") in the *input* echoed by tcc.exe. In both examples I typed "ääää" without any spaces. The option "//unicodeoutput={yes|no}" makes no difference whatsoever. Recalling the last input line in tcc.exe with the Up arrow (ARUP) gives me:

ääää____
Please comment on this (I consider this a bug).

nickles
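One possible explanation, offered only as a hypothesis and not confirmed anywhere in this thread: "ääää" is 4 characters (4 UTF-16 code units) but 8 bytes in UTF-8, because 'ä' encodes as 0xC3 0xA4. The recalled line "ääää____" is exactly 8 columns wide, and the echoed "ä_ä_ä_ä" likewise shows one extra cell per character. The Python sketch below only does that arithmetic; the idea that the input line is sized by UTF-8 byte count is an assumption.

```python
# Compare the character count with the encoded lengths of the input above.
# Hypothesis (not confirmed): padding = UTF-8 byte count - character count.
typed = "ääää"

chars = len(typed)                                  # 4 characters
utf16_units = len(typed.encode("utf-16-le")) // 2   # 4 UTF-16 code units
utf8_bytes = len(typed.encode("utf-8"))             # 8 bytes: each 'ä' is 0xC3 0xA4

padding = utf8_bytes - chars
recalled = typed + " " * padding  # "ääää" plus 4 spaces, like the ARUP result

print(chars, utf16_units, utf8_bytes, repr(recalled))
```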
 

rconn

Administrator
Staff member
May 14, 2008
12,557
167
> It's not about output, it's about *input*. And, obviously cmd.exe *does*
> support it (please look at the samples!).

But if you try to actually *run* anything in CMD with UTF-8 set (like batch
files), you'll see that it doesn't support it. (Google "chcp 65001" for
more info on CMD UTF-8 follies.)

TCC supports ASCII and UTF-16 I/O. (CMD only supports ASCII.) If you want
to request UTF-8 support as well (doable, but it'll be reeaallll slooooowwww
because TCC will have to convert every character to/from UTF-16 so that
Windows can understand it) you can request it in the Feedback forum. If you
can convince a few dozen other people to vote for it, there's a good chance
it will be implemented in a future version.
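To make the conversion concrete: the same text has different byte layouts in UTF-8 (code page 65001) and in the UTF-16 used by the Windows wide-character API. The sketch below uses Python's codecs as a stand-in for what a converter such as MultiByteToWideChar with CP_UTF8 would do; it is an illustration, not TCC's implementation, and needs Python 3.8+ for `bytes.hex(" ")`.

```python
# The same short string in the two encodings discussed in this thread.
text = "ä л"  # German a-umlaut, a space, Cyrillic el

utf8 = text.encode("utf-8")       # what code page 65001 means
utf16 = text.encode("utf-16-le")  # UTF-16 little-endian, no BOM

print(utf8.hex(" "))   # c3 a4 20 d0 bb
print(utf16.hex(" "))  # e4 00 20 00 3b 04

# Converting UTF-8 -> UTF-16 -> UTF-8 loses nothing.
back = utf16.decode("utf-16-le").encode("utf-8")
assert back == utf8
```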
 
nickles
Thanks for the answer Rex,

however:

1) Code page 65001 is actually defined as UTF-8, not UTF-16 (whatever Microsoft thinks about it).

2) The fact that TCC behaves differently from CMD tells me that you "are in between" the API and the console window, so you can influence things for the better (this is not CMD...).

3) I don't think UTF-8 support would make things slow at all (especially since you would already know you are in UTF-8 mode after a "chcp 65001"); various editors (e.g. gvim, Notepad++) have no problem handling *big* UTF-8 files (and thus the conversions between the formats).

nickles
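Point 3 can be illustrated by the shape of the work involved: decoding UTF-8 into UTF-16 code units is a single linear pass with a few bit operations per character. The following is a minimal Python sketch under stated assumptions (well-formed input, no validation of malformed or overlong sequences); the function name `utf8_to_utf16_units` is hypothetical, it is not TCC's or Windows' code, and it says nothing about measured speed.

```python
def utf8_to_utf16_units(data: bytes) -> list:
    """Decode well-formed UTF-8 into UTF-16 code units in one pass.

    Minimal sketch: no validation of malformed or overlong sequences.
    """
    units = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:        # 0xxxxxxx: ASCII, 1 byte
            cp, length = lead, 1
        elif lead < 0xE0:      # 110xxxxx: 2-byte sequence
            cp, length = lead & 0x1F, 2
        elif lead < 0xF0:      # 1110xxxx: 3-byte sequence
            cp, length = lead & 0x0F, 3
        else:                  # 11110xxx: 4-byte sequence
            cp, length = lead & 0x07, 4
        # Each continuation byte (10xxxxxx) contributes 6 more bits.
        for k in range(1, length):
            cp = (cp << 6) | (data[i + k] & 0x3F)
        i += length
        if cp >= 0x10000:
            # Outside the BMP: split into a high/low surrogate pair.
            cp -= 0x10000
            units.append(0xD800 | (cp >> 10))
            units.append(0xDC00 | (cp & 0x3FF))
        else:
            units.append(cp)
    return units


# Usage: compare against Python's own UTF-16 encoder.
sample = "ääää л \U0001F600"
mine = utf8_to_utf16_units(sample.encode("utf-8"))
raw = sample.encode("utf-16-le")
reference = [int.from_bytes(raw[j:j + 2], "little") for j in range(0, len(raw), 2)]
print(mine == reference)             # True
print(hex(mine[-2]), hex(mine[-1]))  # 0xd83d 0xde00
```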
 

Charles Dye

Super Moderator
Staff member
May 20, 2008
4,689
106
Albuquerque, NM
prospero.unm.edu
nickles
@Charles

I'd like to be in code page 65001 all the time. For example, I'd be able to cat/type UTF-8 files without seeing "garbage", and certain Perl programs (using Win32::Console colored output) would produce readable output, etc.

On the input side, I'd like TCC not to produce "garbage" (i.e. "spaced" letters) when typing e.g. Russian or German letters (cf. my first post), which is something CMD, at least, gets right.

So far I use code page 1252, which at least lets me do the latter.

nickles
 

rconn

> Rex: Microsoft's documentation for CHCP used to include a list of
> supported code pages
> (http://www.microsoft.com/resources/documentation/windows/xp/all/pro/ddocs/en-us/chcp.mspx?mfr=true);
> perhaps Take Command's help should also include such a list?

Code pages are completely meaningless in Take Command (or any other GUI
app), and only (very) marginally relevant anymore for console apps.

UTF-16 and Unicode fonts have rendered them obsolete.
 
nickles
@Rex

Please try the following:

chcp 65001
ääää

The output is:

ä_ä_ä_ä_

It's not that I don't see the correct characters (like 'ä' or 'л', for that matter); it's the spaces produced on input (which CMD does not produce).

And I disagree: "type"ing a UTF-8 file under code page 65001 produces different output than "type"ing it under 1252.

nickles
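The difference described above is the familiar mojibake effect: decoding UTF-8 bytes as Windows-1252 turns each two-byte character into two single-byte characters, which is the "garbage" mentioned earlier in the thread. The sketch uses Python's codecs to show only the byte-level effect; how TYPE itself reads and converts the file is not modeled here.

```python
# A line of a UTF-8 text file, as raw bytes on disk.
raw = "ääää л\n".encode("utf-8")

# Read as UTF-8 (what code page 65001 should give): the original text.
as_utf8 = raw.decode("utf-8")

# Read as Windows-1252: each 2-byte character becomes two 1252 characters.
# errors="replace" because 1252 leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined.
as_1252 = raw.decode("cp1252", errors="replace")

print(repr(as_utf8))  # 'ääää л\n'
print(repr(as_1252))  # 'Ã¤Ã¤Ã¤Ã¤ Ð»\n'
```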