chcp 65001

#1
cmd.exe

> chcp 65001
Active code page: 65001

> ääää
'ääää' is not recognized as an internal or external command,
operable program or batch file.

tcc.exe (latest version)

> chcp 65001
Active code page: 65001

> ä_ä_ä_ä
TCC: Unknown command "ääää"

Please note the additional spaces (shown as "_") in the *input* for tcc.exe. In both examples I typed "ääää", without any spaces. Setting "option //unicodeoutput={yes|no}" makes no difference. Retrieving the last input line in tcc.exe with the Up arrow gives me:

ääää____

Please comment on this (I consider this a bug).

nickles
 

rconn

Administrator
Staff member
May 14, 2008
10,632
97
#4
> It's not about output, it's about *input*. And, obviously cmd.exe *does*
> support it (please look at the samples!).
But if you try to actually *run* anything in CMD with UTF-8 set (like batch
files), you'll see that it doesn't support it. (Google "chcp 65001" for
more info on CMD UTF-8 follies.)

TCC supports ASCII and UTF-16 I/O. (CMD only supports ASCII.) If you want
to request UTF-8 support as well (doable, but it'll be reeaallll slooooowwww
because TCC will have to convert every character to/from UTF-16 so that
Windows can understand it) you can request it in the Feedback forum. If you
can convince a few dozen other people to vote for it, there's a good chance
it will be implemented in a future version.
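
To make the conversion mentioned above concrete, here is a minimal Python sketch of a UTF-8 to UTF-16 round trip. It only illustrates the encodings involved; it is not TCC's code, and the variable names are made up for this example.

```python
# Illustrative only: the UTF-8 <-> UTF-16 round trip described above.
# This is NOT how TCC is implemented; all names here are hypothetical.

text = "\u00e4" * 4  # the four 'a-umlaut' characters from the first post

# What a UTF-8 (code page 65001) byte stream would carry.
utf8_bytes = text.encode("utf-8")

# Decode to Unicode text, then re-encode as little-endian UTF-16,
# the form used by the wide-character Windows APIs.
utf16_bytes = utf8_bytes.decode("utf-8").encode("utf-16-le")

# The reverse direction, as needed for output.
back_to_utf8 = utf16_bytes.decode("utf-16-le").encode("utf-8")

print(utf8_bytes.hex(" "))   # c3 a4 c3 a4 c3 a4 c3 a4
print(utf16_bytes.hex(" "))  # e4 00 e4 00 e4 00 e4 00
print(back_to_utf8 == utf8_bytes)  # True
```

For these four characters both encodings take 8 bytes, but the byte values differ, so a stream in one form cannot be read as the other without converting it.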
 
#5
Thanks for the answer, Rex,

however:

1) Codepage "65001" is actually defined as being "UTF-8", not "UTF-16" (whatever Microsoft thinks about it).

2) The fact that the behavior in TCC is different from the behavior of CMD tells me that you "are in between" the API and the console window, so you can influence things for the better (this is not CMD...).

3) I don't think that UTF-8 support would make things slow at all (especially as you would already be able to tell you're in UTF-8 mode after a "chcp 65001"); there are various editors around (e.g. gvim, Notepad++) which have no problem handling *big* UTF-8 files (and thus the conversions between the formats).

nickles
 

Charles Dye

Super Moderator
Staff member
May 20, 2008
3,667
46
Albuquerque, NM
prospero.unm.edu
#6
#7
@Charles

I'd like to be in codepage 65001 mode all the time. E.g. I'd be able to cat/type UTF-8 files w/o seeing "garbage"; I'd like to use certain Perl programs (using Win32::Console colored output) to produce readable output, etc.

On the input side I'd like TCC not to produce "garbage" (i.e. "spaced" letters) when typing e.g. Russian or German letters (cf. my first post), something which CMD, at least, manages to avoid.

So far I use codepage 1252, which - again at least - lets me do the latter.

nickles
 

rconn

#9
> Rex: Microsoft's documentation for CHCP used to include a list of
> supported code pages
> (http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/chcp.mspx?mfr=true);
> perhaps Take Command's help should also include such a list?
Code pages are completely meaningless in Take Command (or any other GUI
app), and only (very) marginally relevant anymore for console apps.

UTF-16 and Unicode fonts have rendered them obsolete.
 
#10
@Rex

Please try the following:

chcp 65001
ääää
The output is:

ä_ä_ä_ä_
It's not that I don't see the correct characters (like 'ä' or 'л' for that matter); it's the spaces that are produced on input (which CMD does not produce).

And I disagree: "type"ing a UTF-8 file under 65001 produces different output than "type"ing it under 1252.
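
The difference between the two code pages can be reproduced outside the console. The following Python sketch decodes the same UTF-8 bytes once as UTF-8 and once as Windows-1252. It assumes Python's "cp1252" codec is a fair stand-in for console code page 1252, and it does not model the console or TCC.

```python
# Illustrative only: the same bytes interpreted under two code pages.
# Assumption: Python's "cp1252" codec stands in for console code page 1252.

utf8_bytes = ("\u00e4" * 4).encode("utf-8")  # file content: four a-umlauts

as_utf8 = utf8_bytes.decode("utf-8")    # interpretation under 65001
as_1252 = utf8_bytes.decode("cp1252")   # interpretation under 1252

# ascii() keeps the output printable on any console.
print(len(as_utf8), ascii(as_utf8))   # 4 characters
print(len(as_1252), ascii(as_1252))   # 8 characters: each byte becomes one
```

Under 1252 every byte is taken as a separate character, so each 'ä' shows up as the two-character sequence 'Ã¤'.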

nickles