chcp 65001

Discussion in 'Support' started by nickles, Sep 10, 2011.

  1. nickles

    Joined:
    Jun 24, 2008
    Messages:
    220
    Likes Received:
    0
    cmd.exe

    tcc.exe (latest version)

    Please note the additional spaces ("_") on *input* for tcc.exe. In both examples I entered "ääää", w/o any spaces. "option //unicodeoutput={yes|no}" makes no difference whatsoever. Retrieving the last input line in tcc.exe with ARUP gives me:

    Please comment on this (I consider this a bug).
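For reference, a minimal Python sketch of how many bytes the typed string occupies in the encodings that come up in this thread. Whether the byte/character mismatch under UTF-8 is what causes the extra cells in tcc.exe is only an assumption; nothing in the thread confirms it. The variable names are illustrative.

```python
# The string nickles reports typing, taken from the post above.
typed = "ääää"

# Encoded size in bytes for each encoding discussed in the thread.
sizes = {
    "utf-8": len(typed.encode("utf-8")),          # what code page 65001 denotes
    "utf-16-le": len(typed.encode("utf-16-le")),  # Windows' native wide-character form
    "cp1252": len(typed.encode("cp1252")),        # the code page nickles falls back to
}

for name, size in sizes.items():
    print(f"{name}: {len(typed)} characters, {size} bytes")
```

In UTF-8 each "ä" takes two bytes, so the four characters occupy eight bytes, while in code page 1252 they occupy four.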

    nickles
     
  2. rconn

    rconn Administrator
    Staff Member

    Joined:
    May 14, 2008
    Messages:
    10,092
    Likes Received:
    85
    TCC does not support UTF-8 I/O (and neither does CMD.EXE).
     
  3. nickles

    Joined:
    Jun 24, 2008
    Messages:
    220
    Likes Received:
    0
    It's not about output, it's about *input*. And, obviously cmd.exe *does* support it (please look at the samples!).

    nickles
     
  4. rconn

    rconn Administrator
    Staff Member

    Joined:
    May 14, 2008
    Messages:
    10,092
    Likes Received:
    85
    But if you try to actually *run* anything in CMD with UTF-8 set (like batch
    files), you'll see that it doesn't support it. (Google "chcp 65001" for
    more info on CMD UTF-8 follies.)

    TCC supports ASCII and UTF-16 I/O. (CMD only supports ASCII.) If you want
    to request UTF-8 support as well (doable, but it'll be reeaallll slooooowwww
    because TCC will have to convert every character to/from UTF-16 so that
    Windows can understand it) you can request it in the Feedback forum. If you
    can convince a few dozen other people to vote for it, there's a good chance
    it will be implemented in a future version.
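The conversion step described above can be sketched in Python. This is not TCC's code, only an illustration of converting a line from UTF-8 to UTF-16 and back; the helper names `utf8_to_utf16` and `utf16_to_utf8` and the sample command line are made up for the example.

```python
def utf8_to_utf16(data: bytes) -> bytes:
    """Decode UTF-8 bytes and re-encode them as UTF-16 little-endian (no BOM)."""
    return data.decode("utf-8").encode("utf-16-le")


def utf16_to_utf8(data: bytes) -> bytes:
    """Decode UTF-16 little-endian bytes and re-encode them as UTF-8."""
    return data.decode("utf-16-le").encode("utf-8")


# Hypothetical command line containing German and Russian letters.
line_utf8 = "echo ääää лллл".encode("utf-8")

line_utf16 = utf8_to_utf16(line_utf8)
round_trip = utf16_to_utf8(line_utf16)

# The round trip is lossless for valid input.
print(round_trip == line_utf8)
print(len(line_utf8), "bytes as UTF-8,", len(line_utf16), "bytes as UTF-16")
```

The sketch says nothing about how fast such a conversion would be inside TCC; it only shows that the two forms map onto each other without loss.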
     
  5. nickles

    Joined:
    Jun 24, 2008
    Messages:
    220
    Likes Received:
    0
    Thanks for the answer Rex,

    however:

    1) Codepage "65001" is actually defined as being "UTF-8", not "UTF-16" (whatever Microsoft thinks about it).

    2) The fact that the behavior in TCC is different from the behavior of CMD tells me that you "are in between" the API and the console window, so you can influence things for the better (this is not CMD...).

    3) I don't think that UTF-8 support would make things slow at all (especially as you would already be able to tell you're in UTF-8 mode after a "chcp 65001"); there are various editors around (e.g. gvim, Notepad++) which have no problem handling *big* UTF-8 files (and thus conversions between the formats).

    nickles
     
  6. Charles Dye

    Charles Dye Super Moderator
    Staff Member

    Joined:
    May 20, 2008
    Messages:
    3,374
    Likes Received:
    40
  7. nickles

    Joined:
    Jun 24, 2008
    Messages:
    220
    Likes Received:
    0
    @Charles

    I'd like to be in codepage 65001 mode all the time. E.g. I'd be able to cat/type UTF-8 files w/o seeing "garbage"; I'd like to use certain Perl programs (using Win32::Console colored output) to produce readable output, etc.

    On the input side I'd like TCC not to produce "garbage" (i.e. "spaced" letters) when typing e.g. Russian or German letters (cf. my first post), something which CMD (at least) manages to avoid.

    So far I use codepage 1252, which - again at least - lets me do the latter.

    nickles
     
  8. rconn

    rconn Administrator
    Staff Member

    Joined:
    May 14, 2008
    Messages:
    10,092
    Likes Received:
    85
    Why don't you use a Unicode (UTF-16) font? Then you can type whatever
    letters you want in TCC.
     
  9. rconn

    rconn Administrator
    Staff Member

    Joined:
    May 14, 2008
    Messages:
    10,092
    Likes Received:
    85
    Code pages are completely meaningless in Take Command (or any other GUI
    app), and only (very) marginally relevant anymore for console apps.

    UTF-16 and Unicode fonts have rendered them obsolete.
     
  10. nickles

    Joined:
    Jun 24, 2008
    Messages:
    220
    Likes Received:
    0
    @Rex

    Please try the following:

    The output is:

    It's not that I don't see the correct characters (like 'ä' or 'л' for that matter), it's the spaces that are produced on input (which CMD does not produce).

    And, I disagree: "type"ing a UTF-8 file in 65001 produces different output than "type"ing it in 1252.
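The difference between the two code pages can be reproduced outside the console. A minimal Python sketch, assuming the file holds the UTF-8 bytes of the letters mentioned in this post; it imitates interpreting the same bytes as UTF-8 (code page 65001) versus Windows-1252 and does not call the console at all.

```python
# UTF-8 bytes as they would sit in a file containing these letters.
utf8_bytes = "ääää л".encode("utf-8")

# Interpreting the bytes as UTF-8, which is what code page 65001 denotes.
as_utf8 = utf8_bytes.decode("utf-8")

# Interpreting the very same bytes as Windows-1252 instead.
as_cp1252 = utf8_bytes.decode("cp1252")

print(as_utf8)    # ääää л
print(as_cp1252)  # Ã¤Ã¤Ã¤Ã¤ Ð»
```

Each two-byte UTF-8 sequence turns into two unrelated Windows-1252 characters, which is the "garbage" seen when the code page does not match the file's encoding.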

    nickles
     
  11. rconn

    rconn Administrator
    Staff Member

    Joined:
    May 14, 2008
    Messages:
    10,092
    Likes Received:
    85
    This is not a bug -- TCC does not support UTF-8 input, in any way, at all.

    If you want to request a feature, suggest it on the Feedback forum.
     
  12. nickles

    Joined:
    Jun 24, 2008
    Messages:
    220
    Likes Received:
    0
    @Rex

    thanks for the clarification.

    nickles
     
