Danish character redirection problem

Discussion in 'TCC/LE Support' started by rmortensen, Sep 30, 2009.

  1. rmortensen

    Joined:
    Sep 30, 2009
    Messages:
    5
    Likes Received:
    0
    With TCC/LE, redirection of Danish characters such as "æ", "ø" and "å" does not work. If I, for example, run DIR > TMP.TXT in a directory containing files with those characters in their names, I get some weird other characters in the TMP.TXT file. On the screen, a plain DIR command displays everything just fine.

    How to fix this?
     
  2. Steve Fabian

    Joined:
    May 20, 2008
    Messages:
    3,523
    Likes Received:
    4
    rmortensen wrote:
    | With TCC/LE, redirection of Danish characters such as "æ", "ø" and
    | "å" does not work. If I, for example, run DIR > TMP.TXT in a directory
    | containing files with those characters in their names, I get some
    | weird other characters in the TMP.TXT file. On the screen, a plain DIR
    | command displays everything just fine.
    |
    | How to fix this?

    For internal operations, MS Windows and TCC/LE use Unicode character
    encoding. For redirected command output, the default character encoding is
    your current codepage, which includes ASCII for codes 0...127 and a
    codepage-dependent set of characters for codes 128...255. This requires
    mapping 16-bit Unicode characters to 8-bit characters, resulting in
    information loss.
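To illustrate how such "weird other characters" can arise, here is a minimal Python sketch. It assumes the redirected output was written in OEM codepage 850 and the file is then viewed by a program that expects Windows-1252; both codepages are assumptions chosen for illustration, and the actual ones on a given system may differ.

```python
# Assumption: redirected output is written in OEM codepage 850,
# and the viewer interprets the file as Windows-1252 ("ANSI").
names = "æøå"

oem_bytes = names.encode("cp850")        # 8-bit codes written to the file
as_viewed = oem_bytes.decode("cp1252")   # same bytes read with another codepage

print(oem_bytes.hex(" "))  # 91 9b 86
print(as_viewed)           # three unrelated punctuation characters
```

The bytes themselves are intact; only the codepage used to interpret them differs, which is enough to turn the three letters into unrelated symbols.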

    According to my reading of the TCC vs. TCC/LE comparison in the help, TCC/LE
    supports Unicode output. Accordingly, try starting TCC/LE with the /U
    option, forcing Unicode output. Alternatively, you may switch between 8-bit
    (normally, but incorrectly, referred to as ASCII) and Unicode output using
    these commands:

    OPTION //UnicodeOutput=yes
    OPTION //UnicodeOutput=no

    I have aliases for them (but I use only TCC):

    alias uni=OPTION //UnicodeOutput=yes
    alias nouni=OPTION //UnicodeOutput=no

    The disadvantage of always using Unicode is that file sizes are doubled.
    That's the penalty for having a language whose written form represents its
    spoken form more closely than English does (which I often refer to as an
    ideogrammatic language, with each ideogram composed of the 26 letters). Can
    you do otherwise when "red" and "read" can sound the same though they look
    different, but "read" (present tense) and "read" (past tense) sound
    different though they look the same?
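The size doubling can be checked with a small Python sketch. The sample listing text is made up, and the sketch assumes 16-bit output means UTF-16 with every character in the Basic Multilingual Plane, compared against an 8-bit codepage (Windows-1252 is used here as an example).

```python
# Hypothetical redirected listing; any text representable in the codepage works.
listing = "TMP.TXT\r\næøå.txt\r\n"

eight_bit = listing.encode("cp1252")     # one byte per character
sixteen_bit = listing.encode("utf-16-le")  # two bytes per character, no BOM

print(len(eight_bit), len(sixteen_bit))  # 18 36
```

A byte-order mark, if the writing program adds one, costs two more bytes at the start of the file.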
    --
    HTH, Steve
     
  3. rconn

    rconn Administrator
    Staff Member

    Joined:
    May 14, 2008
    Messages:
    9,729
    Likes Received:
    80
    Use a Unicode font (like Lucida Console).

    Rex Conn
    JP Software
     
  4. rmortensen

    Joined:
    Sep 30, 2009
    Messages:
    5
    Likes Received:
    0
    I tried the above, and that did the trick! Thanks! Some of my other apps do not accept Unicode text files, though :-(
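For programs that only accept 8-bit text, one possible workaround is to convert the Unicode file afterwards. The Python sketch below is only an illustration: `utf16_to_codepage` is a hypothetical helper, the input is assumed to be UTF-16 (with or without a byte-order mark), and Windows-1252 is an assumed target codepage.

```python
def utf16_to_codepage(data: bytes, codepage: str = "cp1252") -> bytes:
    """Convert UTF-16 bytes to an 8-bit codepage (hypothetical helper).

    Characters missing from the target codepage become "?".
    """
    text = data.decode("utf-16")  # honors a BOM if one is present
    return text.encode(codepage, errors="replace")


# Stand-in for the contents of a Unicode TMP.TXT; encode() adds a BOM here.
sample = "æøå.txt\r\n".encode("utf-16")
converted = utf16_to_codepage(sample)
print(converted)  # b'\xe6\xf8\xe5.txt\r\n'
```

The target codepage has to match what the receiving program expects, otherwise the same wrong characters reappear.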

    Changing to a Unicode font (like Lucida Console) did not seem to make any difference with respect to redirection.

    Still: if I put, e.g., "Echo æøå" in my .btm file, it comes out wrong on the screen?
     
  5. Steve Fabian

    Joined:
    May 20, 2008
    Messages:
    3,523
    Likes Received:
    4
    rmortensen wrote:
    | Changing to a Unicode font (like Lucida Console) did not seem to
    | make any difference with respect to redirection.
    |
    | Still: if I put, e.g., "Echo æøå" in my .btm file, it comes out wrong
    | on the screen?

    1/ The UnicodeOutput directive only affects what and how you write to a
    file, not the screen display.

    2/ The Unicode to 8-bit conversion is "lossy": you map 65536 character codes
    into a character set with 256 elements, so it is not reversible. The 8-bit
    to 16-bit conversion is always reversible (as far as character codes are
    concerned).

    3/ Each character sent from TCC "to the screen" (e.g., your ECHO example) is
    translated into a bitmap of the character that is actually displayed. This
    is a multistage process, involving the program where the character
    originates (TCC/LE in this case), NTVDM, the display driver, and I don't know
    what else. Hopefully someone knowledgeable will speak up. Whether the program
    sends 16-bit or 8-bit codes, and the codepage and font currently in effect,
    all affect what you see. When TCC runs in its own window the translation is
    different from what it is when TCC runs in a TCMD tab. BTW, I don't think
    there is any difference between the full and the LE versions in this respect.
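Point 2/ above can be tried out in a few lines of Python. This is a sketch under assumptions: codepage 850 stands in for "the current codepage", and `survives_round_trip` is a hypothetical helper name.

```python
def survives_round_trip(text: str, codepage: str) -> bool:
    """True if text is unchanged after Unicode -> 8-bit -> Unicode."""
    encoded = text.encode(codepage, errors="replace")  # unmappable -> "?"
    return encoded.decode(codepage) == text


# 8-bit -> Unicode -> 8-bit: every byte value of cp850 comes back unchanged.
every_byte = bytes(range(256))
assert every_byte.decode("cp850").encode("cp850") == every_byte

# Unicode -> 8-bit -> Unicode: only characters present in the codepage survive.
print(survives_round_trip("æøå", "cp850"))    # True
print(survives_round_trip("æøå €", "cp850"))  # False, cp850 has no euro sign
```

The Danish letters exist in codepage 850, so they survive; the loss only bites for characters the chosen codepage lacks.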

    Experiment until you find a combination of codepage and font you like. IIRC
    "Andale Mono" had been recommended in the past.
    --
    HTH, Steve
     
  6. rmortensen

    Joined:
    Sep 30, 2009
    Messages:
    5
    Likes Received:
    0
    Thanks! I'll work it out!
     
