
International characters - TYPE vs %@LINE function

Discussion in 'Support' started by Stein Oiestad, Nov 17, 2014.

  1. Stein Oiestad

    Joined:
    Sep 11, 2012
    Messages:
    57
    Likes Received:
    1
    When I use TYPE test.txt I get a different result compared to ECHO %@LINE[test.txt...]

    I have attached a small text file for testing, the hex codes are as follows:

    E6 F8 E5 C6 D8 C5
    91 9B 86 92 9D 8F

    Characters:
    æøåÆØÅ (ANSI)
    æøåÆØÅ (Old 8-bit ASCII characters representing the same).

    What is the reason for that?
    Is there an easy way to convert between them without replacing character by character?

    Thanks for any help!

    -stein

    Reference: http://en.wikipedia.org/wiki/ISO/IEC_8859-1
     


  2. rconn

    rconn Administrator
    Staff Member

    Joined:
    May 14, 2008
    Messages:
    10,022
    Likes Received:
    84
    There are two reasons:

    1) You shouldn't be using ASCII files for non-ASCII characters; if you used a Unicode file instead you wouldn't be having any problems.

    2) The differences are because in the case of @LINE, the RTL is using CP_THREAD_ACP for the conversion, and in the case of TYPE it's using the current console code page.

    And you're going to see different results on different systems -- for example, on my system I see something different in both cases than what you see, due to differences in the code page (in both TCMD and TCC, if you're doing it in a tab window), and differences in the font (particularly if you're using a raster font instead of a Unicode font).
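
The effect of decoding the same bytes under two different code pages can be sketched in a few lines of Python. This is only an illustration, not what TCC does internally: it assumes the ANSI code page is Windows-1252 and the console code page is 850 (the values mentioned later in this thread for Western Europe), and the names `LINE1`, `LINE2`, `ANSI_CP`, `OEM_CP` and `decode` are made up for the example.

```python
# Bytes of the two lines in the test file from the first post.
LINE1 = bytes.fromhex("E6 F8 E5 C6 D8 C5")
LINE2 = bytes.fromhex("91 9B 86 92 9D 8F")

# Assumed code pages; other systems may well use different ones.
ANSI_CP = "cp1252"  # stand-in for the thread's ANSI code page
OEM_CP = "cp850"    # stand-in for the console code page


def decode(raw: bytes, codepage: str) -> str:
    """Decode raw bytes under the given code page.

    errors="replace" is needed because a few byte values
    (for example 0x8F and 0x9D) are undefined in cp1252.
    """
    return raw.decode(codepage, errors="replace")


if __name__ == "__main__":
    for name, raw in (("line 1", LINE1), ("line 2", LINE2)):
        print(name, "decoded as ANSI:", decode(raw, ANSI_CP))
        print(name, "decoded as OEM: ", decode(raw, OEM_CP))
```

Under these assumptions each line reads as æøåÆØÅ under exactly one of the two code pages and as unrelated characters under the other.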

    Did you want the output as converted by @LINE, or the output as converted by TYPE (or neither)?
     
  3. Christian Albaret

    Joined:
    Jul 1, 2008
    Messages:
    156
    Likes Received:
    1
    As you point out there are 2 encodings that come into play.
    - The console (and TCC) work with an OEM encoding (OEM850 for Western Europe).
    - Graphical UIs (and TCMD) work with an ISO encoding (ISO-Latin-1, a.k.a. ISO-8859-1, which in practice is Windows-1252).

    This is a constant annoyance in the non-English world, and I haven't found a way to handle every situation.
    - As a program writer on Windows, I have to choose whether the output will be read in a console (and use an OEM encoding) or in a GUI, i.e. redirected and viewed in a text editor (and use an ISO encoding).
    - TCC assumes the output is in OEM encoding; TCMD converts it to ISO/Windows encoding.

    Your test file has its 1st line in ISO/Windows encoding and its 2nd line in OEM encoding. Therefore TYPE displays the 2nd line correctly. ECHO %@LINE displays the 1st line correctly; I guess this is because %@LINE does not convert it, and it is converted to UTF-16 before it is sent to the command line/parser (in order to execute ECHO).

    I have a few aliases that help make output readable through piping:
    - alias utf8toansi tpipe /unicode=utf-8,ansi (I am currently working on a project where I have to write files in UTF-8 encoding; I pipe FFIND's output through this alias)
    - alias oemtoansi tpipe /simple=4
    - alias ansitooem tpipe /simple=3
    I have configured my text editor so that it performs an oemtoansi conversion when I pipe into it.
    Rex Conn started a "String conversion functions/features" thread that should help with converting strings.
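
For readers without TPIPE at hand, the same kind of OEM/ANSI conversion can be approximated in Python by decoding with one code page and re-encoding with the other. This is a rough sketch, not a description of how TPIPE works: it assumes OEM means code page 850 and ANSI means Windows-1252, and the function names simply mirror the alias names above.

```python
OEM_CP = "cp850"    # assumed OEM code page (Western Europe)
ANSI_CP = "cp1252"  # assumed ANSI code page


def oem_to_ansi(raw: bytes) -> bytes:
    """Re-encode OEM bytes as ANSI bytes.

    Characters with no ANSI equivalent (e.g. box-drawing
    characters) are replaced with "?" instead of raising.
    """
    return raw.decode(OEM_CP).encode(ANSI_CP, errors="replace")


def ansi_to_oem(raw: bytes) -> bytes:
    """Re-encode ANSI bytes as OEM bytes.

    Undefined ANSI bytes and characters with no OEM
    equivalent are replaced instead of raising.
    """
    text = raw.decode(ANSI_CP, errors="replace")
    return text.encode(OEM_CP, errors="replace")


if __name__ == "__main__":
    oem_line = bytes.fromhex("91 9B 86 92 9D 8F")   # 2nd line of the test file
    ansi_line = bytes.fromhex("E6 F8 E5 C6 D8 C5")  # 1st line of the test file
    print(oem_to_ansi(oem_line).hex(" ").upper())
    print(ansi_to_oem(ansi_line).hex(" ").upper())
```

With the two lines of the test file as input, each function turns one line's bytes into the other's, so no character-by-character replacement table is needed.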
     
  4. Stein Oiestad

    Joined:
    Sep 11, 2012
    Messages:
    57
    Likes Received:
    1
    Thank you both for comprehensive answers, now I think I understand at least what I need for my own tasks.

    I became aware of the problem because I used dir > file.txt and then for %f in (@file.txt) do .....

    The international character "ø" then got converted to ">" (the redirection symbol), and since I had not escaped it, it certainly did not work as expected.
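
The "ø" mix-up is consistent with the two code pages discussed above, which a short Python check can show. It assumes the redirected file was written in code page 850 and read back as Windows-1252. In that case the byte for "ø" (0x9B) comes back as U+203A, the single right-pointing angle quotation mark "›". Whether a later step then maps that to a plain ">" is an assumption; the thread does not confirm the exact mechanism.

```python
# "ø" as a console using code page 850 (assumed) would write it to the file.
oem_byte = "\u00f8".encode("cp850")

# The same byte read back under Windows-1252 (assumed ANSI code page).
misread = oem_byte.decode("cp1252")

# Show the byte value, the character it turned into, and its code point.
print(oem_byte.hex(), repr(misread), f"U+{ord(misread):04X}")
```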

    Thanks for the 'tpipe' tips - I did not even think of TPIPE as a possible solution this time, but a year ago I discovered its EBCDIC conversion, which did what we needed; the alternatives would have cost about the same as TCMD 10 user :)

    -stein
     
  5. Charles Dye

    Charles Dye Super Moderator
    Staff Member

    Joined:
    May 20, 2008
    Messages:
    3,352
    Likes Received:
    39
    I think OPTION //UNICODEOUTPUT=YES would save you a lot of trouble.
     
  6. Stein Oiestad

    Joined:
    Sep 11, 2012
    Messages:
    57
    Likes Received:
    1
    Thanks to Charles also - guess it could even give me support for Runes - http://en.wikipedia.org/wiki/Runes

    Guess I should print TCMD.CHM and read all pages - things were easier in the DOS days :)
     
