International characters - TYPE vs %@LINE function

When I use TYPE test.txt, I get a different result than with ECHO %@LINE[test.txt...].

I have attached a small text file for testing; the hex codes are as follows:

E6 F8 E5 C6 D8 C5
91 9B 86 92 9D 8F

Characters:
æøåÆØÅ (ANSI)
æøåÆØÅ (the same characters in the old 8-bit "extended ASCII" encoding)
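A quick way to confirm that the two hex sequences spell the same six letters is to decode each with the code page it is assumed to use. This Python sketch assumes the first line is Windows-1252 (Python codec name `cp1252`) and the second is OEM code page 850 (`cp850`); those code pages are an assumption for a Western European setup, not something recorded in the attached file.

```python
# Sketch: decode the two byte sequences from test.txt with their assumed code pages.
ANSI_BYTES = bytes.fromhex("E6 F8 E5 C6 D8 C5")  # line 1, assumed Windows-1252
OEM_BYTES = bytes.fromhex("91 9B 86 92 9D 8F")   # line 2, assumed OEM code page 850

ansi_text = ANSI_BYTES.decode("cp1252")
oem_text = OEM_BYTES.decode("cp850")

print(ansi_text)              # æøåÆØÅ
print(oem_text)               # æøåÆØÅ
print(ansi_text == oem_text)  # True
```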

What is the reason for that?
Is there an easy way to convert between them without replacing character by character?

Thanks for any help!

-stein

Reference: http://en.wikipedia.org/wiki/ISO/IEC_8859-1
 

Attachments

  • test.txt (16 bytes)
  • test.jpg (22.8 KB)
There are two reasons:

1) You shouldn't be using ASCII files for non-ASCII characters; if you used a Unicode file instead you wouldn't be having any problems.

2) The differences arise because, in the case of @LINE, the RTL uses CP_THREAD_ACP (the ANSI code page of the current thread) for the conversion, while TYPE uses the current console code page.
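Point 1 can be illustrated with a short Python sketch. It is not how TCC reads files; it only shows that an 8-bit file's bytes depend on which code page the writer used (cp1252 vs cp850 here, both assumed), whereas Unicode bytes round-trip without any code page choice. UTF-16LE with a byte-order mark is used only as one example of "a Unicode file".

```python
import codecs

TEXT = "æøåÆØÅ"

# 8-bit code pages disagree: the same letters get different bytes.
print(TEXT.encode("cp1252") != TEXT.encode("cp850"))  # True

# UTF-16 little-endian with a byte-order mark (BOM), one common "Unicode file" layout.
utf16_bytes = codecs.BOM_UTF16_LE + TEXT.encode("utf-16-le")
# The "utf-16" codec reads the BOM to pick the byte order and strips it.
decoded_utf16 = utf16_bytes.decode("utf-16")

# UTF-8 needs no BOM; each of these six letters takes two bytes.
utf8_bytes = TEXT.encode("utf-8")
decoded_utf8 = utf8_bytes.decode("utf-8")

print(decoded_utf16 == TEXT, decoded_utf8 == TEXT)  # True True
print(len(utf16_bytes), len(utf8_bytes))            # 14 12
```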

And you're going to see different results on different systems. For example, on my system I see something different from what you see in both cases, due to differences in the code page (in both TCMD and TCC, if you're doing it in a tab window) and in the font (particularly if you're using a raster font instead of a Unicode font).

Did you want the output as converted by @LINE, or the output as converted by TYPE (or neither)?
 
As you point out, there are two encodings that come into play:
- The console (and TCC) work with an OEM encoding (OEM850 for Western Europe).
- Graphical UIs (and TCMD) work with an ISO encoding (ISO-Latin-1, a.k.a. ISO-8859-1, and in practice Windows-1252).

This is a constant annoyance in the non-English world, and I haven't found a way to handle every situation.
- As a program writer on Windows, I have to choose whether the output will be read in a console (so use an OEM encoding) or in a GUI, i.e. redirected and viewed in a text editor (so use an ISO encoding).
- TCC assumes the output is in OEM encoding; TCMD converts it to ISO/Windows encoding.
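The choice a program writer faces here can be written down as a small helper. This Python sketch assumes cp850 for a console and cp1252 for redirected output (a Western European setup); it does not query Windows for the real code pages and does not handle the TCMD tab-window case. `sys.stdout.isatty()` is used as the console test, which is True for a console and False when output is redirected or piped.

```python
import sys

CONSOLE_CP = "cp850"    # assumed OEM code page of a Western European console
REDIRECT_CP = "cp1252"  # assumed ANSI code page for output viewed in a GUI editor


def pick_output_encoding(is_console: bool) -> str:
    """OEM code page when a console reads the output, ANSI code page otherwise."""
    return CONSOLE_CP if is_console else REDIRECT_CP


def encode_for_output(text: str, is_console: bool) -> bytes:
    """Encode text for the chosen target; unmappable characters become '?'."""
    return text.encode(pick_output_encoding(is_console), errors="replace")


if __name__ == "__main__":
    print("would write with", pick_output_encoding(sys.stdout.isatty()))
    sample = "æøåÆØÅ"
    print("console: ", encode_for_output(sample, True).hex(" "))   # 91 9b 86 92 9d 8f
    print("redirect:", encode_for_output(sample, False).hex(" "))  # e6 f8 e5 c6 d8 c5
```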

Your test file has its first line in ISO/Windows encoding and its second line in OEM encoding. Therefore TYPE displays the second line correctly, and ECHO %@LINE displays the first line correctly; I guess this is because %@LINE does not convert it, and it is converted to UTF-16 before it is sent to the command-line parser (in order to execute ECHO).
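The behavior described above can be modelled outside TCC. This Python sketch is only a model of the explanation, not TCC's actual code: it rebuilds test.txt from the hex dump (CRLF line endings assumed), then decodes every line once with an assumed console code page (cp850, standing in for TYPE) and once with an assumed ANSI code page (cp1252, standing in for CP_THREAD_ACP and %@LINE).

```python
from __future__ import annotations

# Bytes of test.txt rebuilt from the hex dump; CRLF line endings are an assumption.
TEST_TXT = bytes.fromhex("E6F8E5C6D8C5 0D0A 919B86929D8F 0D0A")
EXPECTED = "æøåÆØÅ"

CONSOLE_CP = "cp850"  # assumed console (OEM) code page: stands in for TYPE
ANSI_CP = "cp1252"    # assumed ANSI code page (CP_THREAD_ACP): stands in for %@LINE


def decode_lines(data: bytes, codepage: str) -> list[str]:
    # errors="replace": cp1252 has no characters for bytes 0x8F and 0x9D
    return data.decode(codepage, errors="replace").splitlines()


type_view = decode_lines(TEST_TXT, CONSOLE_CP)
line_view = decode_lines(TEST_TXT, ANSI_CP)

for number, (via_type, via_line) in enumerate(zip(type_view, line_view), start=1):
    print(f"line {number}: TYPE-like ok={via_type == EXPECTED}, "
          f"@LINE-like ok={via_line == EXPECTED}")
# line 1: TYPE-like ok=False, @LINE-like ok=True
# line 2: TYPE-like ok=True, @LINE-like ok=False
```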

I have a few aliases to help make output readable when piping:
- alias utf8toansi tpipe /unicode=utf-8,ansi (I am currently working on a project where I have to write files in UTF-8 encoding; I pipe FFIND's output through this alias)
- alias oemtoansi tpipe /simple=4
- alias ansitooem tpipe /simple=3
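For comparison, the three conversions these aliases are named after can be sketched in Python as a stdin-to-stdout filter. This is a hypothetical stand-in, not TPIPE: the script name convert.py is made up, and "ANSI" and "OEM" are assumed to mean Windows-1252 and code page 850.

```python
import sys

# Hypothetical Python stand-ins for the three aliases.
# Assumed code pages: ANSI = Windows-1252 (cp1252), OEM = code page 850 (cp850).
CONVERSIONS = {
    "utf8toansi": ("utf-8", "cp1252"),
    "oemtoansi": ("cp850", "cp1252"),
    "ansitooem": ("cp1252", "cp850"),
}


def convert(data: bytes, source: str, target: str) -> bytes:
    """Re-encode bytes from one encoding to another.

    Decoding is strict (bad input raises UnicodeDecodeError); characters the
    target code page cannot represent are written as "?".
    """
    return data.decode(source).encode(target, errors="replace")


if __name__ == "__main__":
    if len(sys.argv) == 2 and sys.argv[1] in CONVERSIONS:
        source, target = CONVERSIONS[sys.argv[1]]
        sys.stdout.buffer.write(convert(sys.stdin.buffer.read(), source, target))
    else:
        # No filter name given: convert a CP850 sample (æøåÆØÅ) and show the result in hex.
        sample = bytes.fromhex("919B86929D8F")
        print(convert(sample, *CONVERSIONS["oemtoansi"]).hex(" "))  # e6 f8 e5 c6 d8 c5
```

It would sit in a pipe the same way the aliases do, e.g. `somecommand | python convert.py oemtoansi`, where somecommand is a placeholder.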
I have configured my text editor so that it performs an oemtoansi conversion when I pipe into it.
Rex Conn started a "String conversion functions/features" thread that should help with converting strings.
 
Thank you both for comprehensive answers, now I think I understand at least what I need for my own tasks.

I became aware of the problem because I used dir > file.txt and then used for %f in (@file.txt) do .....

The international character "ø" was converted to ">" (the redirection symbol), and since I had not escaped it, it naturally did not work as expected.
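One plausible route for "ø" turning into something like ">" (an assumption; the thread does not confirm the exact path): in code page 850, "ø" is byte 0x9B, and the same byte read as Windows-1252 is "›" (U+203A, single right-pointing angle quotation mark). Code page 850 has no "›", so converting it back to the console code page needs a substitute, and Windows' best-fit mapping may pick a plain ">". This Python sketch shows the first steps; Python's `errors="replace"` substitutes "?" instead and does not reproduce Windows' best-fit behavior.

```python
OEM_CP = "cp850"    # assumed console code page
ANSI_CP = "cp1252"  # assumed Windows ANSI code page

oem_byte = "ø".encode(OEM_CP)       # how "ø" is stored in code page 850
misread = oem_byte.decode(ANSI_CP)  # the same byte read as Windows-1252

print(oem_byte.hex())               # 9b
print(f"U+{ord(misread):04X}")      # U+203A
# cp850 has no slot for U+203A; Python's "replace" handler falls back to "?".
print(misread.encode(OEM_CP, errors="replace"))  # b'?'
```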

Thanks for the TPIPE tips. I did not even think of TPIPE as a possible solution this time, although a year ago I discovered its EBCDIC conversion, which did what we needed; the alternatives would have cost about the same as a 10-user TCMD license :)

-stein
 

I think OPTION //UNICODEOUTPUT=YES would save you a lot of trouble.
 
