International characters - TYPE vs %@LINE function

Sep 11, 2012
100
1
When I use TYPE test.txt I get a different result compared to ECHO %@LINE[test.txt...]

I have attached a small text file for testing, the hex codes are as follows:

E6 F8 E5 C6 D8 C5
91 9B 86 92 9D 8F

Characters:
æøåÆØÅ (ANSI)
æøåÆØÅ (Old 8-bit ASCII characters representing the same).

What is the reason for that?
Is there an easy way to convert between them without replacing character by character?

Thanks for any help!

-stein

Reference: http://en.wikipedia.org/wiki/ISO/IEC_8859-1
 

Attachments

  • test.txt (16 bytes)
  • test.jpg (22.8 KB)

rconn

Administrator
Staff member
May 14, 2008
12,345
150
There are two reasons:

1) You shouldn't be using ASCII files for non-ASCII characters; if you used a Unicode file instead you wouldn't be having any problems.

2) The differences are because in the case of @LINE, the RTL is using CP_THREAD_ACP for the conversion, and in the case of TYPE it's using the current console code page.

And you're going to see different results on different systems -- for example, on my system I see something different in both cases than what you see, due to differences in the code page (in both TCMD and TCC, if you're doing it in a tab window), and differences in the font (particularly if you're using a raster font instead of a Unicode font).
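
Which code pages are in play on a given machine can be checked directly. The following is a minimal Python sketch, not anything TCC itself runs: it asks the Win32 API for the system ANSI code page, the OEM code page, and the current console output code page. `active_code_pages` is a hypothetical helper name. GetACP reports the system ANSI code page, which is usually, but not necessarily, what CP_THREAD_ACP resolves to for a given thread.

```python
import ctypes
import sys


def active_code_pages():
    """Return the ANSI, OEM and console output code pages, or None off Windows."""
    if sys.platform != "win32":
        return None  # the Win32 calls below only exist on Windows
    kernel32 = ctypes.windll.kernel32
    return {
        "ansi": kernel32.GetACP(),  # system ANSI code page (assumed close to CP_THREAD_ACP)
        "oem": kernel32.GetOEMCP(),  # system OEM code page
        "console_output": kernel32.GetConsoleOutputCP(),  # 0 if no console is attached
    }


if __name__ == "__main__":
    print(active_code_pages())
```

Comparing the "ansi" and "console_output" values on two systems should show whether they can be expected to render the same bytes differently.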

Did you want the output as converted by @LINE, or the output as converted by TYPE (or neither)?
 
As you point out, there are two encodings that come into play.
- The console (and TCC) work with an OEM encoding (OEM 850 for Western Europe).
- Graphical UIs (and TCMD) work with an ISO encoding (ISO Latin-1, a.k.a. ISO-8859-1, and actually Windows-1252).

This is a constant annoyance in the non-English world, and I have not found a way to handle all situations.
- As a program writer on Windows, I have to choose whether the output will be read through a console (and use an OEM encoding) or through a GUI (output redirected and viewed through a text editor) (and use an ISO encoding).
- TCC assumes the output is in OEM encoding; TCMD converts it to ISO/Windows encoding.

Your test file has its first line in ISO/Windows encoding and its second line in OEM encoding. Therefore TYPE displays the second line correctly, while ECHO %@LINE displays the first line correctly; I guess this is because %@LINE does not convert it, and it is converted to UTF-16 before it is sent to the command line/parser (in order to execute ECHO).
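
The two-encodings reading of the test file can be checked outside TCC. Below is a small Python sketch that decodes the two byte rows from the original hex dump under both code pages. It assumes Windows-1252 (`cp1252`) for the ANSI side and code page 850 (`cp850`) for the OEM side; `decode_rows` is a hypothetical helper name, and none of this reflects how TYPE or %@LINE are implemented.

```python
# Byte values copied from the hex dump in the first post.
ROWS = {
    "row 1": bytes.fromhex("E6 F8 E5 C6 D8 C5"),
    "row 2": bytes.fromhex("91 9B 86 92 9D 8F"),
}

# Assumed code pages: cp1252 for "ANSI", cp850 for "OEM" (Western Europe).
ANSI = "cp1252"
OEM = "cp850"


def decode_rows():
    """Decode every row under both code pages.

    Bytes that a code page leaves undefined become U+FFFD instead of
    raising an error.
    """
    return {
        label: {cp: raw.decode(cp, errors="replace") for cp in (ANSI, OEM)}
        for label, raw in ROWS.items()
    }


views = decode_rows()
# views["row 1"][ANSI] == "æøåÆØÅ"   (row 1 is readable only as cp1252)
# views["row 2"][OEM]  == "æøåÆØÅ"   (row 2 is readable only as cp850)
```

Row 2 contains the bytes 9D and 8F, which Python's cp1252 codec leaves undefined; that is why the sketch decodes with errors="replace".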

I have a few aliases that help make output readable through piping:
- alias utf8toansi tpipe /unicode=utf-8,ansi (I am currently working on a project where I have to write files in UTF-8 encoding; I pipe FFIND's output through this alias)
- alias oemtoansi tpipe /simple=4
- alias ansitooem tpipe /simple=3
I have configured my text editor so that it performs an oemtoansi conversion when I pipe into it.
Rex Conn started a "String conversion functions/features" thread that should help convert strings.
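
For readers without TPIPE, the same kind of whole-buffer conversion as the oemtoansi and ansitooem aliases above can be sketched in Python. This is an illustration of the idea only, not TPIPE's implementation: the function names are borrowed from the aliases, and the code pages (`cp850` and `cp1252`) are assumptions that would need to match the actual system.

```python
def oem_to_ansi(data: bytes, oem: str = "cp850", ansi: str = "cp1252") -> bytes:
    """Re-encode OEM bytes as ANSI bytes.

    Characters with no ANSI equivalent (box-drawing characters, for
    example) are written as '?'.
    """
    return data.decode(oem, errors="replace").encode(ansi, errors="replace")


def ansi_to_oem(data: bytes, ansi: str = "cp1252", oem: str = "cp850") -> bytes:
    """Re-encode ANSI bytes as OEM bytes; unmappable characters become '?'."""
    return data.decode(ansi, errors="replace").encode(oem, errors="replace")


# The two rows of the test file are conversions of each other:
#   oem_to_ansi(bytes.fromhex("919B86929D8F")) == bytes.fromhex("E6F8E5C6D8C5")
#   ansi_to_oem(bytes.fromhex("E6F8E5C6D8C5")) == bytes.fromhex("919B86929D8F")
```

Going through a decoded string avoids any character-by-character replacement table, at the cost of losing characters that exist in only one of the two code pages.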
 
Sep 11, 2012
100
1
Thank you both for the comprehensive answers; I now think I understand at least what I need for my own tasks.

I became aware of the problem because I used dir > file.txt and then used for %f in (@file.txt) do .....

The international character "ø" then got converted to ">" (the redirection symbol), and since I had not escaped it either, it certainly did not work as expected.
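
One plausible route from "ø" to something resembling ">" can be traced in Python. The sketch assumes the redirected dir output was written in code page 850 and later read back as Windows-1252; the thread does not confirm either assumption, and the variable names are illustrative only.

```python
# "ø" as it would be written by a console program using code page 850.
oem_byte = "ø".encode("cp850")  # the single byte 0x9B

# The same byte read back under Windows-1252 is a different character:
# U+203A, SINGLE RIGHT-POINTING ANGLE QUOTATION MARK.
misread = oem_byte.decode("cp1252")

# If a later step substitutes the closest ASCII look-alike for U+203A,
# the result would be ">". That last step is an assumption, not
# something this sketch demonstrates.
```

Either way, the byte written for "ø" and the byte expected for "ø" differ between the two code pages, which is enough to break the file names in the list.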

Thanks for the 'tpipe' tips - I did not even think about TPIPE as a possible solution this time, but a year ago I discovered its EBCDIC conversion, which did what we needed; the alternatives would have cost about the same as a TCMD 10-user license :)

-stein
 

Charles Dye

Super Moderator
Staff member
May 20, 2008
4,461
88
Albuquerque, NM
prospero.unm.edu
I think OPTION //UNICODEOUTPUT=YES would save you a lot of trouble.
 