Danish character redirection problem

Sep 30, 2009
5
0
#1
With TCC/LE, redirection of Danish characters such as "æ", "ø" and "å" does not work. If I, for example, run DIR > TMP.TXT in a directory containing files with those characters in their names, I get some weird other characters in the TMP.TXT file. On screen, a DIR command displays everything just fine.

How to fix this?
 
#2
rmortensen wrote:
| With TCC/LE, redirection of Danish characters such as "æ", "ø" and
| "å" does not work. If I, for example, run DIR > TMP.TXT in a directory
| containing files with those characters in their names, I get some
| weird other characters in the TMP.TXT file. On screen, a DIR command
| displays everything just fine.
|
| How to fix this?

For internal operations, MS Windows and TCC/LE use Unicode character
encoding. For redirected command output, the default character encoding is
your current codepage, which includes ASCII for codes 0...127 and a
codepage-dependent set of characters for codes 128...255. This requires
mapping 16-bit Unicode characters to 8-bit characters, resulting in
information loss.
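
To make the "weird other characters" concrete, here is a minimal Python sketch (not TCC code) of what can happen when 8-bit output is written in one codepage and read in another. The codepages are assumptions, since the thread does not state them: cp850 for the console and cp1252 for the program that later opens TMP.TXT.

```python
# Assumed codepages (not stated in the thread): cp850 for the console (OEM),
# cp1252 for whatever program later opens TMP.TXT (ANSI).
OEM_CODEPAGE = "cp850"
ANSI_CODEPAGE = "cp1252"

name = "æøå"

# The bytes an 8-bit redirect would write for these three characters.
oem_bytes = name.encode(OEM_CODEPAGE)

# What a viewer shows if it interprets those same bytes as ANSI.
misread = oem_bytes.decode(ANSI_CODEPAGE)

print(oem_bytes.hex(" "))                     # the raw byte values
print(ascii(misread))                         # the misread characters, escaped
print(oem_bytes.decode(OEM_CODEPAGE) == name) # True: same codepage reads back fine
```

The bytes themselves are fine; only the interpretation differs, which would explain why the same names look correct on screen.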

According to my reading of the TCC vs. TCC/LE comparison in the help, TCC/LE
supports Unicode output. Accordingly, try starting TCC/LE with the /U
option, forcing Unicode output. Alternatively, you may switch between 8-bit
(normally, but incorrectly, referred to as ASCII) and Unicode output using
these commands:

OPTION //UnicodeOutput=yes
OPTION //UnicodeOutput=no

I have aliases for them (but I use only TCC):

alias uni=OPTION //UnicodeOutput=yes
alias nouni=OPTION //UnicodeOutput=no

The disadvantage of always using Unicode is that file sizes are doubled.
That's the penalty for having a language whose written form represents its
spoken form more closely than English (which I often refer to as an
ideogrammatic language, with each ideogram composed of the 26 letters). Can
you do otherwise when "red" and "read" can sound the same though they look
different, but "read" (present tense) and "read" (past tense) sound
different though they look the same?
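
The doubling can be checked with a short Python sketch. It assumes the Unicode output is UTF-16 (two bytes per character for text like this) and uses a made-up two-line listing; whether TCC also writes a byte-order mark is not covered here.

```python
# Hypothetical two-line directory listing with CRLF line endings.
listing = "æøå.txt\r\nread.me\r\n"

# 8-bit output: one byte per character (cp850 is an assumed codepage).
eight_bit = listing.encode("cp850")

# 16-bit output: two bytes per character, no byte-order mark added.
utf16 = listing.encode("utf-16-le")

print(len(listing), len(eight_bit), len(utf16))
```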
--
HTH, Steve
 

rconn

Administrator
Staff member
May 14, 2008
10,103
85
#3
> With TCC/LE, redirection of Danish characters such as "æ", "ø" and "å"
> does not work. If I, for example, run DIR > TMP.TXT in a directory
> containing files with those characters in their names, I get some weird
> other characters in the TMP.TXT file. On screen, a DIR command displays
> everything just fine.
>
> How to fix this?
Use a Unicode font (like Lucida Console).

Rex Conn
JP Software
 
#4
rmortensen wrote:
OPTION //UnicodeOutput=yes
OPTION //UnicodeOutput=no
--
HTH, Steve
I tried the above, and that did the trick! Thanks! Although some of my other apps do not accept Unicode text files :-(

Changing to a Unicode font (like Lucida Console) did not seem to make any difference with respect to redirection.

Still: if I do, for example, "Echo æøå" in my btm file, it comes out wrong on the screen?
 
#5
rmortensen wrote:
| Changing to a Unicode font (like Lucida Console) did not seem to
| make any difference with respect to redirection.
|
| Still: if I do, for example, "Echo æøå" in my btm file, it comes out
| wrong on the screen?

1/ The UnicodeOutput directive only affects what and how you write to a
file, not the screen display.

2/ The Unicode to 8-bit conversion is "lossy": you map 65536 character
codes into a character set with 256 elements, so it is not reversible. The
8-bit to 16-bit conversion is always reversible (as far as character codes
are concerned).

3/ Each character sent from TCC "to the screen" (e.g., your ECHO example) is
translated into a bitmap of the character that is actually displayed. This
is a multistage process, involving the program where the character
originates (TCC/LE in this case), the Windows console subsystem, the display
driver, and I don't know what else. Hopefully someone knowledgeable will
speak up. Whether the program sends 16-bit or 8-bit codes, and which
codepage and font are currently in effect, all affect what you see. When TCC
runs in its own window the translation is different from what it is when TCC
runs in a TCMD tab. BTW, I don't think there is any difference between the
full and the LE versions in this respect.
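
Point 2 can be illustrated with a small Python sketch. The codepage (cp850) and the sample string are assumptions chosen for illustration; cp850 is used because it defines a character for every one of its 256 byte values.

```python
CODEPAGE = "cp850"  # assumed 8-bit codepage, not taken from the thread

# 8-bit -> Unicode -> 8-bit: every byte value survives the round trip.
all_bytes = bytes(range(256))
as_unicode = all_bytes.decode(CODEPAGE)
round_trip_ok = as_unicode.encode(CODEPAGE) == all_bytes

# Unicode -> 8-bit -> Unicode: characters missing from the codepage
# are replaced by "?" and cannot be recovered afterwards.
text = "æ€Ω"
squeezed = text.encode(CODEPAGE, errors="replace")
restored = squeezed.decode(CODEPAGE)

print(round_trip_ok)                   # True
print(ascii(text), ascii(restored))    # original vs. what survives
```

Here "æ" survives because cp850 contains it, while the other two characters collapse to the same "?" byte.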

Experiment until you find a combination of codepage and font you like. IIRC
"Andale Mono" has been recommended in the past.
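
As a rough aid to that kind of experiment, the Python sketch below decodes the same bytes under several codepages. The scenario is an assumption, not something confirmed in this thread: the btm file was saved by an editor using cp1252, and the console reads it with a different codepage. The candidate list is likewise only illustrative.

```python
# Hypothetical: the three characters as an editor using cp1252 would save them.
saved = "æøå".encode("cp1252")

# Illustrative list of codepages a console might be set to.
candidates = ["cp437", "cp850", "cp865", "cp1252"]

# For each candidate, record what the bytes would display as.
shown = {cp: saved.decode(cp, errors="replace") for cp in candidates}
matches = {cp: text == "æøå" for cp, text in shown.items()}

for cp in candidates:
    print(cp, ascii(shown[cp]), matches[cp])
```

If this assumption holds, only the codepage that matches the one used to save the file shows the characters as intended.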
--
HTH, Steve