Unicode question

#1
If the user does
Code:
ECHO %@CHAR[27]
then a left-pointing arrow appears on the screen. What is actually in the console screen buffer is the Unicode character 8592 (0x2190). If a plugin internal variable (_CURCHAR) returns that character like this
Code:
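// Read one cell (the region in sr) from the console screen buffer
ReadConsoleOutput(STD_OUT, &ci, cdOne, cdZero, &sr);
// Return that cell's Unicode character as the variable's value
Sprintf(pszSrgs, L"%c", ci.Char.UnicodeChar);
return 0;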
then the only tests for it I can find are
Code:
IF %@CHAR[%_CURCHAR] == 8592
IF %@UNICODE[%_CURCHAR] == 8592
Is there any way to test for that character using the more familiar number 27? If, internally, I try WideCharToMultiByte(CP_ACP) on it, I wind up with 63, i.e., the question mark (the default character for anything that can't be mapped, I suppose).

Thanks!
 
#2
That last code was in error; it should have read:
Code:
IF %@ASCII[%_CURCHAR] == 8592
IF %@UNICODE[%_CURCHAR] == 8592
 

rconn

Administrator
Staff member
May 14, 2008
10,096
85
#3
vefatica wrote:

> If the user does
>
> Code:
> ---------
> ECHO %@CHAR[27]
> ---------
> then a left-pointing arrow appears on the screen. Actually in the console screen buffer is the Unicode character 8592 (0x2190). If a plugin internal variable (_CURCHAR) returns that character like this
>
> Code:
> ---------
> ReadConsoleOutput(STD_OUT, &ci, cdOne, cdZero, &sr);
> Sprintf(pszSrgs, L"%c", ci.Char.UnicodeChar);
> return 0;
> ---------
> then the only tests for it I can find are
>
> Code:
> ---------
> IF %@CHAR[%_CURCHAR] == 8592
> IF %@UNICODE[%_CURCHAR] == 8592
> ---------
> Is there any way to test for that character using the more familiar number 27? If, internally, I try WideCharToMultiByte(CP_ACP) on it, I wind up with 63, i.e., the question mark (default unprintable, I suppose).
This is a Windows / console manager issue, not TCC. I don't know the
answer; try Microsoft.

Rex Conn
JP Software
 
#4
This is a Windows / console manager issue, not TCC. I don't know the
answer; try Microsoft.
Referring to an <Esc> glyph in the console screen buffer ...

If I use ReadConsoleOutputW() on that character, I get

CHAR_INFO::Char.UnicodeChar = 8592 (the right glyph)
CHAR_INFO::Char.AsciiChar = 65424 [garbage?]

If I use ReadConsoleOutputCharacterW(), I get 8592.

If I use ReadConsoleOutputCharacterA(), I get 27.

I didn't try ReadConsoleOutputA().

Any thoughts Rex?
 
#5
On Mon, 20 Apr 2009 21:01:59 -0500, vefatica wrote:

|Referring to an <Esc> glyph in the console screen buffer ...
|
|If I use ReadConsoleOutputW() on that character, I get
|
| CHAR_INFO::Char.UnicodeChar = 8592 (the right glyph)
| CHAR_INFO::Char.AsciiChar = 65424 [garbage?]
|
|If I use ReadConsoleOutputCharacterW(), I get 8592.
|
|If I use ReadConsoleOutputCharacterA(), I get 27.
|
|I didn't try ReadConsoleOutputA().

My guess is that the console screen buffer **is** ASCII (since you're stuck with
some code page) and the ReadConsoleOutput[Character]W() functions translate into
an appropriate Unicode glyph. Make sense? What do you think Rex?
--
- Vince
 

rconn

#6
vefatica wrote:

> ---Quote (Originally by rconn)---
> This is a Windows / console manager issue, not TCC. I don't know the
> answer; try Microsoft.
> ---End Quote---
> Referring to an <Esc> glyph in the console screen buffer ...
>
> If I use ReadConsoleOutputW() on that character, I get
>
> CHAR_INFO::Char.UnicodeChar = 8592 (the right glyph)
> CHAR_INFO::Char.AsciiChar = 65424 [garbage?]
>
> If I use ReadConsoleOutputCharacterW(), I get 8592.
>
> If I use ReadConsoleOutputCharacterA(), I get 27.
>
> I didn't try ReadConsoleOutputA().
>
> Any thoughts Rex?
That's what I would expect. What's your question?

Rex Conn
JP Software
 

rconn

#7
vefatica wrote:

> My guess is that the console screen buffer **is** ASCII (since you're stuck with
> some code page) and the ReadConsoleOutput[Character]W() functions translate into
> an appropriate Unicode glyph. Make sense? What do you think Rex?
Other way around -- the console buffer is Unicode and the translations
are into ASCII. (Which results occasionally in some odd conversions.)

All of XP/Vista/etc. is Unicode internally.

Rex Conn
JP Software
 
#8
On Mon, 20 Apr 2009 21:42:38 -0500, rconn wrote:

|vefatica wrote:
|
|
|---Quote---
|> ---Quote (Originally by rconn)---
|> This is a Windows / console manager issue, not TCC. I don't know the
|> answer; try Microsoft.
|> ---End Quote---
|> Referring to an <Esc> glyph in the console screen buffer ...
|>
|> If I use ReadConsoleOutputW() on that character, I get
|>
|> CHAR_INFO::Char.UnicodeChar = 8592 (the right glyph)
|> CHAR_INFO::Char.AsciiChar = 65424 [garbage?]
|>
|> If I use ReadConsoleOutputCharacterW(), I get 8592.
|>
|> If I use ReadConsoleOutputCharacterA(), I get 27.
|>
|> I didn't try ReadConsoleOutputA().
|>
|> Any thoughts Rex?
|---End Quote---
|That's what I would expect. What's your question?

I guess it's this: Is the console screen buffer both ASCII and Unicode, keeping
a record of both?
--
- Vince
 
#10
A seemingly knowledgeable gent replied to my newsgroup query thus (below). It's
beyond me. Does it make sense to you? I can accurately get the character under
the mouse cursor and reproduce it. As for turning it into a **familiar** number
(some character code), I think I'm SOL.

Quoting:

You have dipped into a subject that mixes ancient history and modern
internationalization.

The original IBM CGA display included fonts in its ROM that had glyphs in
all 256 places, including the control characters and the high 128
characters. The glyph for 0x1B was a left-facing arrow.

Today, this character set lives on as the default 8-bit code page for
command shells, CP437. The console buffer (essentially a virtualization of
the CGA text-mode buffer at 0xB8000) is an 8-bit buffer, so the value that
is written is the 8-bit value 0x1B (27 decimal).

When you use ReadConsoleOutputW, the system does an ANSI-to-Unicode
conversion for you, using the CP437 code page. Since 0x1B in CP437 is
left-pointing-arrow, you read 0x2190.

-It's interesting. If I use ReadConsoleOutputW() on that character, I get
-
- CHAR_INFO::Char.UnicodeChar = 8592
- CHAR_INFO::Char.AsciiChar = 65424 [garbage?]

This would have made more sense if you had looked at this in hex.

8592 = 0x2190
65424 = 0xff90

This is just taking the low-order byte of the Unicode character you got,
and sign-extending it.

-If I use ReadConsoleOutputCharacterW(), I get 8592.
-If I use ReadConsoleOutputCharacterA(), I get 27.
-If I use ReadConsoleOutputA(), I get 27.
-
-So the "A" version of the functions is doing some translating (or the "W"
-version is). WideCharToMultiByte() always translated 8592 into 63 ("?",
-the default unprintable). I wish I understood what's going on.

When YOU call WideCharToMultiByte, you are using some other 8-bit code
page, and that code page does not have an encoding for "left-pointing
arrow". If you called WideCharToMultiByte with CP437, you would get 0x1B (27).
--
- Vince
 
#11
vefatica wrote:
| A seemingly knowledgeable gent replied to my newsgroup query thus
| (below). It's beyond me. Does it make sense to you? I can
| accurately get the character under the mouse cursor and reproduce it.
| As for turning it into a **familiar** number (some character code), I
| think I'm SOL.
|
| Quoting:
|
| You have dipped into a subject that mixes ancient history and modern
| internationalization.
|
| The original IBM CGA display included fonts in its ROM that had
| glyphs in all 256 places, including the control characters and the
| high 128 characters. The glyph for 0x1B was a left-facing arrow.
|
| Today, this character set lives on as the default 8-bit code page for
| command shells, CP437. The console buffer (essentially a
| virtualization of the CGA text-mode buffer at 0xB8000) is an 8-bit
| buffer, so the value that is written is the 8-bit value 0x1B (27).
|
| When you use ReadConsoleOutputW, the system does an ANSI-to-Unicode
| conversion for you, using the CP437 code page. Since 0x1B in CP437 is
| left-pointing-arrow, you read 0x2190.
|
| -It's interesting. If I use ReadConsoleOutputW() on that character,
| -I get
| -
| - CHAR_INFO::Char.UnicodeChar = 8592
| - CHAR_INFO::Char.AsciiChar = 65424 [garbage?]
|
| This would have made more sense if you had looked at this in hex.
|
| 8592 = 0x2190
| 65424 = 0xff90
|
| This is just taking the low-order byte of the Unicode character you
| got, and sign-extending it.
|
| -If I use ReadConsoleOutputCharacterW(), I get 8592.
| -If I use ReadConsoleOutputCharacterA(), I get 27.
| -If I use ReadConsoleOutputA(), I get 27.
| -
| -So the "A" version of the functions is doing some translating (or
| -the "W" version is). WideCharToMultiByte() always translated 8592
| -into 63 ("?", the default unprintable). I wish I understood what's
| -going on.
|
| When YOU call WideCharToMultiByte, you are using some other 8-bit code
| page, and that code page does not have an encoding for "left-pointing
| arrow". If you called WideCharToMultiByte with CP437, you would get
| 0x1B (27).

It seems that the W mode performs glyph-based code translation. The glyph for
any octet that is not a printable ASCII character (0x00-0x1F, 0x7F-0xFF)
depends on the codepage, and is thus translated. For printable characters
within the ASCII range (0x20-0x7E), the W codes should be OK. The A mode seems
to be OK; it does not translate, and returns the actual octets.

PS: Nearly 20 years ago I had used the CP437 non-printable codes to display
on the screen the outline drawing of an add-on PC card, showing its proper
jumper settings for the BIOS and other add-in cards in use (to select memory
mapping and port selection).
--
Steve
 
#12
On Tue, 21 Apr 2009 23:27:56 -0500, Steve Fábián wrote:

|It seems that the W mode performs glyph-based code translation. The glyph for
|any octet that is not a printable ASCII character (0x00-0x1F, 0x7F-0xFF)
|depends on the codepage, and is thus translated. For printable characters
|within the ASCII range (0x20-0x7E), the W codes should be OK. The A mode seems
|to be OK; it does not translate, and returns the actual octets.
|
|PS: Nearly 20 years ago I had used the CP437 non-printable codes to display
|on the screen the outline drawing of an add-on PC card, showing its proper
|jumper settings for the BIOS and other add-in cards in use (to select memory
|mapping and port selection).

It's all unintelligible to me. Can someone explain this? (I'm not complaining.)

chcp
Active code page: 437

echo %@ascii[%@char[240]]
240

Fine!

But if I "echo %@char[240]" then copy/paste the result into %@ascii[], I get

echo %@ascii[d]
100

What's going on?
--
- Vince
 
#13
vefatica wrote:
| It's all unintelligible to me. Can someone explain this (I'm not
| complaining).
|
| chcp
| Active code page: 437
|
| echo %@ascii[%@char[240]]
| 240
|
| Fine!
|
| But if I "echo %@char[240]" then copy/paste the result into
| %@ascii[], I get
|
| echo %@ascii[d]
| 100

In standalone TCC 10.00.67 in Windows XP (SP3), with UnicodeOutput=No, I
got the same result (100) as in your last example from the command
echo %@ascii[%@execstr[echo %@char[240]]]

When I switched to UnicodeOutput=Yes the command displayed 240! I was
amazed...

The problem is that when UnicodeOutput=No, the display is in ASCII with
CP437 extensions (Ax437 below), so TCC sends its output (including output
that is processed via @EXECSTR without actually being displayed) through a
many-to-few mapping (translation) of Unicode to Ax437, and then through an
Ax437-to-Unicode (one-to-one) mapping before it is used in @ASCII. When
UnicodeOutput=Yes, no mapping is done, so the output is the same both ways
(the character 240, ð, rather than a lowercase d).

BTW, my myriad of X3.64 color-changing escape sequences work perfectly well
when UnicodeOutput=Yes...
--
Steve
 
#14
Steve Fabian wrote:
| vefatica wrote:
|| It's all unintelligible to me. Can someone explain this (I'm not
|| complaining).
||
|| chcp
|| Active code page: 437
||
|| echo %@ascii[%@char[240]]
|| 240
||
|| Fine!
||
|| But if I "echo %@char[240]" then copy/paste the result into
|| %@ascii[], I get
||
|| echo %@ascii[d]
|| 100
|
| In standalone TCC 10.00.67 in Windows XP (SP3), with
| UnicodeOutput=No, I had the same result (100) from the command
| echo %@ascii[%@execstr[echo %@char[240]]]
| as your last point.
|
| When I switched to UnicodeOutput=Yes the command displayed 240! I was
| amazed...
|
| The problem is that when UnicodeOutput=No, the display is in ASCII
| with CP437 extensions (Ax437 below), so TCC sends its output
| (including that which is processed via @EXECSTR without actual
| display) through a many-to-few mapping (translation) of Unicode to
| Ax437, and an Ax437 to Unicode (one-to-one) mapping before it is used
| in @ASCII. When UnicodeOutput=Yes, no mapping is done, thus the
| output both ways is the same (lowercase d).
|
| BTW, my myriad of X3.64 color-changing escape sequences work
| perfectly well when UnicodeOutput=Yes...