
Unicode question

Discussion in 'Plugins' started by vefatica, Apr 19, 2009.

  1. vefatica

    Joined:
    May 20, 2008
    Messages:
    7,785
    Likes Received:
    29
    If the user does
    Code:
    ECHO %@CHAR[27]
    then a left-pointing arrow appears on the screen. What is actually in the console screen buffer is the Unicode character 8592 (0x2190). If a plugin internal variable (_CURCHAR) returns that character like this
    Code:
    ReadConsoleOutput(STD_OUT, &ci, cdOne, cdZero, &sr);
    swprintf(pszArgs, L"%c", ci.Char.UnicodeChar);
    return 0;
    then the only tests for it I can find are
    Code:
    IF %@CHAR[%_CURCHAR] == 8592
    IF %@UNICODE[%_CURCHAR] == 8592
    Is there any way to test for that character using the more familiar number 27? If, internally, I try WideCharToMultiByte(CP_ACP) on it, I wind up with 63, i.e., the question mark (the default character for unmappable code points, I suppose).

    Thanks!
     
  2. vefatica

    Joined:
    May 20, 2008
    Messages:
    7,785
    Likes Received:
    29
    That last code was in error; it should have read:
    Code:
    IF %@ASCII[%_CURCHAR] == 8592
    IF %@UNICODE[%_CURCHAR] == 8592
     
  3. rconn

    rconn Administrator
    Staff Member

    Joined:
    May 14, 2008
    Messages:
    9,730
    Likes Received:
    80
    vefatica wrote:

    This is a Windows / console manager issue, not TCC. I don't know the
    answer; try Microsoft.

    Rex Conn
    JP Software
     
  4. vefatica

    Joined:
    May 20, 2008
    Messages:
    7,785
    Likes Received:
    29
    Referring to an <Esc> glyph in the console screen buffer ...

    If I use ReadConsoleOutputW() on that character, I get

    CHAR_INFO::Char.UnicodeChar = 8592 (the right glyph)
    CHAR_INFO::Char.AsciiChar = 65424 [garbage?]

    If I use ReadConsoleOutputCharacterW(), I get 8592.

    If I use ReadConsoleOutputCharacterA(), I get 27.

    I didn't try ReadConsoleOutputA().

    Any thoughts Rex?
     
  5. vefatica

    Joined:
    May 20, 2008
    Messages:
    7,785
    Likes Received:
    29
    On Mon, 20 Apr 2009 21:01:59 -0500, vefatica <> wrote:

    |Referring to an <Esc> glyph in the console screen buffer ...
    |
    |If I use ReadConsoleOutputW() on that character, I get
    |
    | CHAR_INFO::Char.UnicodeChar = 8592 (the right glyph)
    | CHAR_INFO::Char.AsciiChar = 65424 [garbage?]
    |
    |If I use ReadConsoleOutputCharacterW(), I get 8592.
    |
    |If I use ReadConsoleOutputCharacterA(), I get 27.
    |
    |I didn't try ReadConsoleOutputA().

    My guess is that the console screen buffer **is** ASCII (since you're stuck with
    some code page) and the ReadConsoleOutput[Character]W() functions translate into
    an appropriate Unicode glyph. Make sense? What do you think Rex?
    --
    - Vince
     
  6. rconn

    rconn Administrator
    Staff Member

    Joined:
    May 14, 2008
    Messages:
    9,730
    Likes Received:
    80
    vefatica wrote:

    That's what I would expect. What's your question?

    Rex Conn
    JP Software
     
  7. rconn

    rconn Administrator
    Staff Member

    Joined:
    May 14, 2008
    Messages:
    9,730
    Likes Received:
    80
    vefatica wrote:

    Other way around -- the console buffer is Unicode and the translations
    are into ASCII. (Which results occasionally in some odd conversions.)

    All of XP/Vista/etc. is Unicode internally.

    Rex Conn
    JP Software
     
  8. vefatica

    Joined:
    May 20, 2008
    Messages:
    7,785
    Likes Received:
    29
    On Mon, 20 Apr 2009 21:42:38 -0500, rconn <> wrote:

    |vefatica wrote:
    |
    |
    |---Quote---
    |> ---Quote (Originally by rconn)---
    |> This is a Windows / console manager issue, not TCC. I don't know the
    |> answer; try Microsoft.
    |> ---End Quote---
    |> Referring to an <Esc> glyph in the console screen buffer ...
    |>
    |> If I use ReadConsoleOutputW() on that character, I get
    |>
    |> CHAR_INFO::Char.UnicodeChar = 8592 (the right glyph)
    |> CHAR_INFO::Char.AsciiChar = 65424 [garbage?]
    |>
    |> If I use ReadConsoleOutputCharacterW(), I get 8592.
    |>
    |> If I use ReadConsoleOutputCharacterA(), I get 27.
    |>
    |> I didn't try ReadConsoleOutputA().
    |>
    |> Any thoughts Rex?
    |---End Quote---
    |That's what I would expect. What's your question?

    I guess it's this: Is the console screen buffer both ASCII and Unicode, keeping
    a record of both?
    --
    - Vince
     
  9. rconn

    rconn Administrator
    Staff Member

    Joined:
    May 14, 2008
    Messages:
    9,730
    Likes Received:
    80
    vefatica wrote:
    >

    Not to my knowledge. But the only one who'd know for sure is the author
    of the console manager.

    Rex Conn
    JP Software
     
  10. vefatica

    Joined:
    May 20, 2008
    Messages:
    7,785
    Likes Received:
    29
    A seemingly knowledgeable gent replied to my newsgroup query as follows
    (below). It's beyond me. Does it make sense to you? I can accurately get
    the character under the mouse cursor and reproduce it. As for turning it
    into a **familiar** number (some character code), I think I'm SOL.

    Quoting:

    You have dipped into a subject that mixes ancient history and modern
    internationalization.

    The original IBM CGA display included fonts in its ROM that had glyphs in
    all 256 places, including the control characters and the high 128
    characters. The glyph for 0x1B was a left-facing arrow.

    Today, this character set lives on as the default 8-bit code page for
    command shells, CP437. The console buffer (essentially a virtualization of
    the CGA text-mode buffer at 0xB8000) is an 8-bit buffer, so the value that
    is written is the 8-bit value 0x1B.

    When you use ReadConsoleOutputW, the system does an ANSI-to-Unicode
    conversion for you, using the CP437 code page. Since 0x1B in CP437 is
    left-pointing-arrow, you read 0x2190.

    -It's interesting. If I use ReadConsoleOutputW() on that character, I get
    -
    - CHAR_INFO::Char.UnicodeChar = 8592
    - CHAR_INFO::Char.AsciiChar = 65424 [garbage?]

    This would have made more sense if you had looked at this in hex.

    8592 = 0x2190
    65424 = 0xff90

    This is just taking the low-order byte of the Unicode character you got,
    and sign-extending it.

    -If I use ReadConsoleOutputCharacterW(), I get 8592.
    -If I use ReadConsoleOutputCharacterA(), I get 27.
    -If I use ReadConsoleOutputA(), I get 27.
    -
    -So the "A" version of the functions is doing some translating (or the "W"
    -version is). WideCharToMultiByte() always failed to translate correctly,
    -turning 8592 into 63 ("?", the default for unprintables). I wish I
    -understood what's going on.

    When YOU call WideCharToMultiByte, you are using some other 8-bit code
    page, and that code page does not have an encoding for "left-pointing
    arrow". If you called WideCharToMultiByte with CP437, you would get 0x1B.
    --
    - Vince
     
  11. Steve Fabian

    Joined:
    May 20, 2008
    Messages:
    3,523
    Likes Received:
    4
    vefatica wrote:
    | A seemingly knowledgeable gent replied to my newsgroup query thus
    | (below). It's beyond me. Does it make sense to you. I can
    | accurately get the character under the mouse cursor and reproduce it.
    | As for turning it into a **familiar** number (some character code) I
    | think I'm SOL.
    |
    | Quoting:
    |
    | You have dipped into subject that mixes ancient history and modern
    | internationalization.
    |
    | The original IBM CGA display included fonts in its ROM that had
    | glyphs in
    | all 256 places, including the control characters and the high 128
    | characters. The glyph for 0x1B was a left-facing arrow.
    |
    | Today, this character set lives on as the default 8-bit code page for
    | command shells, CP437. The console buffer (essentially a
    | virtualization of
    | the CGA text-mode buffer at 0xB8000) is an 8-bit buffer, so the value
    | that
    | is written is the 8-bit value 0x1B.
    |
    | When you use ReadConsoleOutputW, the system does an ANSI-to-Unicode
    | conversion for you, using the CP437 code page. Since 0x1B in CP437 is
    | left-pointing-arrow, you read 0x2190.
    |
    | -It's interesting. If I use ReadConsoleOutputW() on that character,
    | I get -
    | - CHAR_INFO::Char.UnicodeChar = 8592
    | - CHAR_INFO::Char.AsciiChar = 65424 [garbage?]
    |
    | This would have made more sense if you had looked at this in hex.
    |
    | 8592 = 0x2190
    | 65424 = 0xff90
    |
    | This is just taking the low-order byte of the Unicode character you
    | got,
    | and sign-extending it.
    |
    | -If I use ReadConsoleOutputCharacterW(), I get 8592.
    | -If I use ReadConsoleOutputCharacterA(), I get 27.
    | -If I use ReadConsoleOutputA(), I get 27.
    | -
    | -So the "A" version of the functions is doing some translating (or
    | the "W"
    | -version is). WideCharToMultiByte() always failed to translate
    | correctly for
    | -returning 8592 into 63 ("?", the default un-printable). I wish I
    | understood
    | -what's going on.
    |
    | When YOU call WideCharToMultiByte, you are using some other 8-bit code
    | page, and that code page does not have an encoding for "left-pointing
    | arrow". If you called WideCharToMultiByte with CP437, you would get
    | 0x1B.

    Seems that the W mode performs glyph-based code translation. The glyph for
    any octet that is not a printable ASCII character (0x00-0x1F, 0x7F-0xFF)
    depends on the codepage, and is thus translated. For printable characters
    within the ASCII range (0x20-0x7E) the W mode should be OK. The A mode
    seems to be OK - it does not translate; it returns the actual octets.

    PS: Nearly 20 years ago I used the CP437 non-printable codes to draw on
    the screen the outline of an add-on PC card, showing its proper jumper
    settings for the BIOS and other add-in cards in use (to select memory
    mapping and port selection).
    --
    Steve
     
  12. vefatica

    Joined:
    May 20, 2008
    Messages:
    7,785
    Likes Received:
    29
    On Tue, 21 Apr 2009 23:27:56 -0500, Steve Fábián <> wrote:

    |Seems that the W mode performs glyph-based code translation. The glyph for
    |any octet that is not a printable ASCII character (0x00-0x1F, 0x7F-0xFF)
    |depends on the codepage, and is thus translated. Printable characters within
    |the ASCII range (0x20-0x7E) the W codes should be OK. The A mode seems to be
    |OK - it does not translate, returns the actual octets.
    |
    |PS: Nearly 20 years ago I had used the CP437 non-printable codes to display
    |on the screen the outline drawing of an add-on PC card, showing its proper
    |jumper settings for the BIOS and other add-in cards in use (to select memory
    |mapping and port selection).

    It's all unintelligible to me. Can someone explain this? (I'm not complaining.)

    chcp
    Active code page: 437

    echo %@ascii[%@char[240]]
    240

    Fine!

    But if I "echo %@char[240]" then copy/paste the result into %@ascii[], I get

    echo %@ascii[d]
    100

    What's going on?
    --
    - Vince
     
  13. Steve Fabian

    Joined:
    May 20, 2008
    Messages:
    3,523
    Likes Received:
    4
    vefatica wrote:
    | It's all unintelligible to me. Can someone explain this (I'm not
    | complaining).
    |
    | chcp
    | Active code page: 437
    |
    | echo %@ascii[%@char[240]]
    | 240
    |
    | Fine!
    |
    | But if I "echo %@char[240]" then copy/paste the result into
    | %@ascii[], I get
    |
    | echo %@ascii[d]
    | 100

    In standalone TCC 10.00.67 on Windows XP (SP3), with UnicodeOutput=No, I
    got the same result (100) from the command
    echo %@ascii[%@execstr[echo %@char[240]]]
    as in your last example.

    When I switched to UnicodeOutput=Yes the command displayed 240! I was
    amazed...

    The problem is that when UnicodeOutput=No, the display is in ASCII with
    CP437 extensions (Ax437 below), so TCC sends its output (including that
    which is processed via @EXECSTR without actual display) through a
    many-to-few mapping (translation) of Unicode to Ax437, and then an Ax437
    to Unicode (one-to-one) mapping before it is used in @ASCII; the
    many-to-few step is what turns character 240 into lowercase d. When
    UnicodeOutput=Yes, no mapping is done, so the character round-trips
    unchanged and @ASCII reports 240.

    BTW, my myriad of X3.64 color-changing escape sequences work perfectly well
    when UnicodeOutput=Yes...
    --
    Steve
     
  14. Steve Fabian

    Joined:
    May 20, 2008
    Messages:
    3,523
    Likes Received:
    4
    Steve Fabian wrote:
    | vefatica wrote:
    || It's all unintelligible to me. Can someone explain this (I'm not
    || complaining).
    ||
    || chcp
    || Active code page: 437
    ||
    || echo %@ascii[%@char[240]]
    || 240
    ||
    || Fine!
    ||
    || But if I "echo %@char[240]" then copy/paste the result into
    || %@ascii[], I get
    ||
    || echo %@ascii[d]
    || 100
    |
    | In standalone TCC 10.00.67 in Windows XP (SP3), with
    | UnicodeOutput=No, I had the same result (100) from the command
    | echo %@ascii[%@execstr[echo %@char[240]]]
    | as your last point.
    |
    | When I switched to UnicodeOutput=Yes the command displayed 240! I was
    | amazed...
    |
    | The problem is that when UnicodeOutput=No, the display is in ASCII
    | with CP437 extensions (Ax437 below), so TCC sends its output
    | (including that which is processed via @EXECSTR without actual
    | display) through a many-to-few mapping (translation) of Unicode to
    | Ax437, and an Ax437 to Unicode (one-to-one) mapping before it is used
    | in @ASCII. When UnicodeOutput=Yes, no mapping is done, thus the
    | output both ways is the same (lowercase d).
    |
    | BTW, my myriad of X3.64 color-changing escape sequences work
    | perfectly well when UnicodeOutput=Yes...
     
