Unicode ... I don't understand

#1
Now I get the same output with LOADBTM on or off. But I just don't understand how it works. This is (ASCII) TEST.BTM (as I see it with TYPE, LIST, VIEW, or in an editor).
Code:
echo ² is character %@ascii[²]
echo @CHAR[%@ASCII[²]] is %@CHAR[%@ASCII[²]]
echo ²²²²²²²²²²²²²²²²²²²²
echo %@repeat[²,20]
echo %@repeat[%@char[178],20]
In the file, that superscript 2 is 0xB2 (178). When I run the BTM I see
Code:
▓ is character 9619
@CHAR[9619] is ▓
▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
²²²²²²²²²²²²²²²²²²²²
I get a different result at the command line.
Code:
v:\> echo ² is character %@ascii[²] & echo @CHAR[%@ASCII[²]] is %@CHAR[%@ASCII[²]] & echo ²²²²²²²²²²²²²²²²²²²² & echo %@repeat[²,20] & echo %@repeat[%@char[178],20]
² is character 178
@CHAR[178] is ²
²²²²²²²²²²²²²²²²²²²²
²²²²²²²²²²²²²²²²²²²²
²²²²²²²²²²²²²²²²²²²²
Why are they different and where's character 9619 coming from?
 

Charles Dye

Super Moderator
Staff member
May 20, 2008
Albuquerque, NM
prospero.unm.edu
#2
Different character sets. Your 8-bit file is interpreted according to your current (console or "OEM") code page, most likely code page 437. If you Google code page 437, the first hit is a Wikipedia page with a little table giving you not just pictures of the various characters, but also their Unicode equivalents. Character 0xB2 in code page 437 is a graphics character (the dark shade block) that maps to Unicode U+2593, which is 9619 decimal. That is the 9619 your batch file reports.
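If you want to check that mapping outside TCC, here is a minimal Python sketch. It assumes your OEM code page really is 437 (a guess on my part), and it uses Python's codec name "cp437" for that code page. It prints only the code point and the Unicode name, so the output is the same whatever your console can display.

```python
import unicodedata

# The byte stored in TEST.BTM where the editor shows a superscript 2.
raw = bytes([0xB2])

# Assumption: the console (OEM) code page is 437, as guessed above.
ch = raw.decode("cp437")
code_point = ord(ch)

# Show the code point in hex and decimal, plus the official Unicode name.
print(f"U+{code_point:04X} = {code_point} decimal, {unicodedata.name(ch)}")
```

The decimal value printed is the number that %@ascii reported inside the batch file.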

TCC uses Unicode internally. All the stuff you type at the command line, or paste in from another program, is Unicode. And the superscripted 2 just happens to be Unicode character U+00B2.

Just to add to the confusion, most Windows programs (the non-console ones) use yet another character set, the Windows code page. (Even more confusingly, that one is also called the "ANSI code page" even though the American National Standards Institute had nothing to do with it.) So if you open that batch file in, say, Notepad, you might see a third character, like an ogonek if you happen to be using code page 1250.
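To see all three readings of the same byte side by side, a rough Python sketch follows. The three code pages are only the ones mentioned in this thread (437, 1250) plus 1252 as a typical Western "ANSI" code page; which ones your own system uses is an assumption you would need to check.

```python
import unicodedata

# The single byte from the batch file.
raw = bytes([0xB2])

# Code pages chosen for illustration only; your system may use others.
code_pages = ("cp437", "cp1250", "cp1252")

# Decode the same byte under each code page.
results = {cp: raw.decode(cp) for cp in code_pages}

for cp, ch in results.items():
    # Print code point and Unicode name rather than the glyph itself.
    print(f"{cp}: U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Same byte, three different characters. Only the code page 1252 reading matches the U+00B2 that TCC gets when you type the character at the prompt.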

Code pages are an idea whose time has come and long since gone. Unfortunately for Rex, compatibility with CMD.EXE requires that he continue to support the nasty things.