Done @CHAR and @UNICODE

Charles Dye

High time these two functions supported characters outside the Basic Multilingual Plane (BMP), i.e. 0x10000 <= n <= 0x10FFFF.
 
Since those characters won't fit in UTF-16, I'm not sure what you're suggesting. Did you want UTF-32 support, or to do this with a 3- or 4-byte UTF-8 encoding?

They can be encoded in UTF-16. Characters above 0xFFFF are encoded as two wchar_ts, called a "surrogate pair": 0x10000 is subtracted from the character value, the first wchar_t (0xD800 to 0xDBFF) encodes the high ten bits of the resulting 20-bit value, and the second (0xDC00 to 0xDFFF) encodes the low ten bits. See e.g. Wikipedia for the details.
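For illustration, here is a minimal Python sketch of that arithmetic, assuming the standard UTF-16 rules described above. The function names `to_surrogate_pair` and `from_surrogate_pair` are hypothetical, and the wchar_t code units are modeled as plain ints.

```python
def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Hypothetical helper: split a code point in 0x10000..0x10FFFF into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"{cp:#x} is not a supplementary-plane code point")
    v = cp - 0x10000               # leaves a 20-bit value
    high = 0xD800 + (v >> 10)      # top ten bits -> high (lead) surrogate
    low = 0xDC00 + (v & 0x3FF)     # bottom ten bits -> low (trail) surrogate
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Hypothetical helper: combine a high and a low surrogate back into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# U+1F600 is stored as the two code units D83D DE00
print([hex(u) for u in to_surrogate_pair(0x1F600)])  # → ['0xd83d', '0xde00']
```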

So, when @CHAR finds a value above 0xFFFF, it should return the surrogate pair for the specified character. And conversely, when @UNICODE finds a surrogate pair in the input string, it should return a single value > 0xFFFF. (Values above 0x10FFFF are illegal, and should give an error message.)
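A rough sketch of the proposed behavior, assuming strings are handled as sequences of UTF-16 code units (modeled here as a Python list of ints). `at_char` and `at_unicode` are hypothetical stand-ins, not Take Command's actual implementation, and `at_unicode` only looks at the first character of its input. Rather than doing the bit arithmetic by hand, this version leans on Python's built-in strict UTF-16 codec, which also rejects lone surrogates; the post doesn't say how those should be handled, so that part is an assumption.

```python
import struct


def at_char(n: int) -> list[int]:
    """Hypothetical model of the proposed @CHAR: return the UTF-16 code units for code point n."""
    if not 0 <= n <= 0x10FFFF:
        # Per the proposal, values above 0x10FFFF are illegal.
        raise ValueError(f"{n:#x} is not a valid Unicode code point")
    # The strict UTF-16 codec emits a surrogate pair for n > 0xFFFF and raises
    # UnicodeEncodeError (a ValueError subclass) for a lone surrogate value.
    data = chr(n).encode("utf-16-le")
    return list(struct.unpack(f"<{len(data) // 2}H", data))


def at_unicode(units: list[int]) -> int:
    """Hypothetical model of the proposed @UNICODE: value of the first character in units."""
    if not units:
        raise ValueError("empty string")
    # A leading high surrogate means the first character spans two code units.
    width = 2 if 0xD800 <= units[0] <= 0xDBFF else 1
    if len(units) < width:
        raise ValueError("unpaired high surrogate at end of string")
    data = struct.pack(f"<{width}H", *units[:width])
    # Strict decoding combines a valid pair into one code point > 0xFFFF
    # and raises UnicodeDecodeError (a ValueError subclass) on malformed input.
    return ord(data.decode("utf-16-le"))
```

For example, `at_char(0x10437)` returns `[0xD801, 0xDC37]`, and `at_unicode([0xD801, 0xDC37])` returns `0x10437`.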
 
This is the current Take Command using Consolas:

[Screenshot: high-order-unicode-chars.png]

But I don't know whether those characters are actually in the Consolas font, or whether Windows is just doing its font-substitution thing. I suspect the latter. Those glyphs look much the same in Lucida Console or Courier New.
 
I would love to see this too.
 