C:\>ver
TCC 33.00.2 x64 Windows 11 [Version 10.0.22621.3880]
C:\>option unicodeoutput
unicodeoutput=No
C:\>echos **%@char[160]** | tee clip:
** **
C:\>echo %@ascii[%@clip[]]
42 42 160 42 42
C:\>
C:\>chcp 1252
Active code page: 1252
C:\>option //unicodeoutput=no
C:\>echos **%@char[160]** | tee clip:
** **
C:\>echo %@ascii[%@clip[]]
42 42 225 42 42
C:\>option //unicodeoutput=yes
C:\>echos **%@char[160]** | tee clip:
** **
C:\>echo %@ascii[%@clip[]]
42 42 160 42 42
C:\>
I don't understand a word of that. How do I get both of these to produce a NBSP?
This isn't DOS, and you aren't creating ANSI text (well, 1252 is "pseudo ANSI").
Your string is UTF16 with a rather odd character in the middle. When it's written to a temp file (to support the CLIP: pseudo-device), it is converted from UTF16 to ASCII, and that's where the 'á' is created.
Solution - use Unicode (either UTF16 or UTF8) everywhere, and don't try to mix and match Unicode and ASCII when you're using extended characters.
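One way to see where the 225 could come from is to replay the byte values outside TCC. The sketch below is Python, not TCC, and it assumes the single byte 0xA0 is read back under an OEM code page such as 437 somewhere along the way. The thread does not confirm which code page is actually involved, so treat the `cp437` choice as a guess that merely reproduces the numbers shown above.

```python
# Sketch: how NBSP (code point 160) can turn into 225 in a byte round trip.
# Assumption (not confirmed in the thread): the byte is reinterpreted
# under OEM code page 437 somewhere between the pipe and the clipboard.

NBSP = "\u00a0"  # the character %@char[160] produces


def round_trip(text, write_cp, read_cp):
    """Encode text with one code page, decode with another, return code points."""
    raw = text.encode(write_cp)
    return [ord(ch) for ch in raw.decode(read_cp)]


sample = "**" + NBSP + "**"
same_cp = round_trip(sample, "cp1252", "cp1252")
mixed_cp = round_trip(sample, "cp1252", "cp437")

print(same_cp)   # [42, 42, 160, 42, 42]
print(mixed_cp)  # [42, 42, 225, 42, 42]
```

When the writer and the reader agree on the code page the NBSP survives; when they disagree, byte 0xA0 is read as 'á', which is code point 225.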
I don't understand a word of that. How do I get both of these to produce a NBSP?
v:\> chcp 65001
Active code page: 65001
v:\> echos **%@char[160]** | tee clip:
** **
v:\wordle> echo %@clip[0]
** **
v:\> echo %@regquery[HKLM\system\currentcontrolset\Control\Nls\CodePage\ACP]
1252
So that (which is seemingly wrong) is the problem, eh? Char 160 is nbsp in CP 1252.
5. WideCharToMultiByte converts your NBSP Unicode character to 'á' for a 1252 codepage (if you don't like that, you can complain to Microsoft, but I doubt you'll get much joy from them) and the result is written to the ASCII file.
What CP do you specify in the call to WideCharToMultiByte?
I'm not explicitly asking for any particular encoding. And when I don't, I kinda expect TCC to be consistent. Who's mixing and matching? It's not me, at least not on purpose. According to Google, CP 65001 is UTF8, and that's no better.
So what do I do if I want to be able to use nbsp (also char 177, '±') freely, have it appear the same everywhere in TCC, and have files written by TCC be (somehow) 8-bit encoded?
v:\> option utf8
utf8=Yes
v:\> option utf8output
utf8output=Yes
v:\> chcp 65001
Active code page: 65001
v:\> echo **%@char[160]** | tee clip:
** **
v:\> echo %@clip[0]
** **
v:\> echo **%@char[177]** | tee clip:
**±**
v:\> type clip:
**┬▒**
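The `**┬▒**` in that last line looks like the UTF-8 bytes of '±' being displayed through an OEM code page. A small Python check reproduces the same two glyphs, assuming code page 437, which the thread does not state:

```python
# '±' is U+00B1 (177); UTF-8 stores it as the two bytes C2 B1.
plus_minus = "\u00b1"
utf8_bytes = plus_minus.encode("utf-8")

# Assumption: whatever reads the bytes decodes them as code page 437.
misread = utf8_bytes.decode("cp437")

print(utf8_bytes.hex(" "))        # c2 b1
print(misread == "\u252c\u2592")  # True: U+252C is '┬', U+2592 is '▒'
```

If that is what happens here, the data on the clipboard is valid UTF-8 and only the reading side (`type clip:`) interprets it with the wrong code page.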
C:\>chcp
Active code page: 1252
C:\>option unicodeoutput
unicodeoutput=No
C:\>echo **%@char[160]** |:u tee clip:
** **
C:\>hexdump clip:
00000000 ff fe 2a 00 2a 00 a0 00 2a 00 2a 00 0d 00 0a 00 · * * * * · ·
C:\>
Yes, would be really interesting ...
Any more on this one?
|:u to pipe as UTF-16. Which I think is generally what you want, with the clipboard. |:8 would work too; UTF-8 and UTF-16 map one-to-one to each other.
|:u in your example above ... |:8 does also work with CP 65001 ... but what is with the BOM (ff fe) - where did that come from?
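On the BOM question: `ff fe` is what the byte order mark U+FEFF looks like when stored as UTF-16 little-endian, so the hexdump is consistent with UTF-16 text that has a BOM in front. Why TCC emits it is not answered in the thread. The Python sketch below only reproduces the bytes from the hexdump and shows that the same text survives a trip through UTF-8.

```python
import codecs

text = "**\u00a0**\r\n"  # the echoed line, including CR LF

# UTF-16 little-endian with an explicit byte order mark in front.
utf16 = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
print(utf16.hex(" "))
# ff fe 2a 00 2a 00 a0 00 2a 00 2a 00 0d 00 0a 00

# UTF-8 and UTF-16 encode the same code points, so converting
# between them loses nothing.
utf8 = text.encode("utf-8")
print(utf8.hex(" "))  # 2a 2a c2 a0 2a 2a 0d 0a
assert utf8.decode("utf-8") == utf16.decode("utf-16")
```

Note that the NBSP takes two bytes in either encoding (`a0 00` in UTF-16 LE, `c2 a0` in UTF-8), so neither gives the one-byte-per-character file asked about earlier in the thread.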