WAD ANSWERED: Clipboard doesn't support utf-8 ? [answer: nope, because Microsoft]

I'm just going to assume this isn't a bug and there's a logical explanation I haven't figured out.

However, I do think it's quite strange that we can echo with >:u8 to a file and type it back to the console fine, but doing the same thing to the clipboard results in a failure of sorts. Maybe there's something I didn't consider.

I also don't quite get why there is a >:u8 and a |:u8 but not a <:u8 ... again, I'm sure there's a logical reason.


1730414295805.webp



C:\>echo > foo %+ type foo
??

C:\>echo >:u8 foo %+ type foo


C:\>echo > clip: %+ type <clip:
??

C:\>echo >:u8 clip: %+ type <:u8 clip:
TCC: (Sys) The system cannot find the file specified.
"C:\:u8"

C:\>echo >:u8 clip: %+ type <clip:
🔈


I also double-checked that even if it doesn't display on the screen, maybe it was really the characters? So I put them to a file just to be sure. Nope!
1730414503703.webp
 
p.s. yes i used ansi codes and sixels to make a custom pentagram character which I then put into my prompt :)

There's actually another bug related to that, which I will report in the future. Basically, running my set-ansi BAT file (the one that makes all this happen) a second time makes the custom pentagram char impossible to re-create, which I believe is a Windows Terminal bug.
 
If I'm redirecting anything non-ASCII to the clipboard, I do an option //unicodeoutput=yes first:

fancychar.webp


I don't know why the >:U8 syntax isn't working for you. Maybe because CLIP isn't a real device; TCC is redirecting to a temp file, slurping that into the clipboard, and deleting the temp file. Perhaps the UTF-8 part gets lost somewhere along the way...? But using OPTION beforehand works for me.

(UnicodeOutput, not UTF8Output, because the Windows clipboard uses UTF-16 internally.)
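
To make the UTF-8 vs. UTF-16 point concrete, here is a minimal Python sketch. Python is only used for illustration; it says nothing about how TCC or the Windows clipboard is actually implemented. It takes one character (U+1F31F, the glowing star that comes up later in this thread), shows that its UTF-8 and UTF-16 byte sequences differ, and shows that UTF-8 bytes read as if they were UTF-16 do not decode back to the original character.

```python
# Illustration only: not TCC's or Windows' actual clipboard code.
star = "\U0001F31F"  # GLOWING STAR

utf8_bytes = star.encode("utf-8")       # 4 bytes
utf16_bytes = star.encode("utf-16-le")  # also 4 bytes (a surrogate pair), but different ones

print(utf8_bytes.hex(" "))   # f0 9f 8c 9f
print(utf16_bytes.hex(" "))  # 3c d8 1f df

# Treating the UTF-8 bytes as UTF-16 yields different characters, not the star.
misread = utf8_bytes.decode("utf-16-le", errors="replace")
print(misread == star)  # False
```

So whatever sits between the redirection and the clipboard has to transcode; if raw UTF-8 bytes arrive where UTF-16 is expected, the text is lost.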
 
Oooh, thanks for the tip!!!!!

and >:u8 is working for me just fine actually
 
WAD, and not a TCC issue. The Windows clipboard does not support UTF-8.
 
Ahhh. There should maybe be another category: "WOMD" .. working as Microsoft designed, and we're copying their flaws :) :) :)

So there's no utf-8 clipboards?
 
HOLY COW!

It cannot UTF-8 ... IT CANNOT UTF-8??!!!

HOW many years have I been using Clipboard and I NEVER noticed this? WOW LOL

I found the Ditto extension:

Ditto Extension

However, I read in the forums that there is a bug which crashes OneNote (no idea if that is a general problem on all systems) ... but ELSE, it COULD be something ...
  • Full Unicode support (display foreign characters)
  • UTF-8 support for language files (create language files in any language)
Maybe this YouTube vid can give an overview ...

Ditto clipboard manager

Greetings
 
The clipboard uses UTF-16 internally. UTF-16 is one way to represent Unicode characters. UTF-8 is another way to represent Unicode characters. There is a one-to-one correspondence between the two.
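
The one-to-one correspondence can be sketched in a few lines of Python (illustration only; the helper names `utf8_to_utf16le` and `utf16le_to_utf8` are made up for this example). For well-formed text, converting UTF-8 to UTF-16 and back returns exactly the original bytes.

```python
def utf8_to_utf16le(data: bytes) -> bytes:
    """Transcode well-formed UTF-8 bytes to UTF-16 (little-endian, no BOM)."""
    return data.decode("utf-8").encode("utf-16-le")


def utf16le_to_utf8(data: bytes) -> bytes:
    """Transcode well-formed UTF-16-LE bytes back to UTF-8."""
    return data.decode("utf-16-le").encode("utf-8")


sample = "naïve café 🔈".encode("utf-8")
round_trip = utf16le_to_utf8(utf8_to_utf16le(sample))
print(round_trip == sample)  # True
```

Nothing is lost in either direction as long as the input is valid; the two encodings are just different byte layouts for the same code points.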
 
Microsoft has been (slowly) adding UTF-8 support to some Windows APIs. But not yet to the clipboard APIs.

TCMD / TCC has its own UTF-8 support for most things, but adding clipboard support would require hooking / replacing the Windows APIs, which would be a major (and very intrusive) undertaking for a minor benefit. (It took 25+ years for someone to notice that the clipboard didn't handle UTF-8!) It's simple enough to convert to/from UTF-16, which is the clipboard's native format.
 
Oof.

Okay then. Understandable.

Though part of me says "not even a clip11: that's just a miniature implementation for piping " but yea I realize nothing ever ends up miniature in the end.

Well then. I at least feel special for discovering something after 25 years. Cheers

I gotta figure out how to fix my sort-clipboard.bat then ....

Trying to sort a list of songs while discussing music with someone, only to get an empty clipboard every time I sorted it, because of foreign characters in some song titles.

for the longest time, it was more or less:

(type clip: | sort) >clip:

But then once the weird characters started coming in:

(type clip |:u8 sort) >:u8clip:

failed me, because there's no >:u8

And then

(type clip |:u8 sort) >clip: failed me

At which point I started to realize.... The clipboard is empty!
Broke it down into steps and found the clipboard problem.
After 25 years? Lol.

Anyway — Is there some easy %@FUNCTION I can run it through that would preserve the "oneliner" aspect of this clipboard sorting command?
 
I would suggest putting an option //unicodeoutput=yes before the above line. You can change it back to NO afterwards if you like.
Thank you! I updated things and I'll try it out!
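
For anyone curious what the sort step amounts to once the text really is UTF-16, here is a rough Python sketch (illustration only; `sort_utf16_lines` is a hypothetical helper, it does not touch the real clipboard, and it sorts by code point rather than with the locale-aware collation that SORT may use).

```python
def sort_utf16_lines(data: bytes) -> bytes:
    """Decode UTF-16-LE text, sort its lines by code point, re-encode as UTF-16-LE."""
    text = data.decode("utf-16-le")
    lines = text.splitlines()
    return "\r\n".join(sorted(lines)).encode("utf-16-le")


# Song titles with non-ASCII characters, as they would look in UTF-16.
titles = "Życie\r\nAlpha\r\nÉtoile".encode("utf-16-le")
sorted_text = sort_utf16_lines(titles).decode("utf-16-le")
print(sorted_text.split("\r\n"))  # ['Alpha', 'Étoile', 'Życie']
```

The non-ASCII titles survive because every step agrees on the encoding, which is what setting UnicodeOutput before the one-liner is meant to achieve.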
 
One last clarification question...

So we know this won't work:
echo >:u8 clip:
1730996969878.webp


But how come I can select that same character with my mouse to get it into the Windows clipboard, and the real character IS in the Windows clipboard, and pastes just fine anywhere?

Basically, why can I put these characters into my clipboard in Windows but TCC can't? It doesn't seem to be a Windows limitation if I can do it just fine anywhere outside of TCC.
 
You can do this in TCC too:

Code:
C:\Bin\JPSDK>option //unicodeoutput=yes

C:\Bin\JPSDK>echo %@char[0x1f31f] > clip:

C:\Bin\JPSDK>type clip:


C:\Bin\JPSDK>

//UnicodeOutput=Yes is your friend.
 
Unfortunately the forum doesn't seem to like high characters, so here it is again as a graphic:

More-Unicode-fun.webp
 
You know... I thought I'd set it in the INI file but... I've noticed things in the INI file can act strange.

I'll just stick it in my tcstart equivalent.

Your example doesn't work for me for some reason:

1731005395026.webp



but it also absolutely fixes the problem:
1731005576047.webp


The real problem is .... i was thinking i had set it when i hadn't. sigh.
Thanks for the re-iteration of that! I needed a proverbial club over the head.
 

Okay, that's why my echo %@char[0x1f31f] above didn't work for you. @CHAR and @UNICODE don't support characters above 0xFFFF until version 33, so you can't generate high-order characters that way. Sorry.
 
How else would I have found out about that new feature?! :)
 
One more question....

Is there a way to make "|" become "|:u8" by default?

[and if so, a way to override that for a non-u8 one?]
 
How else would I have found out about that new feature?! :)

That's a good question, actually. The updates to @CHAR and @UNICODE are documented, somewhat obscurely, in the help file on the "What's New in Version 33" page. You can view the same page on the web here: What's New in Version 33

But unless you know what "UTF-16 surrogate pairs" are, the mention under What's-New probably won't communicate much to you.... :rolleyes:
 
Oh i'm familiar with surrogate pairs. Just writing a program to print out every possible printable character requires one delve into such eldritch horrors involuntarily
 
Admittedly, I hadn't read the change logs for the new versions yet. They're really good reading to me, too!
 
One more question....

Is there a way to make "|" become "|:u8" by default?

[and if so, a way to override that for a non-u8 one?]

There is also option //utf8output=yes.

I don't remember what happens if you have both UnicodeOutput and UTF8Output turned on at the same time. Probably one of them overrides the other, but I don't recall which.

I would suggest you think of UTF-8 as being useful mainly for files. For the clipboard you need UTF-16. I don't know why redirecting UTF-8 to the clipboard does not work; translation between the two is straightforward.
 
Oh i'm familiar with surrogate pairs. Just writing a program to print out every possible printable character requires one delve into such eldritch horrors involuntarily

Then you'll see why you can use %@char[0xd83c 0xdf1f] to get your glowing star in older versions of TCC.
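
For readers who haven't met surrogate pairs: the two values above come from the standard UTF-16 encoding rule for code points above 0xFFFF. A small Python sketch of that arithmetic (illustration only; `to_surrogate_pair` is a name made up for this example):

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a code point above 0xFFFF into its UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points in 0x10000..0x10FFFF need a surrogate pair")
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low


high, low = to_surrogate_pair(0x1F31F)
print(hex(high), hex(low))  # 0xd83c 0xdf1f
```

That matches the 0xd83c 0xdf1f pair given for the glowing star.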
 
One last clarification question...

So we know this won't work:
echo >:u8 clip:

But how come I can select that same character with my mouse to get it into the Windows clipboard, and the real character IS in the Windows clipboard, and pastes just fine anywhere?

Basically, why can I put these characters into my clipboard in Windows but TCC can't? It doesn't seem to be a Windows limitation if I can do it just fine anywhere outside of TCC.

If it appears on your screen, it's UTF-16, not UTF-8. All text output in Windows is UTF-16.
 
Is it me, or does tee not do utf-8 either?

Or rather, maybe it has nothing to do with that and I'm not understanding: I have emoji piping through a postprocessor and spitting to my screen just fine, but if I |&:u8 them to tee to also save it in a logfile, the emoji don't display on screen.

basically, this 1ˢᵗ line does not ruin emoji, but this 2ⁿᵈ line does ruin them:


Code:
%LAST_WHISPER_COMMAND% |:u8 copy-move-post whisper
%LAST_WHISPER_COMMAND% |:u8 copy-move-post whisper |&:u8 tee /a "%OUR_LOGFILE%"
 