Copy to clip changes character

David Marcus · Mar 14, 2012

Would someone please explain why copying a character like xf5 to the clipboard changes it to a different character?

Code:

C:\Junk>type Foo.txt
õ
C:\Junk>copy Foo.txt clip:
C:\Junk\Foo.txt => clip:
    1 file copied
 
C:\Junk>copy clip: Bar.txt
clip: => C:\Junk\Bar.txt
    1 file copied
 
C:\Junk>type Bar.txt
)
C:\Junk>ver
 
TCC  13.04.52  Windows Vista [Version 6.0.6002]

Avi Shmidman · Mar 14, 2012

I can reproduce this behavior with the "copy foo.txt clip:" command; it seems that this command does not use the system's default non-Unicode (high-ASCII) language settings.
However, it works if I run:
type foo.txt > clip:
With "type" redirected to the clipboard, I find that the high-ASCII characters *are* copied correctly to the clipboard. Do you find this as well, David?

David Marcus said:
Would someone please explain why copying a character like xf5 to the clipboard changes it to a different character?

David Marcus · Mar 14, 2012

type foo.txt > clip: does the same as the copy for me, i.e., it changes the character.

rconn · Mar 15, 2012

Keep two things in mind:

1) There's no such thing as a CLIP: device. (Though TCC does a lot of fancy footwork to pretend that there is.)
2) Everything inside TCC, Take Command, and Windows is Unicode.

So anything ASCII copied to the clipboard is going to be converted to Unicode (or in one or two cases, CF_OEMTEXT) and back again. And Windows is really bad at converting high ASCII characters to & from Unicode and ending up with the same character.

Are you using Unicode fonts in TCMD and TCC?
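
As a minimal sketch of why the conversion to Unicode depends on the code page, the following uses Python's built-in codec tables as a stand-in for the Windows conversion (an assumption; the Windows tables may differ in edge cases). The helper name `decode_as` is made up for illustration.

```python
import unicodedata

RAW = b"\xf5"  # the single byte stored in Foo.txt


def decode_as(data: bytes, codepage: str) -> str:
    """Interpret raw bytes as text under the given code page."""
    return data.decode(codepage)


for cp in ("cp437", "cp1252"):
    ch = decode_as(RAW, cp)
    # Print the code point and name rather than the glyph, so the output
    # does not depend on the console's own code page.
    print(f"{cp}: U+{ord(ch):04X} {unicodedata.name(ch)}")
```

The same byte becomes a different Unicode character depending on which code page is assumed at conversion time.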

David Marcus · Mar 15, 2012

I know Windows doesn't have a clip: device, but it does have a clipboard. Is there another way to copy the contents of a file to the clipboard?

TCMD and TCC are using Consolas.

The way I ran into this problem was someone sent me some SQL that I wanted to copy into MySQL Query Browser. So, I did "copy foo.sql clip:", then pasted into the query browser window. This changed the characters. I then opened foo.sql in a text editor (Lugaru's Epsilon), copied the entire file, and pasted into the query browser. This worked fine.

JohnQSmith · Mar 15, 2012

David Marcus said:
Is there another way to copy the contents of a file to the clipboard?

Horst Schaeffer's ClipText works nicely and it's only 6KB. He even has the PureBasic source code available for download.

Charles Dye · Mar 15, 2012

David Marcus said:
Would someone please explain why copying a character like xf5 to the clipboard changes it to a different character?

What character encoding is Foo.txt using? Is there a BOM?

Which OEM code page are you using? (The CHCP command reports this.)

David Marcus · Mar 15, 2012

The file size is 1 byte. No BOM. I'm using code page 1252. I open my text editor, enter the character, save the file, copy the file to clip:, go back to my editor, paste, and the character is changed.

I can go to my editor, copy the character, go to TCMD, paste, and the character is the same as in the editor.
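
For anyone wanting to answer the BOM question for their own file, a small sketch along these lines would do it. The function name `sniff_bom` is hypothetical; the BOM constants come from Python's standard `codecs` module.

```python
import codecs

# Byte order marks, UTF-32 first so that UTF-32 LE (FF FE 00 00)
# is not mistaken for UTF-16 LE (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return the encoding named by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


# A 1-byte file holding only 0xF5 has no BOM, so its encoding must be
# inferred from the code page in use.
print(sniff_bom(b"\xf5"))
```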

rconn · Mar 15, 2012

David Marcus said:
Would someone please explain why copying a character like xf5 to the clipboard changes it to a different character?

I can't reproduce that here:

Code:

[D:\TakeCommand13\tcc]type foo.txt
⌡
[D:\TakeCommand13\tcc]copy foo.txt clip:
D:\TakeCommand13\tcc\foo.txt => clip:
     1 file copied

[D:\TakeCommand13\tcc]copy clip: bar.txt
clip: => D:\TakeCommand13\tcc\bar.txt
     1 file copied

[D:\TakeCommand13\tcc]type bar.txt
⌡

Looking at foo.txt with LIST /X confirms that it has 1 character (0xf5).

Windows 7 x64, Lucida Console, code page 437. Tried it with Consolas and it still works.

rconn · Mar 15, 2012

I tried switching to CP 1252 and the character is being changed (by Windows, not TCC or TCMD). Not sure there is anything I can do about it, but I'll look into it.

David Marcus · Mar 15, 2012

I just tried it on "TCC 13.04.54 x64 Windows 7 [Version 6.1.7601]" with the identical results. Try it with "chcp 1252".

If I do "chcp 437", save the file in my editor, copy it to clip:, and paste into my editor, the character still changes, but now "type Foo.txt" shows the character it changed to. So, it seems that TCC is always assuming code page 437 when copying a file to the clipboard. I wonder what my editor is doing, since pasting to TCMD always gives the Latin 1 character, regardless of chcp. Is it passing a Unicode character to the clipboard?

David Marcus · Mar 15, 2012

Any idea how my text editor manages to copy the text to the clipboard without it being changed?

Charles Dye · Mar 15, 2012

David Marcus said:
type foo.txt > clip: does the same as the copy for me, i.e., it changes the character.

How about if you first OPTION //UNICODEOUTPUT=YES ?

David Marcus · Mar 15, 2012

Charles Dye said:
How about if you first OPTION //UNICODEOUTPUT=YES ?

No effect on copy, but type now converts to what appears to be the correct Unicode character.

rconn · Mar 15, 2012

David Marcus said:
So, it seems that TCC is always assuming code page 437 when copying a file to the clipboard.

It has nothing to do with TCC, which is simply copying the file as-is to the clipboard. It's Windows that is changing the character (in cp 1252) when converting to Unicode (and back).

rconn · Mar 15, 2012

David Marcus said:
Any idea how my text editor manages to copy the text to the clipboard without it being changed?

I suspect your text editor is probably a GUI app, not a console app. Windows converts text differently for GUI vs. console.

David Marcus · Mar 15, 2012

Yes, my text editor is a GUI app (although it does have a console version that I could try if that would be useful). But, I can open TCC, select the character (displayed in the TCC window), copy it, and paste it into my editor or anywhere else, and it doesn't change. I can do the same with a cmd window (using Consolas as the font).

rconn · Mar 24, 2012

The problem here is that you copied the character from your text editor to an ASCII file (if you had saved it as a Unicode file everything would have worked as expected).

TCC dutifully copied the ASCII file to the clipboard as CF_OEMTEXT. When you then tried to COPY (or TYPE) the clipboard, TCC has to read the clipboard as Unicode, and Windows translated the character to the (wrong?) Unicode equivalent. (It wouldn't matter if TCC read the clipboard as CF_OEMTEXT, as it would still have to convert the text to Unicode so that TCC could process it, and Windows would again convert it to the wrong? character.)

There's no way to get around the ASCII -> Unicode conversion; it has to be done at some point (whether on the original save to the clipboard or when reading the clipboard contents).
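
The round trip described above can be modeled roughly as follows. This is a simplified sketch, not the actual Windows or TCC code path: it assumes the OEM code page is 437 and the reading side wants code page 1252, and it uses Python's codecs in place of the Windows conversion. The function name `clipboard_round_trip` is made up.

```python
def clipboard_round_trip(data: bytes,
                         oem_cp: str = "cp437",
                         ansi_cp: str = "cp1252") -> bytes:
    """Model: bytes tagged as OEM text -> Unicode -> bytes in another code page."""
    # Step 1: the bytes are interpreted under the OEM code page.
    as_unicode = data.decode(oem_cp)
    # Step 2: the Unicode text is written back out in the ANSI code page.
    # Characters with no equivalent there become b"?" in this model.
    return as_unicode.encode(ansi_cp, errors="replace")


print(clipboard_round_trip(b"\xf5"))                    # byte is not preserved
print(clipboard_round_trip(b"\xf5", "cp1252", "cp1252"))  # preserved when both sides agree
```

Python substitutes `?` for an unmappable character. Windows may instead pick a look-alike character, which would be consistent with the `)` in the first post, though that is an assumption rather than something confirmed in this thread.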

David Marcus · Mar 24, 2012

Well, yes, the world might be simpler if ASCII files didn't exist and all files were Unicode. I didn't actually copy the file from my text editor. Someone sent me a Latin 1 file. Since the character is a Latin 1 character, its code is the same in ASCII and Unicode.

Isn't the problem that the wrong code page is being used? After doing "chcp 1252", I can TYPE the file in TCC, select the character, copy it, paste it into another app, and it shows up as the same character that is displayed. But, if I COPY the file to CLIP: and paste it into another app, the character is changed. Of course, if I do "chcp 437", then the character displays differently when I TYPE it in TCC.

I'm not familiar with what CF_OEMTEXT implies. When you specify this, do you also specify the code page? If not, then that's the problem, isn't it? If you can't specify the code page, then maybe COPY shouldn't say the contents are CF_OEMTEXT. Can it say the data is binary? Or convert it to Unicode using the current code page? Maybe a new COPY option or a new command?
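
The "convert it to Unicode using the current code page" idea could look something like this sketch. It only builds the bytes that a CF_UNICODETEXT clipboard entry would hold (UTF-16LE text followed by a null terminator); the actual clipboard call is left out, and the function name is hypothetical. It is not how TCC works.

```python
def to_unicode_clipboard_payload(data: bytes, codepage: str) -> bytes:
    """Decode file bytes under an explicit code page and return a
    null-terminated UTF-16LE buffer, the layout used for CF_UNICODETEXT."""
    text = data.decode(codepage)          # the code page is chosen by the caller
    return text.encode("utf-16-le") + b"\x00\x00"


# With code page 1252 the byte 0xF5 becomes U+00F5; with 437 it becomes U+2321.
print(to_unicode_clipboard_payload(b"\xf5", "cp1252").hex())
print(to_unicode_clipboard_payload(b"\xf5", "cp437").hex())
```

The point of the design is that the code page is applied once, explicitly, before the data reaches the clipboard, so nothing downstream has to guess.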

rconn · Mar 25, 2012

The problem is that the character (0xF5) doesn't have any standard conversion to/from Unicode. There's absolutely nothing that TCC can do about it -- the character HAS to be converted to Unicode at some point, and that is done by Windows. Different code pages may cause Windows to convert it differently, but that's outside of TCC's control (and in fact TCC has no idea what the character was prior to its conversion).

David Marcus · Mar 25, 2012

Then it should be documented that CLIP: only works with 7 bit characters.

On Vista and Win 7, "type foo.txt | clip" works. (I tried both 1252 and 437 for the one character.) Maybe TCC should pipe to clip.exe rather than whatever it is currently doing.
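
The 7-bit remark can be checked against the code page tables. A rough sketch, again assuming Python's `cp437` and `cp1252` codecs are close enough to the Windows tables for this purpose:

```python
def same_in_both(byte_value: int, cp_a: str = "cp437", cp_b: str = "cp1252") -> bool:
    """True if the byte decodes to the same character under both code pages."""
    raw = bytes([byte_value])
    try:
        return raw.decode(cp_a) == raw.decode(cp_b)
    except UnicodeDecodeError:
        return False  # the byte is undefined in at least one code page


# Bytes 0x00-0x7F: identical in both code pages.
low_ok = all(same_in_both(b) for b in range(0x80))
# Bytes 0x80-0xFF: collect the ones whose meaning depends on the code page.
high_mismatches = [b for b in range(0x80, 0x100) if not same_in_both(b)]

print("7-bit range safe:", low_ok)
print("0xF5 depends on code page:", 0xF5 in high_mismatches)
```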

rconn · Mar 25, 2012

CLIP: works with all Unicode characters, all 7 bit ASCII characters, and almost all 8-bit "non-standard unsupported in Windows extended ASCII characters" (except, apparently, 0xF5 in 1252). (This certainly isn't unique to CLIP:.) Ask Microsoft for details on which extended 8-bit ASCII characters they don't support in your code page; it's not something that TCC can track (or fix).

If you find this to be a fatal limitation, I'd urge you not to use CLIP:. As I've repeatedly said, I cannot do anything about it short of removing CLIP: from TCC.

rconn · Mar 25, 2012

David Marcus said:
On Vista and Win 7, "type foo.txt | clip" works. (I tried both 1252 and 437 for the one character.) Maybe TCC should pipe to clip.exe rather than whatever it is currently doing.

TCC is doing exactly the same thing with "type foo.txt | clip:" as it does with "copy foo.txt clip:". In both cases, it copies the single 0xF5 character to the clipboard. The problem comes when you try to do something to get it *out* of the clipboard -- which is when TCC (or TCMD) has to ask Windows to convert it to Unicode, and Windows changes it to the wrong character.

vefatica · Mar 25, 2012

rconn said:
TCC is doing exactly the same thing with "type foo.txt | clip:" as it does with "copy foo.txt clip:". In both cases, it copies the single 0xF5 character to the clipboard. The problem comes when you try to do something to get it *out* of the clipboard -- which is when TCC (or TCMD) has to ask Windows to convert it to Unicode, and Windows changes it to the wrong character.

I'd expect "type foo.txt | clip" (which is what David wrote) is using ...\system32\clip.exe

David Marcus · Mar 26, 2012

Yes, sorry. I didn't make that clear. "type foo.txt | clip" is using ...\system32\clip.exe. So, I can create a CopyToClip alias that works. Not sure why TCC can't do the same thing.

The problem isn't just 0xF5 in 1252. Even though I do "chcp 1252", all the text seems to be interpreted as code page 437. E.g., if I create a file Foo.txt containing "ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþ" and do "copy Foo.txt CLIP:", then paste it into notepad, I get "└┴┬├─┼╞╟╚╔╩╦╠═╬╧╨╤╥╙╘╒╓╫╪┘┌█▄▌▐▀αßΓπΣσµτΦΘΩδ∞φε∩≡±≥≤⌠⌡÷≈°∙·√ⁿ²■", which is what I would get if I did "chcp 437" followed by "type Foo.txt".
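
That observation can be reproduced outside of TCC. The sketch below writes text as code page 1252 bytes and reads the same bytes back as code page 437, using Python's codecs as an approximation of the Windows tables. The helper name `reinterpret` is made up.

```python
def reinterpret(text: str, written_as: str, read_as: str) -> str:
    """Encode text under one code page, then decode the same bytes under another."""
    return text.encode(written_as).decode(read_as)


# Bytes 0xC0..0xFE, i.e. the 63 characters from À through þ in code page 1252.
latin1_text = bytes(range(0xC0, 0xFF)).decode("cp1252")
pasted = reinterpret(latin1_text, "cp1252", "cp437")

# Show code points instead of glyphs so the output is console-independent.
print(len(pasted), f"first=U+{ord(pasted[0]):04X}", f"last=U+{ord(pasted[-1]):04X}")
```

If the result matches the box-drawing string pasted into Notepad, that supports the reading that the bytes were interpreted as code page 437 somewhere along the way, whichever component did it.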
