Incorrect Unicode detection in "type" and "head" commands

Feb 19, 2019
2
1
TCC 26.02.43 x64 Windows 10 [Version 10.0.19044.1503]

I have an ASCII file "fred.hex" containing one very long line, mostly repetitions of "|00 00" and terminated by CRLF. When the line is longer than 512 chars, the output of TYPE (or HEAD) shows the per-mille symbol (‰), the unknown-character symbol and a space. TYPE /X shows the correct hex codes on the left, but also renders them as Unicode characters on the right (see the attached PNG).

The line is:
-13:03:59| 0| 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00|00 00|00 00|00 00|00 00|00 00|00 00|00 00...
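A guess at what is going on (not confirmed from TCC's source): if the file is taken to be UTF-16LE without a BOM, each pair of ASCII bytes becomes one 16-bit character, low byte first. Below is a minimal Python sketch of that reading, applied to a made-up fragment in the same style as the line above. The pair "0 " (0x30 0x20) comes out as U+2030 PER MILLE SIGN, which would fit the ‰ seen on screen.

```python
import unicodedata


def utf16le_code_points(data: bytes) -> list[str]:
    """Read ASCII bytes as UTF-16LE (low byte first, no BOM) and list code points.

    A trailing odd byte is dropped. Pairs of ASCII bytes are always <= 0x7F7F,
    so they never form surrogates and the decode cannot fail.
    """
    even = data[: len(data) - len(data) % 2]
    return [f"U+{ord(ch):04X}" for ch in even.decode("utf-16-le")]


# A short, made-up fragment in the style of the line in fred.hex.
fragment = b" 00 00 00"
for cp in utf16le_code_points(fragment):
    ch = chr(int(cp[2:], 16))
    print(cp, unicodedata.name(ch, "<no name>"))
```

Because " 00" repeats every 3 bytes while the code units are 2 bytes wide, the pattern cycles through " 0", "0 " and "00", i.e. U+3020, U+2030 and U+3030.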

Originally it was 1700 chars long, but repeatedly shortening it (by bisection at first) showed that the display becomes correct once the length drops below 512 chars. Examining the critical 6 chars shows no hidden non-ASCII characters. Deleting one of the " 00" groups from the initial run also gives the correct display, even though the line is still over 512 chars.
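The bisection step can be sketched as a binary search over prefix lengths. Here is_bad is a hypothetical predicate standing in for the manual check (write the prefix plus CRLF to a file, TYPE it, look at the output). The search assumes that once a prefix misdisplays, every longer one does too; since deleting one " 00" fixes a line that is still over 512 chars, the real behaviour depends on content as well, so the result is only the threshold for that particular line.

```python
from typing import Callable


def shortest_bad_prefix(line: str, is_bad: Callable[[str], bool]) -> int:
    """Return the smallest prefix length n for which is_bad(line[:n]) is True.

    Assumes is_bad is monotonic in the prefix length (once a prefix is bad,
    every longer one is too) and that the empty prefix is fine.
    """
    if not is_bad(line):
        raise ValueError("the full line does not reproduce the problem")
    lo, hi = 0, len(line)  # invariant: line[:lo] is fine, line[:hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(line[:mid]):
            hi = mid
        else:
            lo = mid
    return hi


# Toy stand-in predicate: pretend any prefix longer than 512 chars misdisplays.
print(shortest_bad_prefix("|00 00" * 300, lambda s: len(s) > 512))  # prints 513
```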
Recreating the file by hand in Notepad shows the same behaviour, which rules out a hidden control character I might have missed.
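Checking for hidden bytes can also be done mechanically rather than by eye. A minimal sketch, assuming "suspicious" means anything outside printable ASCII other than the CR and LF of the line ending; the script name in the usage comment is arbitrary.

```python
import sys


def find_suspicious_bytes(data: bytes) -> list[tuple[int, int]]:
    """Return (offset, byte value) for every byte that is not printable ASCII.

    CR (0x0D) and LF (0x0A) are allowed so a normal CRLF line ending is not
    flagged; tabs, other control characters and any byte >= 0x80 are reported.
    """
    return [
        (offset, byte)
        for offset, byte in enumerate(data)
        if not (0x20 <= byte <= 0x7E or byte in (0x0D, 0x0A))
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python scan.py fred.hex
    with open(sys.argv[1], "rb") as f:
        data = f.read()
    bad = find_suspicious_bytes(data)
    print(f"{len(data)} bytes read;", bad if bad else "all printable ASCII")
```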
I do not have UTF-8 or Unicode output enabled in OPTION.
Is there any way to force ASCII display?
Thanks, Len.

Attachments

  • Bad TCC type.png
May 20, 2008
12,175
133
Syracuse, NY, USA
Similar (but different) here with TCC v28. VIEW gets it right, as does GNU cat.

[screenshot]


But TYPE (without /X), LIST, HEAD, and TAIL all show:

[screenshot]


With /X, TYPE shows the hex correctly, but the text column shows the same characters as above.

I wonder if it's the Win32 function IsTextUnicode(). I'll test it.

I use codepage 1252 if it matters.
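Codepage 1252 may well matter for what ends up on screen. If the bytes are decoded as UTF-16LE, a pair like "0 " becomes U+2030 (‰), which cp1252 has at 0x89, while pairs like " 0" and "00" become U+3020 and U+3030, which it has no slot for. The sketch below only shows that codepage mapping; how the console actually draws an unmappable character depends on how TCC writes to it, so this is one plausible explanation rather than a confirmed one.

```python
def cp1252_bytes(ch: str) -> bytes:
    """Encode one character in cp1252, substituting b'?' if it has no slot."""
    return ch.encode("cp1252", errors="replace")


# Code units that ASCII pairs such as "0 ", " 0" and "00" turn into
# when the bytes are read as UTF-16LE.
for ch in ("\u2030", "\u3020", "\u3030"):
    print(f"U+{ord(ch):04X} -> {cp1252_bytes(ch)!r}")
```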
Since I don't know which tests TCC uses, I told IsTextUnicode() to apply all of them (lpiResult = nullptr). I got:

Code:
546 bytes were read
IsTextUnicode() returned TRUE

I don't know if anything can be done about that.
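For anyone who wants to repeat the test, here is a minimal Python/ctypes sketch of the same call. The original test program isn't shown, so this is an assumption about how it was done. It passes NULL for lpiResult, which per Microsoft's documentation applies all available tests, and it returns None on anything other than Windows.

```python
import ctypes
import sys
from typing import Optional


def is_text_unicode(data: bytes) -> Optional[bool]:
    """Call the Win32 IsTextUnicode() (advapi32.dll) on a byte buffer.

    lpiResult is passed as NULL, so all available tests are applied.
    Returns None on non-Windows platforms, where the API does not exist.
    """
    if sys.platform != "win32":
        return None
    advapi32 = ctypes.WinDLL("advapi32")
    # bytes is passed as a pointer to its buffer; None becomes NULL.
    return bool(advapi32.IsTextUnicode(data, len(data), None))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python istu.py fred.hex
    with open(sys.argv[1], "rb") as f:
        data = f.read()
    print(f"{len(data)} bytes were read")
    print("IsTextUnicode() returned", "TRUE" if is_text_unicode(data) else "FALSE")
```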
IsTextUnicode() is a Microsoft Win32 API function. I also tested TCC's QueryIsFileUnicode() function (which, no doubt, uses the Win32 function) in a plugin. The results were the same as I reported above.