Unicode anomaly

May 20, 2008
If I start TCC with /U ...

Code:
v:\> for /l %i in (1,1,1000) ( echo abc >> abc.txt )

v:\> echo %@lines[abc.txt]
999
That's as expected.

Now I use a hex-editor to remove the BOM.
Code:
v:\> hexe abc.txt

v:\> echo %@lines[abc.txt]
1000
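One possible reason for the off-by-one, offered as a guess and not as anything known about TCC's internals: in a UTF-16LE file every line feed is the byte pair 0A 00. A reader that treats the file as single-byte text still finds the same number of 0A bytes, but it also sees a stray 00 byte after the last one, which looks like the start of one more line. The sketch below counts lines both ways. The function names are hypothetical, and it assumes @LINES reports the zero-based number of the last line, as the 999 above suggests.

```c
#include <assert.h>
#include <stddef.h>

/* Count lines the way a UTF-16LE reader would: a line ends at the code
   unit 0x000A; any units left over after the last LF form one more line. */
size_t count_lines_utf16le(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    size_t i = 0, lines = 0, pending = 0;

    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        i = 2;                              /* skip the UTF-16LE BOM */

    for (; i + 1 < len; i += 2) {
        if (buf[i] == 0x0A && buf[i + 1] == 0x00) {
            lines++;
            pending = 0;
        } else {
            pending = 1;                    /* data belonging to an open line */
        }
    }
    return lines + pending;
}

/* Count lines the way a single-byte reader would: a line ends at the
   byte 0x0A; any bytes left over after the last LF form one more line. */
size_t count_lines_bytes(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    size_t lines = 0, pending = 0;

    for (size_t i = 0; i < len; i++) {
        if (buf[i] == 0x0A) {
            lines++;
            pending = 0;
        } else {
            pending = 1;                    /* includes the 00 after each 0A */
        }
    }
    return lines + pending;
}
```

For a file of 1000 "abc " lines, the first function gives 1000 and the second gives 1001 once the BOM is gone, so the last line numbers would be 999 and 1000.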
I'm not sure why the result is different, but I am confident TCC doesn't identify the new file as Unicode. The IsTextUnicode test IS_TEXT_UNICODE_ASCII16 ("The text is Unicode, and contains only zero-extended ASCII values/characters") is useless. Here's part of a query I made (without satisfactory results) in microsoft.public.vc.language.
Code:
LPCWSTR szStr[6] = {L"A", L"A ", L"A b", L"A bu", L"A bug", L"A bug!"};
for ( INT i=0; i<6; i++ )
{
    INT test = IS_TEXT_UNICODE_ASCII16;
    BOOL bResult = IsTextUnicode(szStr[i], 2*wcslen(szStr[i]), &test);
    wprintf(L"L\"%s\" is %sUnicode", szStr[i], bResult ? L"" : L"not ");
    wprintf(L" (0x%X)\n", test);
}

L"A" is not Unicode (0x5)
L"A " is Unicode (0x1)
L"A b" is Unicode (0x1)
L"A bu" is not Unicode (0x0)
L"A bug" is not Unicode (0x0)
L"A bug!" is not Unicode (0x0)

The results are different (but equally confusing) if the terminating NUL
is included in the test:

/* as above but with */
BOOL bResult = IsTextUnicode(szStr[i], 2*wcslen(szStr[i])+2, &test);

L"A" is Unicode (0x1)
L"A " is Unicode (0x1)
L"A b" is not Unicode (0x0)
L"A bu" is not Unicode (0x0)
L"A bug" is not Unicode (0x0)
L"A bug!" is not Unicode (0x0)
FWIW, a Unicode file with no BOM is the kind of file you get from CMD started with /U.
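For comparison, the literal meaning of that flag's description can be checked by hand without IsTextUnicode. This is a minimal sketch, assuming little-endian UTF-16 and a raw byte buffer; is_ascii16_le is a hypothetical helper, not a Windows API, and it makes no attempt to reproduce whatever statistics IsTextUnicode applies.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if buf holds only zero-extended ASCII in UTF-16LE form:
   an even, nonzero byte count where every code unit has a low byte
   in 0x00..0x7F and a high byte of 0x00. Return 0 otherwise. */
int is_ascii16_le(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);

    if (len == 0 || len % 2 != 0)
        return 0;                           /* not a whole number of code units */

    for (size_t i = 0; i < len; i += 2) {
        if (buf[i] > 0x7F || buf[i + 1] != 0x00)
            return 0;                       /* outside zero-extended ASCII */
    }
    return 1;
}
```

A check this strict answers the same way for all six test strings above, with or without the terminating NUL. It also rejects any Unicode text containing non-ASCII characters, so it is only a yes/no test for this one narrow case and not a general encoding detector.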
 