Unicode anomaly

May 20, 2008
Syracuse, NY, USA
If I start TCC with /U ...

v:\> for /l %i in (1,1,1000) ( echo abc >> abc.txt )

v:\> echo %@lines[abc.txt]
That's as expected.

Now I use a hex-editor to remove the BOM.
v:\> hexe abc.txt
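
For anyone who wants to confirm what the hex editor changed, here is a minimal C sketch of a BOM check. It assumes the file written under /U is UTF-16LE, so the BOM is the two bytes FF FE; the helper name has_utf16le_bom is made up for illustration and is not part of TCC or the Windows API.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if the buffer starts with the
   UTF-16LE byte order mark (bytes FF FE), otherwise 0. */
int has_utf16le_bom(const unsigned char *buf, size_t len)
{
    return len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE;
}
```

Run on the first bytes of abc.txt, this should return 1 before the edit and 0 after the BOM is removed.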

v:\> echo %@lines[abc.txt]
While I'm not sure why the result is different, I am confident TCC doesn't identify the new file as Unicode. The test IS_TEXT_UNICODE_ASCII16 (the text is Unicode and contains only zero-extended ASCII values/characters) is useless. Here's part of a query I made (without satisfactory results) in microsoft.public.vc.language.
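/* test (an INT) is declared outside this excerpt; IsTextUnicode reads it
   as the set of tests to apply and overwrites it with the tests passed */
LPCWSTR szStr[6] = {L"A", L"A ", L"A b", L"A bu", L"A bug", L"A bug!"};
for ( INT i=0; i<6; i++ )
{
    BOOL bResult = IsTextUnicode(szStr[i], (int)(2*wcslen(szStr[i])), &test);
    wprintf(L"L\"%s\" is %sUnicode", szStr[i], bResult ? L"" : L"not ");
    wprintf(L" (0x%X)\n", test);
}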

L"A" is not Unicode (0x5)
L"A " is Unicode (0x1)
L"A b" is Unicode (0x1)
L"A bu" is not Unicode (0x0)
L"A bug" is not Unicode (0x0)
L"A bug!" is not Unicode (0x0)

The results are different (but equally confusing) if the terminating NUL
is included in the test:

/* as above but with */
BOOL bResult = IsTextUnicode(szStr[i], 2*wcslen(szStr[i])+2, &test);

L"A" is Unicode (0x1)
L"A " is Unicode (0x1)
L"A b" is not Unicode (0x0)
L"A bu" is not Unicode (0x0)
L"A bug" is not Unicode (0x0)
L"A bug!" is not Unicode (0x0)
FWIW, a Unicode file without a BOM is the kind of file you get from CMD started with /U.
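
For comparison, here is a deliberately naive C sketch of what a "zero-extended ASCII" check could look like for BOM-less UTF-16LE data. This is not how IsTextUnicode or TCC works internally; the function name looks_like_ascii16le and the exact rule (even length, every low byte in 0x01..0x7F, every high byte zero) are my own assumptions for illustration.

```c
#include <stddef.h>

/* Hypothetical heuristic: returns 1 if buf could be BOM-less UTF-16LE
   holding only zero-extended ASCII, i.e. the length is even and nonzero,
   every low byte is 0x01..0x7F, and every high byte is zero. */
int looks_like_ascii16le(const unsigned char *buf, size_t len)
{
    size_t i;

    if (len == 0 || len % 2 != 0)
        return 0;                 /* UTF-16 needs whole 2-byte units */

    for (i = 0; i < len; i += 2) {
        if (buf[i] == 0 || buf[i] > 0x7F || buf[i + 1] != 0)
            return 0;             /* not a zero-extended ASCII character */
    }
    return 1;
}
```

A rule like this accepts every one of the six test strings above when they are passed without the terminating NUL, and rejects ordinary single-byte ASCII text of two or more characters, because the second byte is then nonzero.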
