If I start TCC with /U ...
That's as expected.
Now I use a hex-editor to remove the BOM.
While I'm not sure why the result is different, I am confident TCC doesn't identify the new file as Unicode. The IS_TEXT_UNICODE_ASCII16 test ("the text is Unicode, and contains only zero-extended ASCII values/characters") is useless. Here's part of a query I made (without satisfactory results) in microsoft.public.vc.language.
FWIW, this is the kind of file you get from CMD started with /U.
Code:
v:\> for /l %i in (1,1,1000) ( echo abc >> abc.txt )
v:\> echo %@lines[abc.txt]
999
Now I use a hex-editor to remove the BOM.
Code:
v:\> hexe abc.txt
v:\> echo %@lines[abc.txt]
1000
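As an aside, a BOM-less UTF-16LE file full of plain ASCII isn't hard to spot by hand: every odd-numbered byte is NUL. Here's a toy check along those lines (purely my own sketch, with a made-up LooksLikeBomlessUtf16Le helper; it's not anything TCC actually does):
Code:
#include <windows.h>
#include <stdio.h>

/* Toy heuristic, not TCC's logic: BOM-less UTF-16LE text that is plain ASCII
   has a NUL in every odd byte position, so just look for that pattern. */
static BOOL LooksLikeBomlessUtf16Le ( const BYTE *buf, size_t cb )
{
    size_t i;
    if ( cb < 4 )
        return FALSE;
    cb &= ~(size_t)1;                     /* ignore a trailing odd byte */
    for ( i = 1; i < cb; i += 2 )
        if ( buf[i] != 0 )
            return FALSE;
    return TRUE;
}

int main ( int argc, char **argv )
{
    BYTE buf[512];
    size_t cb = 0;
    FILE *f = ( argc > 1 ) ? fopen(argv[1], "rb") : NULL;
    if ( f )
    {
        cb = fread(buf, 1, sizeof(buf), f);   /* sample the start of the file */
        fclose(f);
    }
    printf("%s\n", LooksLikeBomlessUtf16Le(buf, cb) ? "looks like UTF-16LE text" : "not detected");
    return 0;
}
Fed the first bytes of the hex-edited abc.txt it should report a match, since every odd byte in that file is NUL. Anyway, here's the test from that newsgroup query: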
Code:
#include <windows.h>
#include <wchar.h>

int main ( void )
{
    LPCWSTR szStr[6] = {L"A", L"A ", L"A b", L"A bu", L"A bug", L"A bug!"};
    for ( INT i=0; i<6; i++ )
    {
        INT test = IS_TEXT_UNICODE_ASCII16;   /* ask for the ASCII16 test only */
        BOOL bResult = IsTextUnicode(szStr[i], 2*wcslen(szStr[i]), &test);
        wprintf(L"L\"%s\" is %sUnicode", szStr[i], bResult ? L"" : L"not ");
        wprintf(L" (0x%X)\n", test);
    }
    return 0;
}
L"A" is not Unicode (0x5)
L"A " is Unicode (0x1)
L"A b" is Unicode (0x1)
L"A bu" is not Unicode (0x0)
L"A bug" is not Unicode (0x0)
L"A bug!" is not Unicode (0x0)
The results are different (but equally confusing) if the terminating NUL is included in the test:
/* as above but with */
BOOL bResult = IsTextUnicode(szStr[i], 2*wcslen(szStr[i])+2, &test);
L"A" is Unicode (0x1)
L"A " is Unicode (0x1)
L"A b" is not Unicode (0x0)
L"A bu" is not Unicode (0x0)
L"A bug" is not Unicode (0x0)
L"A bug!" is not Unicode (0x0)