
Unicode anomaly

Discussion in 'Support' started by vefatica, Oct 10, 2009.

  1. vefatica

    If I start TCC with /U ...

    Code:
    v:\> for /l %i in (1,1,1000) ( echo abc >> abc.txt )
    
    v:\> echo %@lines[abc.txt]
    999
    That's as expected.

    Now I use a hex editor to remove the BOM.
    Code:
    v:\> hexe abc.txt
    
    v:\> echo %@lines[abc.txt]
    1000
    While I'm not sure why the result is different, I am confident TCC doesn't identify the new file as Unicode. The test IS_TEXT_UNICODE_ASCII16 (documented as "The text is Unicode, and contains only zero-extended ASCII values/characters") is useless. Here's part of a query I posted (without getting satisfactory answers) in microsoft.public.vc.language.
    Code:
    LPCWSTR szStr[6] = {L"A", L"A ", L"A b", L"A bu", L"A bug", L"A bug!"};
    for ( INT i=0; i<6; i++ )
    {
        INT test = IS_TEXT_UNICODE_ASCII16;
        BOOL bResult = IsTextUnicode(szStr[i], 2*wcslen(szStr[i]), &test);
        wprintf(L"L\"%s\" is %sUnicode", szStr[i], bResult ? L"" : L"not ");
        wprintf(L" (0x%X)\n", test);
    }
    
    L"A" is not Unicode (0x5)
    L"A " is Unicode (0x1)
    L"A b" is Unicode (0x1)
    L"A bu" is not Unicode (0x0)
    L"A bug" is not Unicode (0x0)
    L"A bug!" is not Unicode (0x0)
    
    The results are different (but equally confusing) if the terminating NUL
    is included in the test:
    
    /* as above but with */
    BOOL bResult = IsTextUnicode(szStr[i], 2*wcslen(szStr[i])+2, &test);
    
    L"A" is Unicode (0x1)
    L"A " is Unicode (0x1)
    L"A b" is not Unicode (0x0)
    L"A bu" is not Unicode (0x0)
    L"A bug" is not Unicode (0x0)
    L"A bug!" is not Unicode (0x0)
    FWIW, this is the kind of file you get from CMD started with /U.
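
For comparison, here is a minimal sketch of a hand-rolled check for this kind of file, in portable C with no Windows calls. It is not what IsTextUnicode does internally, and I am not claiming TCC does anything like it. The function names has_utf16le_bom and looks_like_ascii16le are hypothetical. It assumes the whole buffer is little-endian UTF-16 holding only 7-bit ASCII characters, which is what "zero-extended ASCII" would mean for the abc.txt test above.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if the buffer starts with the UTF-16LE byte-order mark FF FE. */
bool has_utf16le_bom(const unsigned char *buf, size_t len)
{
    return len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE;
}

/*
 * Hypothetical heuristic (NOT IsTextUnicode, NOT TCC's logic):
 * true if the buffer is non-empty, has an even length, and every
 * 16-bit little-endian unit is a non-NUL 7-bit ASCII value,
 * i.e. the high byte is 0 and the low byte is in 0x01..0x7F.
 */
bool looks_like_ascii16le(const unsigned char *buf, size_t len)
{
    if (len == 0 || len % 2 != 0)
        return false;

    for (size_t i = 0; i < len; i += 2) {
        unsigned char lo = buf[i];
        unsigned char hi = buf[i + 1];
        if (hi != 0 || lo == 0 || lo > 0x7F)
            return false;
    }
    return true;
}
```

A caller would read the first few KB of the file, try has_utf16le_bom first, and fall back to looks_like_ascii16le only when no BOM is present. Note that the BOM itself fails the second check (its high byte is 0xFE), so the two tests don't overlap. The heuristic is deliberately strict: any non-ASCII character makes it say "no", so it only covers the plain-ASCII case discussed here.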
     
