Further to the above, something else I just noticed.
My test TCC session uses Unicode ("tcc /u ...."), UTF-16 ("chcp 10000") and Courier New (seemingly the most comprehensive monospaced Unicode font supplied with XP).
So if TCC is outputting UTF-16, it should be able to display accurately any filename I can create in Explorer.
So I created the following test files (attached as a ZIP; each is only one byte):
Code:
1_@bµ.txt
2_a%20b.txt
3_a[b'c_3.txt
4_a`b]c.txt
5_a`b'c.txt
6_ab'c.txt
7_åßĉ.txt
8_ấъç.txt
Then I ran DIR on them:
Code:
[C:\Projects\xcrc\src\test.probs] dir /b
1_@bµ.txt
2_a%20b.txt
3_a[b'c_3.txt
4_a`b]c.txt
5_a`b'c.txt
6_ab'c.txt
7_åß?.txt
8_??ç.txt
Note that the last two are wrong. Yet if I redirect the output via the clipboard to a UTF-16 editor (also using Courier New), they display correctly. So it seems to be a display problem rather than a data problem.
Looking at the Unicode code points of the non-ASCII characters in the two failing filenames, it seems that characters whose most significant byte is zero (MSB == 0, i.e. U+0000 to U+00FF) display correctly, while those with a non-zero MSB don't:
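For anyone who can't use the ZIP, the sketch below recreates the test files and then lists the directory with every non-ASCII character turned into an escape, so the output is pure ASCII and cannot be altered by the console code page or font. If the escapes show the expected code points, the names on disk are intact. It is written in Python 3.9+ (standard library only); the one-byte file content `b"x"` and the use of MICRO SIGN U+00B5 for the µ in file 1 are assumptions, and `make_test_files` / `escaped_listing` are hypothetical helper names.

```python
from pathlib import Path

# Test filenames from the post, written with escapes so the code points are
# unambiguous. "%20" in file 2 is literal text, not an encoded space.
NAMES = [
    "1_@b\u00b5.txt",            # 1_@bµ.txt (assumes MICRO SIGN U+00B5)
    "2_a%20b.txt",
    "3_a[b'c_3.txt",
    "4_a`b]c.txt",
    "5_a`b'c.txt",
    "6_ab'c.txt",
    "7_\u00e5\u00df\u0109.txt",  # 7_åßĉ.txt
    "8_\u1ea5\u044a\u00e7.txt",  # 8_ấъç.txt
]


def make_test_files(directory: str) -> list[Path]:
    # Create each test file with a single placeholder byte; the real
    # one-byte contents from the ZIP are unknown, so b"x" is an assumption.
    target = Path(directory)
    target.mkdir(parents=True, exist_ok=True)
    paths = []
    for name in NAMES:
        path = target / name
        path.write_bytes(b"x")
        paths.append(path)
    return paths


def escaped_listing(directory: str) -> list[str]:
    # ascii() replaces every non-ASCII character with a backslash escape
    # (U+0109 is shown as \u0109, for example), so the result survives
    # any console code page unchanged.
    return sorted(ascii(p.name) for p in Path(directory).iterdir())
```

For example, after `make_test_files("test.probs")`, printing each entry of `escaped_listing("test.probs")` should show `'7_\xe5\xdf\u0109.txt'` for the seventh file if its name is stored correctly, whatever the console does with the real characters.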
Code:
7_åßĉ.txt -> 7_åß?.txt:
å = 0x00E5 (displayed OK)
ß = 0x00DF (displayed OK)
ĉ = 0x0109 (displayed wrong)
8_ấъç.txt -> 8_??ç.txt:
ấ = 0x1EA5 (displayed wrong)
ъ = 0x044A (displayed wrong)
ç = 0x00E7 (displayed OK)
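The pattern in this table can be checked mechanically. The sketch below (Python, standard library only) computes each character's code point and the high byte of its UTF-16 code unit. As a hedged second check, it also tests whether each character exists in Mac Roman: Microsoft's code page identifier list gives 10000 as Mac Roman, whereas UTF-16 (little-endian) is 1200, and å, ß and ç all exist in Mac Roman while ĉ, ấ and ъ do not. Treating Python's `mac_roman` codec as a stand-in for Windows code page 10000 is an assumption, and `high_byte` / `fits_mac_roman` are hypothetical helper names.

```python
# Characters from the two failing filenames, with the result reported in the post.
# Written as escapes so the code points match the table exactly.
OBSERVED = {
    "\u00e5": "ok",     # å
    "\u00df": "ok",     # ß
    "\u0109": "wrong",  # ĉ
    "\u1ea5": "wrong",  # ấ
    "\u044a": "wrong",  # ъ
    "\u00e7": "ok",     # ç
}


def high_byte(ch: str) -> int:
    # Most significant byte of the character's single UTF-16 code unit
    # (valid here because every character above is in the BMP).
    return ord(ch) >> 8


def fits_mac_roman(ch: str) -> bool:
    # Assumption: Python's "mac_roman" codec is close enough to Windows
    # code page 10000 to show which characters that code page can represent.
    try:
        ch.encode("mac_roman")
    except UnicodeEncodeError:
        return False
    return True


for ch, result in OBSERVED.items():
    print(f"U+{ord(ch):04X}  MSB=0x{high_byte(ch):02X}  "
          f"mac_roman={fits_mac_roman(ch)}  displayed {result}")
```

On these six characters both tests split the same way, so they cannot tell the two explanations apart; a filename containing a Latin-1 character that Mac Roman lacks, such as þ (U+00FE), would.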
Any thoughts? Am I being stupid and missing something obvious?