Further to the above, something else I just noticed.
My test TCC is using Unicode ("tcc /u ...."), UTF-16 ("chcp 10000") and Courier New (seemingly the most comprehensive standard monospaced Unicode font provided with XP).
So, if TCC is outputting utf-16, it should be capable of accurately displaying any filenames that I can create in Explorer.
So I made the following test files (attached as ZIP, they're only a byte each):
and tried a DIR on them:
[C:\Projects\xcrc\src\test.probs] dir /b
Observe that the last two are wrong, though I can redirect via the clipboard to a utf-16 editor (also displaying in Courier New) and they display correctly. Hence it seems a "display" issue rather than a "wrong data" issue.
Looking at the Unicode code points of the non-ASCII characters in the two failing filenames, it seems that characters whose high byte is zero (MSB==0) display correctly, while those with a non-zero high byte (MSB<>0) don't:
7_åßĉ.txt -> 7_åß?.txt
å = 0x000E5 (displayed ok)
ß = 0x000DF (displayed ok)
ĉ = 0x00109 (displayed wrong)
8_ấъç.txt -> 8_??ç.txt
ấ = 0x01ea5 (displayed wrong)
ъ = 0x0044a (displayed wrong)
ç = 0x000e7 (displayed ok)
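To make the pattern concrete, here is a small Python sketch of the hypothesis. It is only a model of what I'm seeing, not a claim about what TCC or the console does internally: the function name predict_display and the "anything above 0xFF becomes ?" rule are my own assumptions. The filenames are written with escapes so the code points are unambiguous.

```python
def predict_display(name):
    """Model the observed behaviour (an assumption, not TCC's real logic):
    a character survives only if its code point fits in one byte,
    i.e. the high byte of its UTF-16 code unit is zero."""
    return "".join(ch if ord(ch) <= 0xFF else "?" for ch in name)

# The two failing test filenames, spelled out by code point.
names = [
    "7_\u00e5\u00df\u0109.txt",  # å ß ĉ
    "8_\u1ea5\u044a\u00e7.txt",  # ấ ъ ç
]

for name in names:
    for ch in name:
        if ord(ch) > 0x7F:
            # Show each non-ASCII character with its high byte.
            print(f"  U+{ord(ch):04X}  high byte = 0x{ord(ch) >> 8:02X}")
    print(ascii(name), "->", ascii(predict_display(name)))
```

The predicted names match the DIR output for both files, which is what makes me suspect the high byte is being dropped or rejected somewhere on the display path.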
Any thoughts? Am I being stupid and missing something obvious?