Q about QueryIsFileUnicode

#1
When QueryIsFileUnicode is used on a disk file and a BOM is found, the function seems to leave the file pointer positioned after the BOM. Is that correct?

Besides looking for a BOM, what else does QueryIsFileUnicode do? I ask because it fails to identify a file containing only this line (below), with no BOM, as Unicode.

Code:
0000 0000 61 00 20 00 3e 00 20 00  62 00 20 00 3c 00 20 00  a. .>. .b. .<. .
0000 0010 63 00 20 00 26 00 20 00  64 00 20 00 7c 00 20 00  c. .&. .d. .|. .
0000 0020 65 00 20 00 60 00 20 00  66 00 20 00 25 00 66 00  e. .`. .f. .%.f.
0000 0030 6f 00 6f 00 20 00 25 00  70 00 61 00 74 00 68 00  o.o. .%.p.a.t.h.
0000 0040 0d 00 0a 00                                       ....
 

rconn

Administrator
Staff member
May 14, 2008
10,752
97
#2
> When QueryIsFileUnicode is used on a disk file and a BOM is found, the
> function seems to leave the file pointer positioned after the BOM. Is
> that correct?
Yes.


> Besides looking for a BOM, what else does QueryIsFileUnicode do? I ask
> because it fails to identify a file containing only this line (below),
> with no BOM, as Unicode.
QueryIsFileUnicode does not look for a BOM; it just skips it if the text is
declared (by Windows) to be Unicode. It calls the Windows API
IsTextUnicode; if you have a problem with that API you should ask Microsoft
for details.

Rex Conn
JP Software
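
The behavior described above (skip the BOM only when the text has already been judged to be Unicode) might look roughly like the following. This is a minimal sketch, not the actual QueryIsFileUnicode code: the name position_after_bom and the declared_unicode parameter are hypothetical, and only the UTF-16LE BOM (FF FE) is handled.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not the real QueryIsFileUnicode.
   If the caller has already decided the text is Unicode and the stream
   starts with the UTF-16LE BOM (FF FE), leave the file position just
   after the BOM. Otherwise leave it at the start of the file.
   Returns the resulting offset, or -1 on a seek error. */
long position_after_bom(FILE *fp, int declared_unicode)
{
    unsigned char head[2];
    size_t got;

    assert(fp != NULL);
    if (fseek(fp, 0L, SEEK_SET) != 0)
        return -1L;
    if (declared_unicode) {
        got = fread(head, 1, sizeof head, fp);
        if (got == sizeof head && head[0] == 0xFF && head[1] == 0xFE)
            return ftell(fp);          /* position is now past the BOM */
        if (fseek(fp, 0L, SEEK_SET) != 0)
            return -1L;                /* no BOM: go back to the start */
    }
    return 0L;
}
```

With a file that begins FF FE, a call with declared_unicode set to 1 leaves the pointer at offset 2, which matches what was observed in the first post.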
 
#3
> Yes.
> QueryIsFileUnicode does not look for a BOM; it just skips it if the text is
> declared (by Windows) to be Unicode. It calls the Windows API
> IsTextUnicode; if you have a problem with that API you should ask Microsoft
> for details.
> Rex Conn
> JP Software
What tests do you ask IsTextUnicode to do?
 

rconn

#4
> > QueryIsFileUnicode does not look for a BOM; it just skips it if the
> > text is declared (by Windows) to be Unicode. It calls the Windows API
> > IsTextUnicode; if you have a problem with that API you should ask
> > Microsoft for details.
> What tests do you ask IsTextUnicode to do?
IS_TEXT_UNICODE_ASCII16 | IS_TEXT_UNICODE_SIGNATURE |
IS_TEXT_UNICODE_ILLEGAL_CHARS

After trying dozens of combinations, that's the one I've found to get the
best overall results.

Rex Conn
JP Software
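
For readers following along, here is one way the flag combination above could be passed to IsTextUnicode. It is a sketch only: requested_tests and check_buffer are hypothetical names, and nothing here is taken from the actual QueryIsFileUnicode source. The flag values are repeated from the Windows SDK headers solely so the code compiles on other platforms, where the call itself is unavailable.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>   /* IsTextUnicode; link with Advapi32.lib */
#else
/* Values as defined in the Windows SDK headers, repeated here only so
   the sketch compiles where <windows.h> does not exist. */
#define IS_TEXT_UNICODE_ASCII16        0x0001
#define IS_TEXT_UNICODE_SIGNATURE      0x0008
#define IS_TEXT_UNICODE_ILLEGAL_CHARS  0x0100
#endif

/* The combination of tests quoted in the post above. */
int requested_tests(void)
{
    return IS_TEXT_UNICODE_ASCII16 | IS_TEXT_UNICODE_SIGNATURE |
           IS_TEXT_UNICODE_ILLEGAL_CHARS;
}

/* Hypothetical wrapper. Returns 1 or 0 according to IsTextUnicode, or -1
   where the API is not available. On Windows, *results receives the
   per-test result bits that IsTextUnicode writes back. */
int check_buffer(const void *buf, int cb, int *results)
{
    assert(buf != NULL && results != NULL);
    *results = requested_tests();   /* in: tests to run, out: result bits */
#ifdef _WIN32
    return IsTextUnicode(buf, cb, results) ? 1 : 0;
#else
    (void)cb;
    *results = 0;
    return -1;
#endif
}
```

The third argument is in/out, so inspecting the result bits after the call shows which of the requested tests fired for a given buffer.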
 
#5
> IS_TEXT_UNICODE_ASCII16 | IS_TEXT_UNICODE_SIGNATURE |
> IS_TEXT_UNICODE_ILLEGAL_CHARS
>
> After trying dozens of combinations, that's the one I've found to get the
> best overall results.
Yes, that's reasonable. And from the description of IS_TEXT_UNICODE_ASCII16 I'd expect it to catch
Code:
L"a > b < c & d | e ` f %foo %path"
But it doesn't. The only tests which identify that as Unicode are IS_TEXT_UNICODE_STATISTICS, IS_TEXT_UNICODE_CONTROLS, and IS_TEXT_UNICODE_NULL_BYTES. [Hmmm! I just found some articles suggesting it is useless/inconsistent.]
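
To make the expectation concrete: read literally, "zero-extended ASCII" means every 16-bit unit has a 7-bit low byte and a zero high byte. The portable sketch below checks exactly that and nothing more. It is a naive check written for illustration, not a reimplementation of whatever IsTextUnicode does internally, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: widen 7-bit ASCII text to UTF-16LE bytes.
   Returns the number of bytes written, or 0 if the text does not fit
   in the buffer or contains a non-ASCII byte. */
size_t ascii_to_utf16le(const char *text, unsigned char *out, size_t cap)
{
    size_t n = 0;

    assert(text != NULL && out != NULL);
    for (; *text != '\0'; ++text) {
        unsigned char c = (unsigned char)*text;
        if (c > 0x7F || cap - n < 2)
            return 0;
        out[n++] = c;       /* low byte: the ASCII value */
        out[n++] = 0x00;    /* high byte: zero extension */
    }
    return n;
}

/* Naive check: 1 if the buffer is a non-empty, even-length run of
   little-endian 16-bit units that are all zero-extended ASCII. */
int is_zero_extended_ascii(const unsigned char *buf, size_t len)
{
    size_t i;

    assert(buf != NULL);
    if (len == 0 || len % 2 != 0)
        return 0;
    for (i = 0; i < len; i += 2) {
        if (buf[i] > 0x7F || buf[i + 1] != 0x00)
            return 0;
    }
    return 1;
}
```

Feeding it the line from the first post plus CR LF gives the same 68 bytes as the hex dump, and this literal check accepts them, which is why the IS_TEXT_UNICODE_ASCII16 result is surprising.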