Q about QueryIsFileUnicode

vefatica · Sep 26, 2009

When QueryIsFileUnicode is used on a disk file and a BOM is found, the function seems to leave the file pointer positioned after the BOM. Is that correct?

Besides looking for a BOM, what else does QueryIsFileUnicode do? I ask because it fails to identify a file containing only this line (below), with no BOM, as Unicode.

Code:

0000 0000 61 00 20 00 3e 00 20 00  62 00 20 00 3c 00 20 00  a. .>. .b. .<. .
0000 0010 63 00 20 00 26 00 20 00  64 00 20 00 7c 00 20 00  c. .&. .d. .|. .
0000 0020 65 00 20 00 60 00 20 00  66 00 20 00 25 00 66 00  e. .`. .f. .%.f.
0000 0030 6f 00 6f 00 20 00 25 00  70 00 61 00 74 00 68 00  o.o. .%.p.a.t.h.
0000 0040 0d 00 0a 00                                       ....
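
For reference, the dump above is plain ASCII text widened to UTF-16LE, with no leading FF FE. A minimal C sketch of two checks that make this concrete follows. The helper names `has_utf16le_bom` and `looks_like_ascii_utf16le` are hypothetical, and the second check is only a naive stand-in written for illustration. It is not what IsTextUnicode or QueryIsFileUnicode does internally.

```c
#include <assert.h>
#include <stddef.h>

/* The 68 bytes shown in the dump above. */
static const unsigned char sample[] = {
    0x61,0x00,0x20,0x00,0x3e,0x00,0x20,0x00, 0x62,0x00,0x20,0x00,0x3c,0x00,0x20,0x00,
    0x63,0x00,0x20,0x00,0x26,0x00,0x20,0x00, 0x64,0x00,0x20,0x00,0x7c,0x00,0x20,0x00,
    0x65,0x00,0x20,0x00,0x60,0x00,0x20,0x00, 0x66,0x00,0x20,0x00,0x25,0x00,0x66,0x00,
    0x6f,0x00,0x6f,0x00,0x20,0x00,0x25,0x00, 0x70,0x00,0x61,0x00,0x74,0x00,0x68,0x00,
    0x0d,0x00,0x0a,0x00
};

/* Returns 1 if the buffer starts with the UTF-16LE byte order mark FF FE. */
int has_utf16le_bom(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    return len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE;
}

/* Hypothetical, naive check: returns 1 if every 16-bit little-endian unit
   lies in 0x0001..0x007F, i.e. the buffer is ASCII widened to UTF-16LE. */
int looks_like_ascii_utf16le(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    if (len == 0 || len % 2 != 0)
        return 0;
    for (size_t i = 0; i < len; i += 2) {
        if (buf[i + 1] != 0x00 || buf[i] == 0x00 || buf[i] > 0x7F)
            return 0;
    }
    return 1;
}
```

Called on `sample`, the first helper reports no BOM and the second accepts the buffer, which matches the description of the file in the question.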

rconn · Sep 26, 2009

> When QueryIsFileUnicode is used on a disk file and a BOM is found, the
> function seems to leave the file pointer positioned after the BOM. Is
> that correct?

Yes.

> Besides looking for a BOM, what else does QueryIsFileUnicode do? I ask
> because it fails to identify a file containing only this line (below),
> with no BOM, as Unicode.

QueryIsFileUnicode does not look for a BOM; it just skips it if the text is
declared (by Windows) to be Unicode. It calls the Windows API
IsTextUnicode; if you have a problem with that API you should ask Microsoft
for details.
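
The file-pointer behaviour described above (positioned just past the BOM when one is present) can be sketched in portable C as follows. This is only an illustration under stated assumptions: `skip_utf16le_bom` is a hypothetical helper, it handles only the UTF-16LE mark FF FE, and it is not JP Software's implementation, which according to the post first asks IsTextUnicode whether the text is Unicode.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: if the stream starts with FF FE, leave the file
   position just past it (offset 2) and return 1. Otherwise rewind to the
   start of the stream and return 0. */
int skip_utf16le_bom(FILE *fp)
{
    unsigned char bom[2];

    assert(fp != NULL);
    if (fseek(fp, 0L, SEEK_SET) != 0)
        return 0;
    if (fread(bom, 1, 2, fp) == 2 && bom[0] == 0xFF && bom[1] == 0xFE)
        return 1;                 /* position is now offset 2 */
    fseek(fp, 0L, SEEK_SET);      /* no BOM: start reading at offset 0 */
    return 0;
}
```

After a call that returns 1, `ftell(fp)` reports 2, so a caller that keeps reading from the same stream sees the text without the BOM.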

Rex Conn
JP Software

vefatica · Sep 26, 2009

rconn said:
Yes.
QueryIsFileUnicode does not look for a BOM; it just skips it if the text is
declared (by Windows) to be Unicode. It calls the Windows API
IsTextUnicode; if you have a problem with that API you should ask Microsoft
for details.
Rex Conn
JP Software

What tests do you ask IsTextUnicode to do?

rconn · Sep 27, 2009

> > QueryIsFileUnicode does not look for a BOM; it just skips it if the
> > text is declared (by Windows) to be Unicode. It calls the Windows API
> > IsTextUnicode; if you have a problem with that API you should ask
> > Microsoft for details.
>
> What tests do you ask IsTextUnicode to do?

IS_TEXT_UNICODE_ASCII16 | IS_TEXT_UNICODE_SIGNATURE |
IS_TEXT_UNICODE_ILLEGAL_CHARS

After trying dozens of combinations, that's the one I've found to get the
best overall results.
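
A hedged sketch of how a mask like the one above is passed to IsTextUnicode: the third argument is an in/out value that names the tests to run on input and holds the tests that passed on output. The function names `thread_test_mask` and `run_tests` are hypothetical, and nothing here is taken from the actual QueryIsFileUnicode source. The numeric values in the non-Windows branch are repeated from winnt.h only so the sketch compiles elsewhere.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>   /* IsTextUnicode; link with Advapi32.lib */
#else
/* Values as defined in winnt.h, repeated so the sketch compiles off Windows. */
#define IS_TEXT_UNICODE_ASCII16       0x0001
#define IS_TEXT_UNICODE_SIGNATURE     0x0008
#define IS_TEXT_UNICODE_ILLEGAL_CHARS 0x0100
#endif

/* The combination of tests quoted in the post above. */
int thread_test_mask(void)
{
    return IS_TEXT_UNICODE_ASCII16 | IS_TEXT_UNICODE_SIGNATURE |
           IS_TEXT_UNICODE_ILLEGAL_CHARS;
}

#ifdef _WIN32
/* Runs the tests named in `mask` on the buffer and returns the bits for the
   tests that passed. IsTextUnicode's BOOL return value is ignored here; only
   the per-test result bits are reported. */
int run_tests(const void *buf, int cb, int mask)
{
    INT result = mask;            /* in: tests to run, out: tests passed */

    assert(buf != NULL);
    IsTextUnicode(buf, cb, &result);
    return result;
}
#endif
```

On Windows, calling `run_tests` with one flag at a time is a simple way to see which individual tests a given buffer passes.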

Rex Conn
JP Software

vefatica · Sep 27, 2009

rconn said:
IS_TEXT_UNICODE_ASCII16 | IS_TEXT_UNICODE_SIGNATURE |
IS_TEXT_UNICODE_ILLEGAL_CHARS

After trying dozens of combinations, that's the one I've found to get the
best overall results.

Yes, that's reasonable. And from the description of IS_TEXT_UNICODE_ASCII16 I'd expect it to catch

Code:

L"a > b < c & d | e ` f %foo %path"

But it doesn't. The only tests which ID that as Unicode are IS_TEXT_UNICODE_STATISTICS, IS_TEXT_UNICODE_CONTROLS, and IS_TEXT_UNICODE_NULL_BYTES. [Hmmm! I just found some articles suggesting IsTextUnicode is useless/inconsistent.]
