
Q about QueryIsFileUnicode

Discussion in 'Plugins' started by vefatica, Sep 26, 2009.

  1. vefatica

    Joined:
    May 20, 2008
    Messages:
    7,794
    Likes Received:
    29
    When QueryIsFileUnicode is used on a disk file and a BOM is found, the function seems to leave the file pointer positioned after the BOM. Is that correct?

    Besides looking for a BOM, what else does QueryIsFileUnicode do? I ask because it fails to identify a file containing only this line (below), with no BOM, as Unicode.

    Code:
    0000 0000 61 00 20 00 3e 00 20 00  62 00 20 00 3c 00 20 00  a. .>. .b. .<. .
    0000 0010 63 00 20 00 26 00 20 00  64 00 20 00 7c 00 20 00  c. .&. .d. .|. .
    0000 0020 65 00 20 00 60 00 20 00  66 00 20 00 25 00 66 00  e. .`. .f. .%.f.
    0000 0030 6f 00 6f 00 20 00 25 00  70 00 61 00 74 00 68 00  o.o. .%.p.a.t.h.
    0000 0040 0d 00 0a 00                                       ....
     
  2. rconn

    rconn Administrator
    Staff Member

    Joined:
    May 14, 2008
    Messages:
    9,732
    Likes Received:
    81
    Yes.


    QueryIsFileUnicode does not look for a BOM; it just skips it if the text is
    declared (by Windows) to be Unicode. It calls the Windows API
    IsTextUnicode; if you have a problem with that API you should ask Microsoft
    for details.

    Rex Conn
    JP Software
     
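The BOM-skipping behaviour described in this reply can be sketched as follows. This is only an illustration, not JP Software's code: `utf16le_bom_length` is a hypothetical helper, and it assumes the BOM in question is the UTF-16LE one (bytes FF FE), which the thread does not state explicitly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the plugin SDK): returns the number of
   bytes occupied by a UTF-16LE BOM (FF FE) at the start of buf, or 0 when
   no such BOM is present. */
size_t utf16le_bom_length(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return 2;   /* text starts right after the BOM */
    return 0;       /* no BOM: text starts at offset 0 */
}
```

A caller that wants the file pointer left just past the BOM could seek to the returned offset after reading the first bytes of the file.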
  3. vefatica

    What tests do you ask IsTextUnicode to do?
     
  4. rconn

    IS_TEXT_UNICODE_ASCII16 | IS_TEXT_UNICODE_SIGNATURE |
    IS_TEXT_UNICODE_ILLEGAL_CHARS

    After trying dozens of combinations, that's the one I've found to get the
    best overall results.

    Rex Conn
    JP Software
     
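For readers who want to try the same combination, here is a minimal sketch of passing those three flags to IsTextUnicode. The helper names `unicode_test_mask` and `buffer_is_unicode` are hypothetical and are not claimed to match what QueryIsFileUnicode does internally. The numeric flag values are repeated from winnt.h only so the sketch compiles on non-Windows systems. The actual API call is built on Windows only.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _WIN32
#include <windows.h>   /* IsTextUnicode; link with Advapi32.lib */
#else
/* Flag values as published in winnt.h, repeated here only so the
   sketch compiles on non-Windows systems. */
#define IS_TEXT_UNICODE_ASCII16       0x0001
#define IS_TEXT_UNICODE_SIGNATURE     0x0008
#define IS_TEXT_UNICODE_ILLEGAL_CHARS 0x0100
#endif

/* Hypothetical helper: the test mask quoted in the post above. */
int unicode_test_mask(void)
{
    return IS_TEXT_UNICODE_ASCII16 | IS_TEXT_UNICODE_SIGNATURE |
           IS_TEXT_UNICODE_ILLEGAL_CHARS;
}

#ifdef _WIN32
/* Hypothetical wrapper: returns nonzero when IsTextUnicode accepts the
   buffer. On input *tests_passed selects the tests to run; on return it
   holds the subset of those tests that the buffer passed. */
int buffer_is_unicode(const void *buf, int size, int *tests_passed)
{
    assert(buf != NULL && tests_passed != NULL);
    *tests_passed = unicode_test_mask();
    return IsTextUnicode(buf, size, tests_passed) != 0;
}
#endif
```

Inspecting `*tests_passed` after the call shows which of the requested tests succeeded, which helps when comparing flag combinations.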
  5. vefatica

    Yes, that's reasonable. And from the description of IS_TEXT_UNICODE_ASCII16 I'd expect it to catch
    Code:
    L"a > b < c & d | e ` f %foo %path"
    
    But it doesn't. The only tests which ID that as Unicode are IS_TEXT_UNICODE_STATISTICS, IS_TEXT_UNICODE_CONTROLS, and IS_TEXT_UNICODE_NULL_BYTES. [Hmmm! I just found some articles suggesting IsTextUnicode is useless/inconsistent.]
     
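When IsTextUnicode misses BOM-less text like the sample in this thread, one possible fallback is a simple hand-rolled check. The sketch below is an assumption-laden illustration, not the algorithm IsTextUnicode uses. `looks_like_ascii_utf16le` is a hypothetical helper that only recognises UTF-16LE text made entirely of 7-bit ASCII characters, so it would reject any Unicode file containing non-ASCII characters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fallback, NOT the algorithm IsTextUnicode uses: returns 1
   when buf has a nonzero even length and every 16-bit little-endian unit
   is a nonzero 7-bit ASCII value (low byte 01..7F, high byte 00). */
int looks_like_ascii_utf16le(const unsigned char *buf, size_t len)
{
    size_t i;
    assert(buf != NULL || len == 0);
    if (len == 0 || len % 2 != 0)
        return 0;
    for (i = 0; i < len; i += 2) {
        if (buf[i] == 0 || buf[i] > 0x7F || buf[i + 1] != 0)
            return 0;
    }
    return 1;
}
```

Applied to the bytes in the hex dump in the first post (61 00 20 00 3e 00 ...), every unit has a zero high byte and an ASCII low byte, so this check accepts them. The same characters stored as single-byte ASCII fail it, because the second byte of each pair is nonzero.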
