
LIST and TYPE show UTF8 BOM

Discussion in 'Support' started by vefatica, Aug 29, 2012.

  1. vefatica

    Joined:
    May 20, 2008
    Messages:
    7,792
    Likes Received:
    29
    [Inspired by another thread] It's no big deal to me but ... LIST shows the 3-byte UTF8 BOM (whereas it doesn't show a Unicode BOM). Ditto for TYPE. VIEW does not show the BOM in either case.
     
  2. Howard Goldstein

    Joined:
    Jun 1, 2008
    Messages:
    111
    Likes Received:
    1
    In LIST you can use the /8 switch to display the file correctly. I don't see a way to do that with TYPE though.

    --
    Howard
     
  3. rconn

    rconn Administrator
    Staff Member

    Joined:
    May 14, 2008
    Messages:
    9,732
    Likes Received:
    81
    Windows doesn't support UTF8 in any meaningful way (and UTF8 BOMs are deprecated at best and actively discouraged in practice). Try it with CMD -- it cannot handle a UTF8 file at all.

    LIST and VIEW have the ability to display UTF8 files, but since you cannot actually *do* anything with them in Windows there isn't any point in extending it further.
     
  4. myarmor

    Joined:
    May 30, 2008
    Messages:
    65
    Likes Received:
    1
    I'd say the UTF8 BOM isn't deprecated. It might be discouraged in certain file types, but only where there is another way of conveying the same information (such as the encoding declaration in XML or the charset in HTML).
    The reason is that without the BOM (or something similar, such as an HTML charset or an XML encoding declaration) there is no way to determine with 100% accuracy whether the file
    uses UTF8 or an ANSI codepage (the latter is often still the case).

    Do anything with them? As in, with what the OS itself provides?
    Maybe not, but for most people the OS just sits there; the programs people actually use go well beyond what ships with it, and those often do support UTF8 (apparently more often
    than not nowadays).
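
To illustrate the ambiguity described in this post, here is a minimal Python sketch. The helper name `looks_like_utf8` is made up for this example, and cp1252 is only assumed to stand in for "an ANSI codepage". A strict decode can prove that a file is not UTF-8, but a successful decode only suggests that it is, because the same bytes are usually also valid in an ANSI codepage.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Heuristic: True if the bytes decode as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# "é" encoded as UTF-8 is the two bytes C3 A9.
utf8_bytes = "é".encode("utf-8")

# The same two bytes are also valid cp1252, where they read as two characters.
as_ansi = utf8_bytes.decode("cp1252")

# "é" encoded as cp1252 is the single byte E9, which is not valid UTF-8 by itself.
ansi_bytes = "é".encode("cp1252")

print(looks_like_utf8(utf8_bytes))  # passes the check, yet could still be ANSI text
print(repr(as_ansi))                # what an ANSI viewer would show for those bytes
print(looks_like_utf8(ansi_bytes))  # fails, so it is definitely not UTF-8
```

A BOM or an explicit declaration removes the guesswork; without one, a viewer can only apply a heuristic like this.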
     
  5. Charles Dye

    Charles Dye Super Moderator
    Staff Member

    Joined:
    May 20, 2008
    Messages:
    3,280
    Likes Received:
    38
    I've been reading the Unicode standard, and I don't see that UTF-8 BOMs are deprecated, required, forbidden, discouraged, or encouraged. You can use a BOM at the start of a UTF-8 file or not; they're valid but not mandatory. Chapter 16.8 of the Standard says:

    It seems to me that LIST /8 already does the right thing -- ignores the initial BOM. It would be nice if LIST checked for a UTF-8 BOM at the start of a file and switched to UTF-8 mode automatically. I know that you're tired of LIST and would rather leave further development to VIEW, but detecting a three-byte signature shouldn't be terribly difficult....
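
As a rough idea of how little is involved in detecting the three-byte signature, here is a Python sketch. It is not how LIST is implemented; the function names and the cp1252 fallback are assumptions made for the example. It checks for the UTF-8 BOM (EF BB BF), skips it, and decodes the rest as UTF-8; otherwise it falls back to an ANSI codepage.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True if the buffer starts with the 3-byte UTF-8 signature EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)


def decode_for_display(data: bytes, fallback: str = "cp1252") -> str:
    """Decode a file's bytes for display (hypothetical helper).

    If a UTF-8 BOM is present, skip it and decode the rest as UTF-8.
    Otherwise assume the fallback ANSI codepage (cp1252 is an assumption).
    """
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):].decode("utf-8", errors="replace")
    return data.decode(fallback, errors="replace")


with_bom = codecs.BOM_UTF8 + "café".encode("utf-8")
without_bom = "café".encode("cp1252")

print(decode_for_display(with_bom))     # BOM is dropped, text decoded as UTF-8
print(decode_for_display(without_bom))  # no BOM, decoded with the fallback codepage
```

Python's built-in `utf-8-sig` codec does the same BOM skipping when decoding, so `data.decode("utf-8-sig")` would be an alternative once the file is known to be UTF-8.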
     
