LIST and TYPE show UTF8 BOM

#1
[Inspired by another thread] It's no big deal to me, but LIST shows the 3-byte UTF8 BOM (whereas it doesn't show a Unicode, i.e. UTF-16, BOM). Ditto for TYPE. VIEW doesn't show the BOM in either case.
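For context, here is roughly why the UTF-8 BOM becomes visible: a viewer that treats the file as a single-byte ANSI codepage turns each of the three BOM bytes into its own character. This is a minimal Python sketch, not how LIST is actually implemented. It assumes Windows-1252 as the ANSI codepage (the real codepage depends on the system locale), and `render_as_ansi` is a hypothetical helper.

```python
def render_as_ansi(data: bytes, codepage: str = "cp1252") -> str:
    """Decode raw file bytes as a single-byte ANSI codepage.

    Hypothetical helper mimicking a viewer that knows nothing about UTF-8:
    every byte becomes one character, so the 3-byte UTF-8 BOM shows up as
    three visible glyphs. cp1252 is an assumption (typical on Western systems).
    """
    return data.decode(codepage, errors="replace")


# U+FEFF (the byte order mark) encoded as UTF-8 is EF BB BF.
UTF8_BOM = "\ufeff".encode("utf-8")

sample = UTF8_BOM + b"hello"
shown = render_as_ansi(sample)
print(shown == "\u00ef\u00bb\u00bfhello")  # True: displayed as "ï»¿hello"
```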
 

rconn

Administrator
Staff member
May 14, 2008
#3
Windows doesn't support UTF8 in any meaningful way (and UTF8 BOMs are deprecated at best and actively discouraged in practice). Try it with CMD -- it cannot handle a UTF8 file at all.

LIST and VIEW can display UTF8 files, but since you can't actually *do* anything with them in Windows, there isn't any point in extending that support further.
 
May 30, 2008
#4
I'd say the UTF8 BOM isn't deprecated. It may be discouraged for certain file types, but only because those formats have other ways of saying the same thing (such as the XML encoding declaration and the HTML charset).
The reason is that without the BOM (or an equivalent such as the HTML charset or the XML encoding declaration) there is no way to determine with 100% accuracy whether a file uses UTF-8 or an ANSI codepage (and the latter is still often the case).
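To illustrate why an unmarked file can't be classified with certainty: the same bytes can be valid in both encodings. This is a minimal Python sketch assuming Windows-1252 as the ANSI codepage; `guess_encoding` is a hypothetical helper, and a strict UTF-8 decode is only a heuristic (a failure rules UTF-8 out, but a success doesn't prove it).

```python
def guess_encoding(data: bytes) -> str:
    """Heuristic guess for a file with no BOM or other encoding declaration.

    Hypothetical helper for illustration. A strict UTF-8 decode that fails
    proves the file is not UTF-8; one that succeeds only makes UTF-8 likely.
    """
    try:
        data.decode("utf-8")  # strict by default: invalid sequences raise
        return "utf-8 (probable)"
    except UnicodeDecodeError:
        return "ansi"  # assumed to be the system codepage, e.g. cp1252


# "é" encoded as UTF-8 is C3 A9 ...
ambiguous = "caf\u00e9".encode("utf-8")
# ... but the same bytes are also valid cp1252 text ("cafÃ©").
print(ambiguous.decode("cp1252") == "caf\u00c3\u00a9")  # True
print(guess_encoding(ambiguous))  # utf-8 (probable)

# "é" encoded as cp1252 is the lone byte E9, an incomplete UTF-8 sequence.
print(guess_encoding("caf\u00e9".encode("cp1252")))  # ansi
```

Pure ASCII text also comes back as "utf-8 (probable)", which is harmless since ASCII is valid in both encodings.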

"Do" anything with them? As in, with tools provided by the OS?
Maybe not, but for most people the OS just sits there; the programs people actually use go well beyond those, and they often do support UTF-8 (apparently more often than not nowadays).
 

Charles Dye

Super Moderator
Staff member
May 20, 2008
Albuquerque, NM
prospero.unm.edu
#5
Quoting rconn: "Windows doesn't support UTF8 in any meaningful way (and UTF8 BOMs are deprecated at best and actively discouraged in practice)."
I've been reading the Unicode standard, and I don't see that UTF-8 BOMs are deprecated, required, forbidden, discouraged, or encouraged. You can use a BOM at the start of a UTF-8 file or not; they're valid but not mandatory. Chapter 16.8 of the Standard says:

In UTF-8, the BOM corresponds to the byte sequence <EF BB BF>. Although there are never any questions of byte order with UTF-8 text, this sequence can serve as signature for UTF-8 encoded text where the character set is unmarked.
It seems to me that LIST /8 already does the right thing -- ignores the initial BOM. It would be nice if LIST checked for a UTF-8 BOM at the start of a file and switched to UTF-8 mode automatically. I know that you're tired of LIST and would rather leave further development to VIEW, but detecting a three-byte signature shouldn't be terribly difficult....
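The signature check suggested above amounts to looking at the first few bytes before choosing a display mode. This is a minimal Python sketch, not LIST's actual code: `detect_mode` and `read_for_display` are hypothetical helpers, files without a recognised BOM are assumed to be in the ANSI codepage (cp1252 here), and UTF-32 BOMs are deliberately left out.

```python
import codecs

# Signatures checked in order (codecs.BOM_* are Python standard-library constants).
# UTF-32 BOMs are not handled in this sketch.
SIGNATURES = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]


def detect_mode(head: bytes) -> tuple[str, int]:
    """Return (encoding, number of BOM bytes to skip) for a file's first bytes.

    Hypothetical helper: files with no recognised signature are assumed to be
    in the ANSI codepage (cp1252 here), with nothing to skip.
    """
    for bom, encoding in SIGNATURES:
        if head.startswith(bom):
            return encoding, len(bom)
    return "cp1252", 0


def read_for_display(data: bytes) -> str:
    """Decode file contents for display, dropping any recognised BOM."""
    encoding, skip = detect_mode(data[:4])
    return data[skip:].decode(encoding, errors="replace")


print(detect_mode(b"\xef\xbb\xbfhello"))  # ('utf-8', 3)
print(read_for_display(b"\xef\xbb\xbfcaf\xc3\xa9") == "caf\u00e9")  # True
print(read_for_display("hi".encode("utf-16")) == "hi")  # True
```

Skipping the BOM bytes after detection gives the same result as selecting UTF-8 mode by hand (as with LIST /8), just without the user having to ask for it.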