LIST and TYPE show UTF8 BOM

vefatica · Aug 29, 2012

[Inspired by another thread] It's no big deal to me but ... LIST shows the 3-byte UTF8 BOM (whereas it doesn't show a Unicode BOM). Ditto for TYPE. VIEW does not show the BOM in either case.
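Why the BOM appears at all: a viewer that treats the file as single-byte code-page text renders each of the three BOM bytes as its own character. The Python sketch below assumes the text is shown through either OEM code page 437 or ANSI code page 1252. Which page TYPE or LIST actually uses depends on the console and the program, so treat the two results as illustrations only.

```python
# U+FEFF, the byte order mark, encodes to three bytes in UTF-8.
BOM_UTF8 = "\ufeff".encode("utf-8")  # b'\xef\xbb\xbf'


def show_as_codepage(data: bytes, codepage: str) -> str:
    """Decode raw file bytes one byte per character, as a code-page viewer would."""
    return data.decode(codepage)


sample = BOM_UTF8 + b"Hello"
as_oem = show_as_codepage(sample, "cp437")    # '∩╗┐Hello' (OEM console code page)
as_ansi = show_as_codepage(sample, "cp1252")  # 'ï»¿Hello' (Windows ANSI code page)
```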

Howard Goldstein · Aug 29, 2012

vefatica said:
[Inspired by another thread] It's no big deal to me but ... LIST shows the 3-byte UTF8 BOM (whereas it doesn't show a Unicode BOM). Ditto for TYPE. VIEW does not show the BOM in either case.

In LIST you can use the /8 switch to display the file correctly. I don't see a way to do that with TYPE though.

--
Howard
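Decoding as UTF-8 is only half of what displaying such a file correctly requires. The BOM still decodes to U+FEFF as the first character unless the decoder explicitly drops it. The Python sketch below illustrates the difference using the standard `utf-8-sig` codec. It is not how LIST implements /8, and `read_utf8_text` is a hypothetical helper that a TYPE-like filter could use.

```python
def read_utf8_text(data: bytes) -> str:
    """Decode UTF-8 bytes, dropping a leading BOM if there is one."""
    # "utf-8-sig" skips an initial EF BB BF; plain "utf-8" keeps it as U+FEFF.
    return data.decode("utf-8-sig")


raw = b"\xef\xbb\xbfcaf\xc3\xa9"   # BOM followed by "café" in UTF-8
with_bom = raw.decode("utf-8")     # '\ufeffcafé' (BOM survives as a character)
without_bom = read_utf8_text(raw)  # 'café'
```

`utf-8-sig` also decodes files that have no BOM, so the same helper works either way.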

rconn · Aug 29, 2012

vefatica said:
[Inspired by another thread] It's no big deal to me but ... LIST shows the 3-byte UTF8 BOM (whereas it doesn't show a Unicode BOM). Ditto for TYPE. VIEW does not show the BOM in either case.

Windows doesn't support UTF8 in any meaningful way (and UTF8 BOMs are deprecated at best and actively discouraged in practice). Try it with CMD -- it cannot handle a UTF8 file at all.

LIST and VIEW have the ability to display UTF8 files, but since you cannot actually *do* anything with them in Windows there isn't any point in extending it further.

myarmor · Aug 30, 2012

I'd say the UTF-8 BOM isn't deprecated, though it may be discouraged for certain file types that have their own way of declaring the encoding (such as XML and HTML).
The reason is that without the BOM (or an equivalent declaration such as the HTML charset or the XML encoding attribute), there is no way to determine with 100% accuracy whether a file uses UTF-8 or an ANSI code page (and the latter is often still the case).
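The ambiguity is easy to demonstrate. The same bytes can be valid UTF-8 and valid code page 1252 at once, so for a file without a BOM any check is a guess. The sketch below uses a simple "does it decode as UTF-8?" heuristic in Python. The heuristic is an illustrative choice, not the detection logic of any particular tool.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Heuristic: True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


utf8_bytes = "café".encode("utf-8")   # b'caf\xc3\xa9'
ansi_bytes = "café".encode("cp1252")  # b'caf\xe9' (lone 0xE9 is invalid UTF-8)

utf8_ok = looks_like_utf8(utf8_bytes)        # True
ansi_ok = looks_like_utf8(ansi_bytes)        # False
also_ansi = utf8_bytes.decode("cp1252")      # 'cafÃ©', an equally "valid" reading
```

Pure ASCII files pass both ways, and real cp1252 text can contain byte pairs that happen to form valid UTF-8, which is why the check can only suggest an encoding, never prove it.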

Do with them? As in, with tools provided by the OS?
Maybe not, but for most people the OS just sits there; the programs people actually use go well beyond those, and they often do support UTF-8 (apparently more often than not nowadays).

Charles Dye · Aug 30, 2012

rconn said:
Windows doesn't support UTF8 in any meaningful way (and UTF8 BOMs are deprecated at best and actively discouraged in practice).

I've been reading the Unicode standard, and I don't see that UTF-8 BOMs are deprecated, required, forbidden, discouraged, or encouraged. You can use a BOM at the start of a UTF-8 file or not; they're valid but not mandatory. Section 16.8 of the Standard says:

In UTF-8, the BOM corresponds to the byte sequence <EF BB BF>. Although there are never any questions of byte order with UTF-8 text, this sequence can serve as signature for UTF-8 encoded text where the character set is unmarked.

It seems to me that LIST /8 already does the right thing -- ignores the initial BOM. It would be nice if LIST checked for a UTF-8 BOM at the start of a file and switched to UTF-8 mode automatically. I know that you're tired of LIST and would rather leave further development to VIEW, but detecting a three-byte signature shouldn't be terribly difficult....
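A sketch of the signature check described here, extended to the two UTF-16 BOMs as well. The function names and the fallback to code page 1252 when no BOM is found are assumptions for illustration; the thread does not describe LIST's actual logic. UTF-32 BOMs are deliberately left out (the UTF-32 LE BOM begins with the same two bytes as the UTF-16 LE one and would need to be checked first).

```python
from __future__ import annotations

import codecs

# Known signatures, checked in order. U+FEFF encodes as EF BB BF in UTF-8.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]


def sniff_encoding(data: bytes, fallback: str = "cp1252") -> tuple[str, int]:
    """Return (encoding, bom_length) based on the first bytes of a file."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    # No signature found: fall back to a code page, which is only a guess.
    return fallback, 0


def decode_for_display(data: bytes, fallback: str = "cp1252") -> str:
    """Choose an encoding from the BOM, skip the BOM, and decode the rest."""
    encoding, skip = sniff_encoding(data, fallback)
    return data[skip:].decode(encoding)
```

For example, `decode_for_display(b"\xef\xbb\xbfcaf\xc3\xa9")` and `decode_for_display(b"caf\xe9")` both yield `'café'`, the first via the UTF-8 signature and the second via the code-page fallback.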
