WAD BOM printed on command line from BAT file

Feb 23, 2012
238
3
#1
This is really more of a praise-for-TCC post than a bug report, but there is one small item to be addressed, as follows.
I often run .BAT files formatted in UTF-8. In general, my UTF-8 files have a BOM at the top. However, cmd.exe and PowerShell both balk at such files. Upon encountering the BOM, they throw out the whole first line of the .BAT file as unintelligible (even in code page 65001!). I thus often find myself going back into my editor and manually removing the BOM from these files in order to run them.
I was extremely pleased to find that TCC does not balk at these files at all, and does indeed execute all commands within the batch file as intended.
However, when running the batch file, TCC still outputs a few garbage characters to the screen, corresponding to the bytes that make up the BOM at the start of the file. I believe that this is a bug; the BOM is meant only to indicate the format of the file, and should not be output to the screen as if it contained characters to be processed.
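To illustrate where the garbage characters come from, here is a minimal Python sketch. The UTF-8 BOM is the three bytes EF BB BF; when those bytes are interpreted in a single-byte code page instead of being recognized as a BOM, they turn into three visible characters. The sample batch content and the choice of code pages 1252 and 437 are assumptions for illustration only; this does not show which code page TCC actually uses.

```python
import codecs

# The UTF-8 BOM is the three bytes EF BB BF.
bom = codecs.BOM_UTF8

# Hypothetical batch file content: a BOM followed by a single command.
data = bom + b"dir\r\n"

# Interpreted in a single-byte code page, the BOM becomes visible characters.
as_cp1252 = data.decode("cp1252")
as_cp437 = data.decode("cp437")
print(repr(as_cp1252))
print(repr(as_cp437))

# A BOM-aware decoder drops the marker and leaves only the command.
as_text = data.decode("utf-8-sig")
print(repr(as_text))
```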
 
#2
The BOM in a UTF-8-encoded file shouldn't be there in the first place.

Still, it's a ubiquitous bad practice in Windows apps (originated by Microsoft), so I concur with the opinion that robust applications should expect an extraneous BOM and discard it. Perhaps the TCMD debugger could even raise a flag about that...
 

rconn

Administrator
Staff member
May 14, 2008
10,101
85
#3
I often run .BAT files formatted in UTF-8. In general, my UTF-8 files have a BOM at the top.
This is a feature request, not a bug report.

TCC only has limited UTF-8 support (albeit more than CMD or PowerShell). Because Windows has almost no internal UTF-8 support, it has to be coded individually for each command / usage. Over the past 15 years, we've had exactly one request for UTF-8 batch files (yours). It will require a LOT of extra coding (at least several hundred lines of code scattered through several dozen source modules), so you're not likely to see it very soon (unless a few hundred other users chime in and start clamoring for it too!).

I can probably at least filter the BOM marker, but actually converting the UTF-8 file to the internal UTF-16 representation won't happen immediately.
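The two steps mentioned here (filtering the BOM, and converting UTF-8 to UTF-16) can be sketched in Python as follows. This is only an illustration of the idea, not TCC's implementation; the function names `strip_utf8_bom` and `utf8_to_utf16le` are hypothetical.

```python
import codecs

def strip_utf8_bom(raw: bytes) -> bytes:
    """Return raw without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw

def utf8_to_utf16le(raw: bytes) -> bytes:
    """Decode UTF-8 bytes (BOM optional) and re-encode them as UTF-16LE without a BOM."""
    text = strip_utf8_bom(raw).decode("utf-8")
    return text.encode("utf-16-le")

# Hypothetical first line of a batch file saved as UTF-8 with a BOM.
line = codecs.BOM_UTF8 + b"dir"
print(strip_utf8_bom(line))
print(utf8_to_utf16le(line))
```

Filtering alone is enough for pure-ASCII batch files, since ASCII bytes mean the same thing in UTF-8; the decoding step only matters once the file contains non-ASCII characters.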
 
Feb 23, 2012
238
3
#5
Hi Rex,
Here's a batch file that executes the command "dir", with a UTF-8 BOM at its start.
I appreciate your swift responses and solutions, as well as your explanation about the difficulties of adding UTF-8 support throughout.
- Avi
 

Attachments

Feb 23, 2012
238
3
#6
Hi Rex,
Thank you for the explanation. I have now tried running my batch file in UTF-16 (with BOM) and found that it runs perfectly in TCC. What I'm slowly learning from our exchanges here is that if I just go with UTF-16 instead of UTF-8, all will be well in TCC. This is great!
You seem surprised that I use UTF-8 files for my multi-language work and, as you've noted, "Windows has almost no internal UTF-8 support". Indeed, from a programming perspective, the MS Visual suite, for instance, works almost exclusively in wchars, which hold a UTF-16 representation. On the flip side, when it comes to text files, my experience has been that most of the programs I use output UTF-8 files. In the MS .NET libraries, for example, the StreamWriter class writes UTF-8 by default.
Similarly, when multi-language email messages arrive, they are almost always in UTF-8. And when I export a list of files from voidtools' Everything, it arrives in UTF-8.
Because of this situation, whenever I work with multilanguage files, my format of choice is UTF-8. And so, as I've been learning TCC, I've been trying each feature with UTF-8. On the other hand, if I can accomplish my objectives more consistently in TCC by using UTF-16 instead, I'll be happy to do so - I'll simply convert any given file to UTF-16 before processing it. I do hope that one day TCC will add full UTF-8 support across the board, so that any file input will be converted on the fly from UTF-8 to the internal UTF-16 as needed; however, as long as UTF-16 files are supported, I'll refrain from pushing the UTF-8 issue further.
- Avi
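For anyone wanting to try the same workaround, converting a UTF-8 file to UTF-16 before processing could look roughly like this in Python. It is a sketch under assumptions: the function name `convert_utf8_to_utf16` is hypothetical, and the output is written as little-endian UTF-16 with a BOM, which is the usual form on Windows.

```python
import codecs
from pathlib import Path

def convert_utf8_to_utf16(src: Path, dst: Path) -> None:
    """Rewrite a UTF-8 file (BOM optional) as UTF-16LE with a BOM."""
    # "utf-8-sig" accepts input with or without a BOM and drops it if present.
    text = src.read_bytes().decode("utf-8-sig")
    # Work on bytes so that CR LF line endings are preserved exactly.
    dst.write_bytes(codecs.BOM_UTF16_LE + text.encode("utf-16-le"))
```

Usage would be along the lines of `convert_utf8_to_utf16(Path("in.bat"), Path("out.bat"))`, where both file names are placeholders.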
