TCC Unicode support?

May 30, 2008
65
1
#1
Does TCC (command line) support Unicode .btm files?

If so, does it require the BOM, or can it determine the encoding on its own?
Are there any other encodings it supports (UTF-8, etc.)?

I'm not sure whether CMD.EXE (in Windows 7) supports it; I haven't dared to try, since executing random symbols (which is what it would amount to without Unicode support) doesn't sound like a good idea.
 

rconn

Administrator
Staff member
May 14, 2008
10,493
94
#2
> Does TCC (command line) support Unicode .btm files?
Yes.


> If so, does it require the BOM, or can it determine the encoding on its own?
> Are there any other encodings it supports (UTF-8, etc.)?
The BOM is helpful (and faster), but provided the file is more than a few
bytes long, TCC can determine its type.

TCC does not support UTF-8 batch files (and I cannot think of a reason why
you would want to use them!)

Rex Conn
JP Software
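
For readers unfamiliar with the BOM check mentioned above, here is a minimal Python sketch of what "looking for a BOM" amounts to. It is an illustration only, not TCC's actual code; the function name `sniff_bom` is made up for this example, and UTF-32 BOMs are deliberately ignored.

```python
import codecs


def sniff_bom(data: bytes):
    """Return a codec name if `data` starts with a known BOM, else None.

    Hypothetical helper for illustration; UTF-32 is not handled.
    """
    # UTF-8 BOM is three bytes: EF BB BF
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # UTF-16 little-endian BOM is two bytes: FF FE
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    # UTF-16 big-endian BOM is two bytes: FE FF
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    # No BOM: the caller has to fall back to guessing.
    return None
```

With a BOM the answer is immediate; without one, the function returns `None` and some statistical guess over the file contents is needed, which is why very short files are harder to classify.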
 
#3
On Sat, 24 Apr 2010 06:54:28 -0400, myarmor <> wrote:

|Does TCC (command line) support Unicode .btm files?
|
|If so, does it require the BOM, or can it determine the encoding on its own?
|Are there any other encodings it supports (UTF-8, etc.)?

It's OK with a BOM.

It apparently doesn't work without a BOM.

v:\> ver

TCC 11.00.48 Windows XP [Version 5.1.2600]

v:\> type /x ucode.bat
0000 0000 65 00 63 00 68 00 6f 00 20 00 66 00 6f 00 6f 00 e.c.h.o. .f.o.o.
0000 0010 0d 00 0a 00 ....

v:\> ucode.bat
TCC: V:\ucode.bat [1] Unknown command "e"

(XP's) CMD cannot deal with it, with or without a BOM:

Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.

v:\> ucode.bat (with BOM)

v:\> ÿþe
'ÿþe' is not recognized as an internal or external command,
operable program or batch file.

v:\> ucode.bat (without BOM)

v:\> e
'e' is not recognized as an internal or external command,
operable program or batch file.
--
- Vince
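
The error messages above can be reproduced from the hex dump. The Python sketch below decodes the dumped bytes and shows where `e` and `ÿþe` come from. It assumes the file is read byte by byte, that a NUL byte ends the command word, and that the BOM bytes are displayed as Latin-1; those are assumptions about the behaviour seen here, not documented facts about TCC or CMD.

```python
# Bytes copied from the `type /x ucode.bat` dump (UTF-16 LE, no BOM).
raw = bytes.fromhex(
    "65 00 63 00 68 00 6f 00 20 00 66 00 6f 00 6f 00 "
    "0d 00 0a 00"
)

# Interpreted as UTF-16 LE, the file is a single ECHO line.
as_utf16 = raw.decode("utf-16-le")

# Interpreted as single bytes, everything after the first NUL is lost,
# which leaves just "e" as the command name (assumed NUL-terminated read).
first_word = raw.split(b"\x00")[0].decode("ascii")

# With a BOM, the two bytes FF FE precede the "e". Shown as Latin-1 they
# are the characters seen in the CMD error message.
with_bom = b"\xff\xfe" + raw
bom_word = with_bom.split(b"\x00")[0].decode("latin-1")
```

Here `as_utf16` is `"echo foo\r\n"`, `first_word` is `"e"`, and `bom_word` is `"ÿþe"`.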
 
#4
Thanks for the info.
With a BOM it works as expected.

vefatica seems to be right, though: it apparently is somewhat bad at handling files without a BOM (UTF-16 LE, to be exact).
I tested with two ECHO lines, the first with European text and the second with Japanese, in a Unicode file without a BOM.

If the first character in the Unicode file is @, though, it neither complains nor runs.

However, now that it's known, I know what to do :)

Finding a font that supports Unicode in the console is another matter (W7 Pro only lists Consolas, Lucida Console and Raster Fonts, and none of them seems to have much support for it).
 

rconn

#5
> |Does TCC (command line) support Unicode .btm files?
> |
> |If so, does it require the BOM, or can it determine the encoding on its own?
> |Are there any other encodings it supports (UTF-8, etc.)?
>
> It's OK with a BOM.
>
> It apparently doesn't work without a BOM.
>
> v:\> ver
>
> TCC 11.00.48 Windows XP [Version 5.1.2600]
>
> v:\> type /x ucode.bat
> 0000 0000 65 00 63 00 68 00 6f 00 20 00 66 00 6f 00 6f 00 e.c.h.o. .f.o.o.
> 0000 0010 0d 00 0a 00 ....
We've had this discussion before - Windows (not TCC) needs more than 10
bytes to determine whether a string is Unicode. If you want to write really
small Unicode batch files, you're going to have to insert the BOM.

Rex Conn
JP Software
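
For anyone who wants to follow the advice above and generate small batch files with the BOM already in place, here is a minimal Python sketch. The helper names `encode_with_bom` and `write_utf16_bat` are made up for this example; the only assumption is that the target is UTF-16 LE with an FF FE prefix, as in the working case reported earlier in the thread.

```python
import codecs


def encode_with_bom(text: str) -> bytes:
    """Encode `text` as UTF-16 LE with a leading BOM (FF FE)."""
    # "utf-16-le" never emits a BOM by itself, so prepend it explicitly.
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


def write_utf16_bat(path: str, text: str) -> None:
    """Write `text` to `path` as a BOM-prefixed UTF-16 LE file."""
    # Binary mode keeps Python from translating the CR/LF pairs.
    with open(path, "wb") as handle:
        handle.write(encode_with_bom(text))
```

For example, `write_utf16_bat("ucode.bat", "echo foo\r\n")` produces a 22-byte file: the 2-byte BOM followed by the 20 bytes shown in the hex dump.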
 
May 30, 2008
65
1
#6
I forgot to mention that this was run on Windows 7 x64 Pro with the newest version/update of Take Command (I tend to use only the TCC part of that package).

My test contained:
@echo off
echo This is a test of a .btm file without BOM
echo (15 Japanese characters go here; I don't include them in this post).

In other words, it was a bit more than 10 characters, and 3 lines in total.
I'm not saying it is your fault or anything, as you use the Windows APIs to determine it; I'm just mentioning it.

However, as long as it's known, it doesn't really bother me that much.
 
#7
On Sun, 25 Apr 2010 12:42:24 -0400, myarmor <> wrote:

|My test contained:
|@echo off
|echo This is a test of a .btm file without BOM
|echo (15 Japanese characters go here; I don't include them in this post).
|
|In other words, it was a bit more than 10 characters, and 3 lines in total.
|I'm not saying it is your fault or anything, as you use the Windows APIs to determine it; I'm just mentioning it.

I'd recommend that Rex use IS_TEXT_UNICODE_STATISTICS in addition to the current
tests. It works better for me (it recognizes short WCHAR strings, including L"echo
foo").
--
- Vince
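
To give a feel for what a statistics-based test can do with BOM-less input, here is a portable Python sketch. It is emphatically not the algorithm behind the Windows `IS_TEXT_UNICODE_STATISTICS` flag, whose internals are not described in this thread; it is a simple stand-in that counts NUL bytes at odd and even offsets. The function name `looks_like_utf16le` and both thresholds are made up for illustration.

```python
def looks_like_utf16le(data: bytes, min_ratio: float = 0.5) -> bool:
    """Guess whether BOM-less `data` is UTF-16 LE text.

    Hypothetical heuristic: mostly-ASCII text encoded as UTF-16 LE has a
    NUL in nearly every odd-offset (high) byte and almost none in the
    even-offset (low) bytes.
    """
    # UTF-16 code units are two bytes, so an odd length is suspicious.
    if len(data) < 2 or len(data) % 2:
        return False
    low = data[0::2]   # low bytes of each little-endian code unit
    high = data[1::2]  # high bytes of each little-endian code unit
    high_zero = high.count(0) / len(high)
    low_zero = low.count(0) / len(low)
    # Thresholds are arbitrary illustration values, not tuned constants.
    return high_zero >= min_ratio and low_zero < 0.1
```

This sketch accepts the short `echo foo` file from earlier in the thread, and also a mostly-ASCII file with a line of Japanese in it. It would reject a file consisting almost entirely of Japanese text, since those code units have no NUL high bytes, so a real detector needs more than this one signal.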