Strange output, here-doc redirection, TYPE, //UnicodeOutput=Yes

Charles Dye

Super Moderator
Staff member
May 20, 2008
4,495
91
Albuquerque, NM
prospero.unm.edu
I was attempting to create a Unicode text file using here-document redirection, and wound up with a peculiar output file. This is a simplified version of the script (the original is more complex):

Code:
@echo off
 
setlocal
 
set saveopt=%@option[unicodeoutput]
option //unicodeoutput=yes
 
type >! outfile.txt <<- endtext
   This is only a stupid little test.
endtext
 
option //unicodeoutput=%saveopt
 
endlocal
When I run this, the output file starts off with hexadecimal values:

Code:
ff fe a0 00 a0 25 54 00 00 00 68 00 00 00 69 00 00 00 73 00 00 00 ...
and continues on in that vein. A BOM, four bytes of garbage (always a0 00 a0 25 in my tests), then what looks like my output text but using four-byte characters. Any idea what's going on here?

I see this behavior when TYPE is used with here-doc redirection and UnicodeOutput is on. Turning UnicodeOutput off, TYPEing a normal (Unicode) text file, or using TEXT/ENDTEXT instead of TYPE all produce normal output files as expected. On the other hand, if I remove the output redirection, then TYPE dumps strange output to the screen....

I see this in TCC 11.00.39. The current release of TCC/LE seems to do the same thing.
 
May 20, 2008
3,515
4
Elkridge, MD, USA
Charles Dye wrote:
| I was attempting to create a Unicode text file using here-document
| redirection, and wound up with a peculiar output file. This is a
| simplified version of the script (the original is more complex):
|
|
| Code:
| ---------
| @echo off
|
| setlocal
|
| set saveopt=%@option[unicodeoutput]
| option //unicodeoutput=yes
|
| type >! outfile.txt <<- endtext
| This is only a stupid little test.
| endtext
|
| option //unicodeoutput=%saveopt
|
| endlocal
| ---------
| When I run this, the output file starts off with hexadecimal values:
|
|
| Code:
| ---------
| ff fe a0 00 a0 25 54 00 00 00 68 00 00 00 69 00 00 00 73 00 00 00 ...
| ---------
| and continues on in that vein. A BOM, four bytes of garbage (always
| a0 00 a0 25 in my tests), then what looks like my output text but
| using four-byte characters. Any idea what's going on here?

Perhaps the text is converted from ASCII to Unicode (possibly by TYPE), then
wrongly assumed to still be ASCII and converted again (by the redirection)?

| I see this behavior when TYPE is used with here-doc redirection and
| UnicodeOutput is on. Turning UnicodeOutput off, TYPEing a normal
| (Unicode) text file, or using TEXT/ENDTEXT instead of TYPE all
| produce normal output files as expected. On the other hand, if I
| remove the output redirection, then TYPE dumps strange output to the
| screen....
|
| I see this in TCC 11.00.39. The current release of TCC/LE seems to
| do the same thing.

Did you mean using TEXT like this:

Code:
---------
@echo off

setlocal

set saveopt=%@option[unicodeoutput]
option //unicodeoutput=yes

text >! outfile.txt
This is only a stupid little test.
endtext

option //unicodeoutput=%saveopt

endlocal
---------

which produces a proper Unicode file, with no strange characters.

BTW, why would you want to use "here" redirection from TYPE, instead of
normal redirection from TEXT anyway?
--
Steve
 

Charles Dye

Steve wrote:
| Did you mean using TEXT like this:
|
| text >! outfile.txt
| This is only a stupid little test.
| endtext

Yep -- the sensible, logical approach that any normal person would use.

| BTW, why would you want to use "here" redirection from TYPE, instead of
| normal redirection from TEXT anyway?

Just for pretties. Redirection with <<- has the benefit of stripping off leading spaces, and the text block in the actual script was indented inside an IFF / ENDIFF block. The equivalent TEXT / ENDTEXT block needs to be "undented", unless I want to put up with leading spaces in the output file.

(Perhaps some future version might add an option to TEXT to strip leading / trailing spaces.)
 
Charles Dye wrote:
| ---Quote (Originally by Steve Fábián)---
|| BTW, why would you want to use "here" redirection from TYPE, instead
|| of normal redirection from TEXT anyway?
| ---End Quote---
| Just for pretties. Redirection with <<- has the benefit of
| stripping off leading spaces, and the text block in the actual
| script was inside an IFF / ENDIFF block (indented.) The equivalent
| TEXT / ENDTEXT block needs to be "undented", unless I want to put up
| with leading spaces in the output file.

Another reason might be if "here" redirection evaluates variables and
functions, though I have never actually used it.
|
| (Perhaps some future version might add an option to TEXT to strip
| leading / trailing spaces.)

Hear! Hear! Hear! I hate the need to undent, too... yet it's still good for
table headers/footers, etc.
--
Steve
 

rconn

Administrator
Staff member
May 14, 2008
12,404
152
> I was attempting to create a Unicode text file using here-document
> redirection, and wound up with a peculiar output file. This is a
> simplified version of the script (the original is more complex):
>
> When I run this, the output file starts off with hexadecimal values:
>
> Code:
> ---------
> ff fe a0 00 a0 25 54 00 00 00 68 00 00 00 69 00 00 00 73 00 00 00 ...
> ---------
> and continues on in that vein. A BOM, four bytes of garbage (always a0
> 00 a0 25 in my tests), then what looks like my output text but using
> four-byte characters. Any idea what's going on here?

What's happening is that you're writing the output file as Unicode, and then
feeding it to TYPE through CON:, which does *not* expect to see a Unicode
header coming through the keyboard. The ff fe bytes are then treated as ASCII
characters and converted to Unicode characters, and you end up with a puzzling
result.
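
For anyone who wants to reproduce the byte pattern outside TCC, here is a minimal Python sketch of that double conversion. It is only an illustration, not TCC's actual code: the function name double_convert is made up for this example, and it assumes the single-byte text is read using code page 437 (the US OEM console code page), which may differ on other systems.

```python
# Sketch of the double conversion described above.
# Assumption: the misread bytes are interpreted as code page 437 (OEM).
BOM = b"\xff\xfe"  # UTF-16LE byte order mark


def double_convert(text: str, oem_codepage: str = "cp437") -> bytes:
    # First pass: the here-document text is written as Unicode (UTF-16LE + BOM).
    first_pass = BOM + text.encode("utf-16-le")
    # The reader then treats every byte, BOM included, as one single-byte character.
    misread = first_pass.decode(oem_codepage)
    # Second pass: those "characters" are converted to Unicode again, with a new BOM.
    return BOM + misread.encode("utf-16-le")


result = double_convert("This")
print(result.hex(" "))
# ff fe a0 00 a0 25 54 00 00 00 68 00 00 00 69 00 00 00 73 00 00 00
```

Under that assumption, byte ff maps to U+00A0 and byte fe maps to U+25A0, which would account for the "a0 00 a0 25" that follows the BOM, and each original two-byte character is widened to four bytes.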

I'm not sure this is worth changing, and I'm not sure that I'd *want* to
change it. I'll ponder this for a bit; meanwhile, there are several other
more rational ways of doing the same thing. :-)
 

Charles Dye

rconn wrote:
| I'm not sure this is worth changing, and I'm not sure that I'd *want* to change it. I'll ponder this for a bit; meanwhile, there are several other more rational ways of doing the same thing. :-)

He that perpetrateth a kludge which dependeth upon undocumented features, verily he deserveth whatever he getteth....

No big deal. My output file doesn't even really need to be Unicode, Microsoft's documentation notwithstanding; and, as you say, there are better ways of doing it. I posted mainly because my results puzzled me.
 