Strange output, here-doc redirection, TYPE, //UnicodeOutput=Yes

Charles Dye

Super Moderator
Staff member
May 20, 2008
3,383
39
Albuquerque, NM
prospero.unm.edu
#1
I was attempting to create a Unicode text file using here-document redirection, and wound up with a peculiar output file. This is a simplified version of the script (the original is more complex):

Code:
@echo off
 
setlocal
 
set saveopt=%@option[unicodeoutput]
option //unicodeoutput=yes
 
type >! outfile.txt <<- endtext
   This is only a stupid little test.
endtext
 
option //unicodeoutput=%saveopt
 
endlocal
When I run this, the output file starts off with hexadecimal values:

Code:
ff fe a0 00 a0 25 54 00 00 00 68 00 00 00 69 00 00 00 73 00 00 00 ...
and continues on in that vein. A BOM, four bytes of garbage (always a0 00 a0 25 in my tests), then what looks like my output text but using four-byte characters. Any idea what's going on here?

I see this behavior when TYPE is used with here-doc redirection and UnicodeOutput is on. Turning UnicodeOutput off, TYPEing a normal (Unicode) text file, or using TEXT/ENDTEXT instead of TYPE all produce normal output files as expected. On the other hand, if I remove the output redirection, then TYPE dumps strange output to the screen....

I see this in TCC 11.00.39. The current release of TCC/LE seems to do the same thing.
 
#2
Charles Dye wrote:
| I was attempting to create a Unicode text file using here-document
| redirection, and wound up with a peculiar output file. This is a
| simplified version of the script (the original is more complex):
|
|
| Code:
| ---------
| @echo off
|
| setlocal
|
| set saveopt=%@option[unicodeoutput]
| option //unicodeoutput=yes
|
| type >! outfile.txt <<- endtext
| This is only a stupid little test.
| endtext
|
| option //unicodeoutput=%saveopt
|
| endlocal
| ---------
| When I run this, the output file starts off with hexadecimal values:
|
|
| Code:
| ---------
| ff fe a0 00 a0 25 54 00 00 00 68 00 00 00 69 00 00 00 73 00 00 00 ...
| ---------
| and continues on in that vein. A BOM, four bytes of garbage (always
| a0 00 a0 25 in my tests), then what looks like my output text but
| using four-byte characters. Any idea what's going on here?

Perhaps the text is being converted from ASCII to Unicode (possibly by TYPE),
then wrongly assumed to still be ASCII and converted again (by the redirection)?

| I see this behavior when TYPE is used with here-doc redirection and
| UnicodeOutput is on. Turning UnicodeOutput off, TYPEing a normal
| (Unicode) text file, or using TEXT/ENDTEXT instead of TYPE all
| produce normal output files as expected. On the other hand, if I
| remove the output redirection, then TYPE dumps strange output to the
| screen....
|
| I see this in TCC 11.00.39. The current release of TCC/LE seems to
| do the same thing.

Did you mean using TEXT like this:

Code:
---------
@echo off

setlocal

set saveopt=%@option[unicodeoutput]
option //unicodeoutput=yes

text >! outfile.txt
This is only a stupid little test.
endtext

option //unicodeoutput=%saveopt

endlocal
---------

which produces a proper unicode file, with no strange characters.

BTW, why would you want to use "here" redirection from TYPE, instead of
normal redirection from TEXT anyway?
--
Steve
 

Charles Dye
#3
> Did you mean using TEXT like this:
>
> text >! outfile.txt
> This is only a stupid little test.
> endtext
Yep -- the sensible, logical approach that any normal person would use.

> BTW, why would you want to use "here" redirection from TYPE, instead of
> normal redirection from TEXT anyway?
Just for pretties. Redirection with <<- has the benefit of stripping off leading spaces, and the text block in the actual script was indented inside an IFF / ENDIFF block. The equivalent TEXT / ENDTEXT block needs to be "undented", unless I want to put up with leading spaces in the output file.
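
To make the "undenting" point concrete, here is a rough Python sketch of the effect described above: every line of an indented block loses its leading spaces and tabs before being written out. The function name and sample text are hypothetical, and this only models the behavior as described in this post; it is not TCC's actual implementation.

```python
def strip_leading_whitespace(block: str) -> str:
    """Drop leading spaces and tabs from every line of a text block."""
    return "\n".join(line.lstrip(" \t") for line in block.splitlines())


# Hypothetical block as it might appear indented inside an IFF / ENDIFF body.
indented = "    This is only a stupid little test.\n    Second line."
result = strip_leading_whitespace(indented)
print(result)
```

A plain TEXT / ENDTEXT block would correspond to writing `indented` unchanged, leading spaces included.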

(Perhaps some future version might add an option to TEXT to strip leading / trailing spaces.)
 
#4
Charles Dye wrote:
| ---Quote (Originally by Steve Fábián)---
|| BTW, why would you want to use "here" redirection from TYPE, instead
|| of normal redirection from TEXT anyway?
| ---End Quote---
| Just for pretties. Redirection with <<- has the benefit of
| stripping off leading spaces, and the text block in the actual
| script was inside an IFF / ENDIFF block (indented.) The equivalent
| TEXT / ENDTEXT block needs to be "undented", unless I want to put up
| with leading spaces in the output file.

... another reason might be if "here" redirection evaluates variables and
functions. I never actually used it.
|
| (Perhaps some future version might add an option to TEXT to strip
| leading / trailing spaces.)

Hear! Hear! Hear! I hate the need to undent, too... yet it's still good for
table headers/footers, etc.
--
Steve
 

rconn

Administrator
Staff member
May 14, 2008
10,100
85
#6
> I was attempting to create a Unicode text file using here-document
> redirection, and wound up with a peculiar output file. This is a
> simplified version of the script (the original is more complex):
>
> When I run this, the output file starts off with hexadecimal values:
>
> Code:
> ---------
> ff fe a0 00 a0 25 54 00 00 00 68 00 00 00 69 00 00 00 73 00 00 00 ...
> ---------
> and continues on in that vein. A BOM, four bytes of garbage (always a0
> 00 a0 25 in my tests), then what looks like my output text but using
> four-byte characters. Any idea what's going on here?
What's happening is that you're writing the output file as Unicode, and then
feeding it to TYPE through CON:, which does *not* expect to see a Unicode
header coming through the keyboard. The ff fe then gets translated as ASCII
characters to Unicode characters, and you end up with a puzzling result.
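
The double conversion described here can be reproduced outside TCC. The following is a minimal Python sketch, not a description of TCC's internals: it assumes the intermediate text is UTF-16LE with a BOM, and that the second stage treats each byte as a character in OEM code page 437 (the code page is an assumption; the variable names are illustrative only).

```python
# Hypothetical reconstruction of the double conversion.
text = "This"

# Step 1: the text is written as UTF-16LE, preceded by a byte order mark.
first_pass = b"\xff\xfe" + text.encode("utf-16-le")

# Step 2: the reader assumes 8-bit OEM text (code page 437 is an assumption)
# and widens every byte, including the BOM bytes, to its own character.
misread = first_pass.decode("cp437")

# Step 3: the widened text is written as UTF-16LE again, with a fresh BOM.
second_pass = b"\xff\xfe" + misread.encode("utf-16-le")

hex_dump = " ".join(f"{b:02x}" for b in second_pass)
print(hex_dump)
# ff fe a0 00 a0 25 54 00 00 00 68 00 00 00 69 00 00 00 73 00 00 00
```

Under these assumptions the result matches the reported dump byte for byte. In code page 437, byte ff is a no-break space (U+00A0) and byte fe is a black square (U+25A0), which accounts for the "garbage" a0 00 a0 25, and each 00 byte becomes its own 00 00 character, which accounts for the apparent four-byte characters.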

I'm not sure this is worth changing, and I'm not sure that I'd *want* to
change it. I'll ponder this for a bit; meanwhile, there are several other
more rational ways of doing the same thing. :-)
 

Charles Dye
#7
> I'm not sure this is worth changing, and I'm not sure that I'd *want* to change it. I'll ponder this for a bit; meanwhile, there are several other more rational ways of doing the same thing. :-)
He that perpetrateth a kludge which dependeth upon undocumented features, verily he deserveth whatever he getteth....

No big deal. My output file doesn't even really need to be Unicode, Microsoft's documentation notwithstanding; and as you say there are better ways of doing it. I posted mainly because my results puzzled me.