Strange output, here-doc redirection, TYPE, //UnicodeOutput=Yes

Discussion in 'Support' started by Charles Dye, Feb 3, 2010.

  1. Charles Dye

    Charles Dye Super Moderator
    Staff Member

    Joined:
    May 20, 2008
    Messages:
    3,300
    Likes Received:
    39
    I was attempting to create a Unicode text file using here-document redirection, and wound up with a peculiar output file. This is a simplified version of the script (the original is more complex):

    Code:
    @echo off
     
    setlocal
     
    set saveopt=%@option[unicodeoutput]
    option //unicodeoutput=yes
     
    type >! outfile.txt <<- endtext
       This is only a stupid little test.
    endtext
     
    option //unicodeoutput=%saveopt
     
    endlocal
     
    
    When I run this, the output file starts off with hexadecimal values:

    Code:
    ff fe a0 00 a0 25 54 00 00 00 68 00 00 00 69 00 00 00 73 00 00 00 ...
    
    and continues on in that vein. A BOM, four bytes of garbage (always a0 00 a0 25 in my tests), then what looks like my output text but using four-byte characters. Any idea what's going on here?

    I see this behavior when TYPE is used with here-doc redirection and UnicodeOutput is on. Turning UnicodeOutput off, TYPEing a normal (Unicode) text file, or using TEXT/ENDTEXT instead of TYPE all produce normal output files as expected. On the other hand, if I remove the output redirection, then TYPE dumps strange output to the screen....

    I see this in TCC 11.00.39. The current release of TCC/LE seems to do the same thing.
     
  2. Steve Fabian

    Joined:
    May 20, 2008
    Messages:
    3,520
    Likes Received:
    4
    Charles Dye wrote:
    | I was attempting to create a Unicode text file using here-document
    | redirection, and wound up with a peculiar output file. This is a
    | simplified version of the script (the original is more complex):
    |
    |
    | Code:
    | ---------
    | @echo off
    |
    | setlocal
    |
    | set saveopt=%@option[unicodeoutput]
    | option //unicodeoutput=yes
    |
    | type >! outfile.txt <<- endtext
    | This is only a stupid little test.
    | endtext
    |
    | option //unicodeoutput=%saveopt
    |
    | endlocal
    | ---------
    | When I run this, the output file starts off with hexadecimal values:
    |
    |
    | Code:
    | ---------
    | ff fe a0 00 a0 25 54 00 00 00 68 00 00 00 69 00 00 00 73 00 00 00 ...
    | ---------
    | and continues on in that vein. A BOM, four bytes of garbage (always
    | a0 00 a0 25 in my tests), then what looks like my output text but
    | using four-byte characters. Any idea what's going on here?

    Perhaps the text is converted from ASCII to Unicode once (possibly by
    TYPE), then wrongly assumed to still be ASCII and converted again (by
    redirection)?

    | I see this behavior when TYPE is used with here-doc redirection and
    | UnicodeOutput is on. Turning UnicodeOutput off, TYPEing a normal
    | (Unicode) text file, or using TEXT/ENDTEXT instead of TYPE all
    | produce normal output files as expected. On the other hand, if I
    | remove the output redirection, then TYPE dumps strange output to the
    | screen....
    |
    | I see this in TCC 11.00.39. The current release of TCC/LE seems to
    | do the same thing.

    Did you mean using TEXT like this:

    Code:
    ---------
    @echo off

    setlocal

    set saveopt=%@option[unicodeoutput]
    option //unicodeoutput=yes

    text >! outfile.txt
    This is only a stupid little test.
    endtext

    option //unicodeoutput=%saveopt

    endlocal
    ---------

    which produces a proper unicode file, with no strange characters.

    BTW, why would you want to use "here" redirection from TYPE, instead of
    normal redirection from TEXT anyway?
    --
    Steve
     
  3. Charles Dye

    Yep -- the sensible, logical approach that any normal person would use.

    Just for pretties. Redirection with <<- has the benefit of stripping off leading spaces, and the text block in the actual script was inside an (indented) IFF / ENDIFF block. The equivalent TEXT / ENDTEXT block needs to be "undented", unless I want to put up with leading spaces in the output file.

    (Perhaps some future version might add an option to TEXT to strip leading / trailing spaces.)
     
  4. Steve Fabian

    Charles Dye wrote:
    | ---Quote (Originally by Steve Fabian)---
    || BTW, why would you want to use "here" redirection from TYPE, instead
    || of
    || normal redirection from TEXT anyway?
    | ---End Quote---
    | Just for pretties. Redirection with <<- has the benefit of
    | stripping off leading spaces, and the text block in the actual
    | script was inside an IFF / ENDIFF block (indented.) The equivalent
    | TEXT / ENDTEXT block needs to be "undented", unless I want to put up
    | with leading spaces in the output file.

    ... another reason might be if "here" redirection evaluates variables and
    functions. I never actually used it.
    |
    | (Perhaps some future version might add an option to TEXT to strip
    | leading / trailing spaces.)

    Hear! Hear! Hear! I hate the need to undent, too... yet it's still good for
    table headers/footers, etc.
    --
    Steve
     
  5. Charles Dye

    Better yet -- it's optional! Variables, functions etc. are expanded unless the word after the redirection operator is in double quotes.
     
  6. rconn

    rconn Administrator
    Staff Member

    Joined:
    May 14, 2008
    Messages:
    9,854
    Likes Received:
    83
    What's happening is that you're writing the output file as Unicode, and then
    feeding it to TYPE through CON:, which does *not* expect to see a Unicode
    header coming through the keyboard. The ff fe bytes then get treated as ASCII
    characters and translated to Unicode characters, and you end up with a
    puzzling result.
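
    As a rough illustration only (a sketch, not TCC's actual code), the byte pattern in the dump can be reproduced in Python by taking a UTF-16LE stream with a BOM, decoding it one byte per character, and encoding the result as UTF-16LE again. The cp437 code page and the helper name double_convert are assumptions made for this example; the thread does not say which code page is involved.

```python
def double_convert(text: str, codepage: str = "cp437") -> bytes:
    """Simulate Unicode output being re-read as single-byte text and converted again."""
    bom = b"\xff\xfe"
    # First pass: the here-document text written as UTF-16LE with a BOM.
    first = bom + text.encode("utf-16-le")
    # Second pass: every byte (BOM included) is treated as one character
    # in the assumed single-byte code page, then written out as UTF-16LE again.
    misread = first.decode(codepage)
    return bom + misread.encode("utf-16-le")


print(double_convert("This").hex(" "))
# ff fe a0 00 a0 25 54 00 00 00 68 00 00 00 69 00 00 00 73 00 00 00
```

    In code page 437, byte ff maps to U+00A0 and byte fe to U+25A0, which would account for the constant a0 00 a0 25 after the BOM. Each character's zero high byte becomes a NUL character of its own, hence the apparent four-byte characters.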

    I'm not sure this is worth changing, and I'm not sure that I'd *want* to
    change it. I'll ponder this for a bit; meanwhile, there are several other
    more rational ways of doing the same thing. :-)
     
  7. Charles Dye

    He that perpetrateth a kludge which dependeth upon undocumented features, verily he deserveth whatever he getteth....

    No big deal. My output file doesn't even really need to be Unicode, Microsoft's documentation notwithstanding; and as you say there are better ways of doing it. I posted mainly because my results puzzled me.
     
