How to? How do I read a Unicode file through standard-input?

#1
I'm sure I'm missing something here, but I don't know what it is or where to find it (and I've spent some time looking). Specifically, I wanted to read a Unicode file through "standard input", i.e. <"A Unicode File.txt", and I tried to read it using the "usual" technique, i.e. "Set Line=%@SafeExp[@Line[CON,0]]". (Does "@SafeExp" have something to do with it?) The problem is that the program fails completely when the input file is a Unicode file, and works as expected when it is an ANSI (ASCII) file. In the short term I worked around the problem by creating an ANSI version of the input file I want to process, but I'd really rather be able to read a Unicode file "natively". So how do I do that?

- Dan
 

Charles Dye

Super Moderator
Staff member
May 20, 2008
3,379
39
Albuquerque, NM
prospero.unm.edu
#2
You can try it without the @SAFEEXP.... but I don't think that CON expects to receive anything but 8-bit text. You'll probably have to rewrite your batch file to use @FILEOPEN, @FILECLOSE, and @FILEREAD or @SAFEREAD.
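
As a rough illustration of why a reader that expects 8-bit text would trip over such a file, here is a small sketch in Python (not TCC). It assumes the "Unicode" file is UTF-16 little-endian with a byte-order mark, which is what Windows tools usually mean by "Unicode"; the helper name as_utf16_file is made up for this example.

```python
import codecs

def as_utf16_file(text: str) -> bytes:
    """Return the bytes a UTF-16LE file with a BOM would hold for `text`."""
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# A two-character line followed by CR LF, as it would sit on disk.
raw = as_utf16_file("Hi\r\n")

# Show the bytes in hex: two BOM bytes first, then every ASCII
# character followed by a zero byte.
print(" ".join(f"{b:02x}" for b in raw))
```

Read as 8-bit text, those bytes are two odd leading characters followed by a NUL after every letter, which would be consistent with a line-oriented 8-bit reader failing on the file.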
 
#3
Thank you, Charles. That's kind of what I expected, although I find it quite surprising, because I thought that TCC's native "language" was Unicode and that ANSI/ASCII was only there as a kind of "concession". It matters to me because the batch files I mostly write are "filters" of one kind or another: they read some data from standard input, process it in some manner, and write it back out to standard output. It's really not unusual for me to have two or three or even more of these "filters" "piped" one after the other on the command line. (And this kind of thing cannot be done using temporary files that I have to write and read.) So the inability to read Unicode from standard input basically means that I will have to uncheck the "Unicode Output" option. I had kept it unchecked for probably as long as it has existed, but changed it recently because I've read a couple of things lately that all basically said "The world is switching to Unicode, and you'll be left behind if you don't switch too."

(I really doubt that "@SafeExp" has anything at all to do with it, and I really don't want to live without it. In particular, I make heavy use of very long file names (as a memory aid to tell me exactly what is in a file, given my bad memory), I like to use "&" instead of "and", and I make heavy use of commas (for which I have to "UnSafe /E:, >NUL:", of course). Because batch files couldn't handle these things before your "@Safe" routines (and, as always, thank you very much!), I was forced to write C++ programs, which was getting more and more impractical because of my drastically declining programming skills due to my bad memory. So, thank you again!)

- Dan
 

Charles Dye
#4
And this is kind of surprising to me because the primary kinds of batch files that I write are "filters" of some kind or another; they read some data in from standard input, process it in some manner, and write it back out to standard output. And it's really not too unusual for me to have two or three or even more of these "filters" "piped" one after the other on the command line. (And this kind of thing cannot be done using temporary files that I have to write and read.)
Well, perhaps you could add just one more filter to your command line, to translate Unicode to OEM before sending it on to your batch file. If you'd like to try one of mine, Xcode will output 8-bit text if you give it the /A option.

It ought to be possible to add a function to SafeChars to slurp a line from stdin; I'll take a look at it sometime next week.
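
For anyone curious what such an extra translating filter has to do, here is a minimal sketch in Python. It is not how Xcode works internally; it only illustrates the idea. The assumptions are that the input is UTF-16 (little-endian when there is no byte-order mark) and that the 8-bit "OEM" target is code page 437. The names utf16_to_8bit and filter_stream are invented for the example.

```python
import io

def utf16_to_8bit(data: bytes, codepage: str = "cp437") -> bytes:
    """Decode UTF-16 bytes and re-encode them as 8-bit text.

    Assumption: input without a BOM is little-endian. Characters that
    the target code page cannot represent come out as '?'.
    """
    if data[:2] in (b"\xff\xfe", b"\xfe\xff"):
        text = data.decode("utf-16")      # BOM picks the byte order and is dropped
    else:
        text = data.decode("utf-16-le")   # assumed default byte order
    return text.encode(codepage, errors="replace")

def filter_stream(src, dst, codepage: str = "cp437") -> None:
    """Read all bytes from src, translate them, and write them to dst."""
    dst.write(utf16_to_8bit(src.read(), codepage))

# Demonstration with in-memory streams standing in for stdin and stdout.
src = io.BytesIO(b"\xff\xfe" + "caf\u00e9 & more, \u20ac5\r\n".encode("utf-16-le"))
dst = io.BytesIO()
filter_stream(src, dst)
print(dst.getvalue())
```

Used as a real filter, the same function would be called with sys.stdin.buffer and sys.stdout.buffer, and the script would sit in the pipe ahead of the batch file. The sketch reads the whole input at once, which is fine for small files but not for very large streams.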
 
#5
Thank you, Charles, I'll look into XCode. And while adding a function to "slurp lines from stdin" might be a nice addition (only if it is relatively easy for you to do!), I tend to think that, in my circumstances, just unchecking the "Unicode Output" option is a better idea because, as far as I know, I really don't need Unicode anyway.

- Dan