
UNICODE mixed with ANSI Code

#1
Some of my TCC log files (in Unicode format) sometimes contain sequences in
ANSI code (there are reasons for this).
Question: is there a way to convert these files completely to Unicode with TCC?
I know TPIPE is a monster tool, but at the moment I have the impression that it is not
suitable for this. So maybe a BTM can do the job better.
 
#3
You have 8-bit characters embedded in otherwise UTF-16 text files?
Yes, exactly. Here is (a fragment of) such a problem file.
Btw., @FILEREAD may have a problem with this file that I can't explain...
 


#4
That file is a mess. Some of the 8-bit stuff is syntax messages (which you wouldn't expect to get into a log file).
[screenshot: upload_2017-12-6_14-11-35.png]

And the times don't always go in order. For example, there are entries from [24.10.17 15:57:09] which are before entries from [25.10.17 09:21:41].

There are also many "Param/Comment" entries referring to MetaPad sessions. What are they?

Do you have another program also logging to that file?
 
#5
I don't know what's expected here, but I can mess up a Unicode text file easily with >>.
Code:
v:\> notepad uctest.txt
[screenshot: upload_2017-12-6_14-41-19.png]
Code:
v:\> type uctest.txt
My dog has fleas.
v:\> echo foo >> uctest.txt
[screenshot: upload_2017-12-6_14-42-43.png]

Now (this is the best part):
[screenshot: upload_2017-12-6_14-43-48.png]
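What the screenshots show can be reproduced outside TCC. A minimal Python simulation (the text content is illustrative): appending 8-bit output to a UTF-16LE file leaves single bytes that the UTF-16 decoder then pairs up into bogus 2-byte characters.

```python
import codecs

# Build a UTF-16LE file the way Notepad would: BOM + 2-byte characters.
data = codecs.BOM_UTF16_LE + 'My dog has fleas.\r\n'.encode('utf-16-le')

# Simulate ">>" appending 8-bit (ANSI) output to the same file.
data += b'foo\r\n'

# Reading the file back as UTF-16 now misinterprets the appended bytes:
# each pair of ANSI bytes is consumed as one bogus 2-byte character,
# and the odd trailing byte cannot be decoded at all.
text = data[2:].decode('utf-16-le', errors='replace')  # skip the BOM
print(repr(text))
```

The original line survives, but the appended "foo" is unrecoverable by a plain UTF-16 read, which is exactly the garbage TYPE displays.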
 
#6
There are also many "Param/Comment" entries referring to MetaPad sessions. What are they?

Do you have another program also logging to that file?
It's part of my real JPSoft history log. In the dark ages, at the beginning of 4DOS/4OS2/4NT, I worked for a long time
with its log feature, so the weird "Param/Comment" entries date from that time.

Only TCC is the logger, but sometimes I must switch to ANSI, which I do with OPTION //UNICODEOUTPUT=NO.

One reason, for example, is DISKPART. I've written a BTM to drive it via
"DISKPART /s Commandscript.txt",
but DISKPART cannot handle Unicode in "Commandscript.txt", so it must be ANSI.
And when I looked into the log last month, I saw the disastrous mix of encodings.
It's not a big problem, but I'm looking for a way to clean the file, so I had hoped I could do it with TCC.
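One way to sidestep the output-mode toggle entirely is to generate the DISKPART script as 8-bit text in the first place. A hedged sketch in Python (the file name and the DISKPART commands are illustrative; cp1252 stands in for the Windows "ANSI" code page, and DISKPART itself is Windows-only):

```python
import os
import tempfile

# DISKPART's /s script must be 8-bit ("ANSI") text, not UTF-16, so write
# it with a single-byte code page instead of the shell's Unicode output.
script = "list disk\r\nexit\r\n"

path = os.path.join(tempfile.gettempdir(), "Commandscript.txt")
with open(path, "w", encoding="cp1252", newline="") as f:
    f.write(script)

# The resulting file has one byte per character and no BOM or NUL bytes,
# which is the form DISKPART can parse (then: diskpart /s Commandscript.txt).
raw = open(path, "rb").read()
```

This keeps the log in pure UTF-16, because UNICODEOUTPUT never has to be switched off.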
 

rconn (Administrator)
#10
Some of my TCC log files (in Unicode format) sometimes contain sequences in
ANSI code (there are reasons for this).
I don't know of any way to do it automatically, as there's no obvious difference between a 2-byte Unicode character and a 2-byte pair of ANSI characters.

Provided your file didn't have any extended Unicode characters (i.e., > 255) you could pick it apart with @FILEREADB looking for characters that didn't have a 0 for the high byte, and converting those to Unicode.
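Rex's @FILEREADB idea can be sketched in Python terms (a hedged illustration of the heuristic, not TCC code): walk the file two bytes at a time, and wherever the high byte is nonzero, treat the pair as two embedded ANSI characters and widen each to its own 16-bit unit. Note the stated assumption holds here too: no genuine code points above 255, and an odd-length ANSI run would throw the 2-byte framing off.

```python
# Sketch of the repair heuristic: in UTF-16LE text that uses no code
# points above 255, every valid character has a zero high byte, so a
# nonzero high byte marks a pair of embedded 8-bit (ANSI) characters.
def repair_mixed_utf16le(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    if data[:2] == b'\xff\xfe':          # keep the BOM, if any
        out += data[:2]
        i = 2
    while i + 1 < len(data):
        lo, hi = data[i], data[i + 1]
        if hi == 0:
            out += bytes((lo, hi))       # already a proper 2-byte char
        else:
            out += bytes((lo, 0, hi, 0)) # two ANSI bytes -> two chars
        i += 2
    if i < len(data):
        out += bytes((data[i], 0))       # widen a stray trailing byte
    return bytes(out)
```

For example, a UTF-16LE buffer with the 8-bit run b'ansi' spliced into it comes back as clean UTF-16: `repair_mixed_utf16le(b'\xff\xfe' + 'log '.encode('utf-16-le') + b'ansi' + ' end'.encode('utf-16-le')).decode('utf-16-le')` yields `'\ufefflog ansi end'`.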
 
#11
I don't know of any way to do it automatically, as there's no obvious difference between a 2-byte Unicode character and a 2-byte pair of ANSI characters.

Provided your file didn't have any extended Unicode characters (i.e., > 255) you could pick it apart with @FILEREADB looking for characters that didn't have a 0 for the high byte, and converting those to Unicode.
I was afraid you'd say that, Rex. So I have to read the whole thing as a byte or word stream
and check whether both the LSB and the MSB are non-NUL, and so on.
Maybe the functions @BALLOC, @BREAD, @BPEEK and so on would be helpful, or faster in execution?
I will try it out.
 