
UNICODE mixed with ANSI Code

#1
Some of my TCC log files (in Unicode format) sometimes contain sequences in
ANSI code (there are reasons for this).
Question: is there a way to convert these files completely to Unicode with TCC?
I know TPIPE is a monster tool, but at the moment I have the impression that it is not
suitable for this. So maybe a BTM can do the job better.
 
#3
You have 8-bit characters embedded in otherwise UTF-16 text files?
Yes, exactly. Here is (a fragment of) such a problem file.
Btw., @FILEREAD may have a problem with this file that I can't explain...
 


#4
That file is a mess. Some of the 8-bit stuff is syntax messages (which you wouldn't expect to get into a log file).
[screenshot: upload_2017-12-6_14-11-35.png]

And the times don't always go in order. For example, there are entries from [24.10.17 15:57:09] which are before entries from [25.10.17 09:21:41].

There are also many "Param/Comment" entries referring to MetaPad sessions. What are they?

Do you have another program also logging to that file?
 
#5
I don't know what's expected here, but I can mess up a Unicode text file easily with >>.
Code:
v:\> notepad uctest.txt
[screenshot: upload_2017-12-6_14-41-19.png]
Code:
v:\> type uctest.txt
My dog has fleas.
v:\> echo foo >> uctest.txt
[screenshot: upload_2017-12-6_14-42-43.png]

Now (this is the best part):
[screenshot: upload_2017-12-6_14-43-48.png]
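What the screenshots show can be reproduced outside TCC. A minimal Python simulation (the text content is illustrative): appending 8-bit output to a UTF-16LE file leaves single bytes that the UTF-16 decoder then pairs up into bogus 2-byte characters.

```python
import codecs

# Build a UTF-16LE file the way Notepad would: BOM + 2-byte characters.
data = codecs.BOM_UTF16_LE + 'My dog has fleas.\r\n'.encode('utf-16-le')

# Simulate ">>" appending 8-bit (ANSI) output to the same file.
data += b'foo\r\n'

# Reading the file back as UTF-16 now misinterprets the appended bytes:
# each pair of ANSI bytes is consumed as one bogus 2-byte character,
# and the odd trailing byte cannot be decoded at all.
text = data[2:].decode('utf-16-le', errors='replace')  # skip the BOM
print(repr(text))
```

The original line survives, but the appended "foo" is unrecoverable by a plain UTF-16 read, which is exactly the garbage TYPE displays.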
 
#6
There are also many "Param/Comment" entries referring to MetaPad sessions. What are they?

Do you have another program also logging to that file?
It's part of my real JPSoft history log. In the dark ages, at the beginning of 4DOS/4OS2/4NT, I worked for a long time
with its log feature, so the weird "Param/Comment" entries date from that time.

Only TCC is the logger, but sometimes I must switch to ANSI, which I do with OPTION //UNICODEOUTPUT=NO.

One reason, for example, is DISKPART. I've written a BTM to drive it via
"DISKPART /s Commandscript.txt",
but DISKPART cannot handle Unicode in "Commandscript.txt", so it must be ANSI.
And when I looked into the log last month, I saw the disastrous mix of encodings.
It's not a big problem, but I'm looking for a way to clean the file, so I had hoped I could do it with TCC.
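One way to sidestep the output-mode toggle entirely is to generate the DISKPART script as 8-bit text in the first place. A hedged sketch in Python (the file name and the DISKPART commands are illustrative; cp1252 stands in for the Windows "ANSI" code page, and DISKPART itself is Windows-only):

```python
import os
import tempfile

# DISKPART's /s script must be 8-bit ("ANSI") text, not UTF-16, so write
# it with a single-byte code page instead of the shell's Unicode output.
script = "list disk\r\nexit\r\n"

path = os.path.join(tempfile.gettempdir(), "Commandscript.txt")
with open(path, "w", encoding="cp1252", newline="") as f:
    f.write(script)

# The resulting file has one byte per character and no BOM or NUL bytes,
# which is the form DISKPART can parse (then: diskpart /s Commandscript.txt).
raw = open(path, "rb").read()
```

This keeps the log in pure UTF-16, because UNICODEOUTPUT never has to be switched off.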
 

rconn (Administrator)
#10
Some of my TCC log files (in Unicode format) sometimes contain sequences in
ANSI code (there are reasons for this).
I don't know of any way to do it automatically, as there's no obvious difference between a 2-byte Unicode character and a 2-byte pair of ANSI characters.

Provided your file didn't have any extended Unicode characters (i.e., > 255) you could pick it apart with @FILEREADB looking for characters that didn't have a 0 for the high byte, and converting those to Unicode.
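Rex's @FILEREADB idea can be sketched in Python terms (a hedged illustration of the heuristic, not TCC code): walk the file two bytes at a time, and wherever the high byte is nonzero, treat the pair as two embedded ANSI characters and widen each to its own 16-bit unit. Note the stated assumption holds here too: no genuine code points above 255, and an odd-length ANSI run would throw the 2-byte framing off.

```python
# Sketch of the repair heuristic: in UTF-16LE text that uses no code
# points above 255, every valid character has a zero high byte, so a
# nonzero high byte marks a pair of embedded 8-bit (ANSI) characters.
def repair_mixed_utf16le(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    if data[:2] == b'\xff\xfe':          # keep the BOM, if any
        out += data[:2]
        i = 2
    while i + 1 < len(data):
        lo, hi = data[i], data[i + 1]
        if hi == 0:
            out += bytes((lo, hi))       # already a proper 2-byte char
        else:
            out += bytes((lo, 0, hi, 0)) # two ANSI bytes -> two chars
        i += 2
    if i < len(data):
        out += bytes((data[i], 0))       # widen a stray trailing byte
    return bytes(out)
```

For example, a UTF-16LE buffer with the 8-bit run b'ansi' spliced into it comes back as clean UTF-16: `repair_mixed_utf16le(b'\xff\xfe' + 'log '.encode('utf-16-le') + b'ansi' + ' end'.encode('utf-16-le')).decode('utf-16-le')` yields `'\ufefflog ansi end'`.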
 
#11
I don't know of any way to do it automatically, as there's no obvious difference between a 2-byte Unicode character and a 2-byte pair of ANSI characters.

Provided your file didn't have any extended Unicode characters (i.e., > 255) you could pick it apart with @FILEREADB looking for characters that didn't have a 0 for the high byte, and converting those to Unicode.
I was afraid you'd say that, Rex. So I have to read the whole thing as a byte or word stream
and check whether both the LSB and the MSB are non-NUL, and so on.
Maybe the functions @BALLOC, @BREAD, @BPEEK and so on would be helpful, or faster in execution?
I will try it out.
 