UNICODE mixed with ANSI Code

Some of my TCC log files (in Unicode format) sometimes contain sequences in
ANSI code (there are reasons for this :artist:)
Question: Is there a way to convert these files completely to Unicode with TCC?
I know TPIPE is a monster tool, but at the moment I have the impression that it is not
suitable for this. So maybe a BTM can do the job better.
 
You have 8-bit characters embedded in otherwise UTF-16 text files?
Yes, exactly. Here is (a fragment of) such a problem file.
Btw. FILEREAD may have a problem with this thing that I can't explain ....
 

Attachment: UNI_ANSI-Code-Kurz.txt (623 KB)
May 20, 2008
11,536
103
Syracuse, NY, USA
That file is a mess. Some of the 8-bit stuff is syntax messages (which you wouldn't expect to get into a log file).
upload_2017-12-6_14-11-35.png


And the times don't always go in order. For example, there are entries from [24.10.17 15:57:09] which are before entries from [25.10.17 09:21:41].

There are also many "Param/Comment" entries referring to MetaPad sessions. What are they?

Do you have another program also logging to that file?
 
I don't know what's expected here, but I can mess up a Unicode text file easily with >>.
Code:
v:\> notepad uctest.txt
upload_2017-12-6_14-41-19.png

Code:
v:\> type uctest.txt
My dog has fleas.
v:\> echo foo >> uctest.txt

upload_2017-12-6_14-42-43.png


Now (this is the best part).
upload_2017-12-6_14-43-48.png
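
For readers without the screenshots, here is a rough sketch in Python of what such an append does to the file. It assumes the original file is UTF-16LE with a BOM and that the redirected ECHO added plain 8-bit bytes; the exact appended bytes are illustrative, not taken from the thread.

```python
# Assumption: the file is UTF-16LE with a BOM (what Notepad calls "Unicode").
unicode_part = "\ufeffMy dog has fleas.\r\n".encode("utf-16-le")

# Assumption: the redirected ECHO appended plain 8-bit bytes
# (illustrative content, chosen with an even length).
ansi_part = b"foo \r\n"

mixed = unicode_part + ansi_part

# A UTF-16 reader pairs up the six ANSI bytes into three bogus 16-bit characters.
as_utf16 = mixed.decode("utf-16-le")
tail = as_utf16[-3:]
print([f"U+{ord(ch):04X}" for ch in tail])
```

The three code points printed at the end have nothing to do with "foo"; they are simply pairs of ANSI bytes read as single 16-bit units.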
 
There are also many "Param/Comment" entries referring to MetaPad sessions. What are they?

Do you have another program also logging to that file?
It's part of my real JPSOFT history log. In the dark ages, at the beginning of 4DOS/4OS/4NT, I worked a long time
with its log feature, so the weird "Param/Comment" entries date from that time :blackalien:

TCC is the only logger, but sometimes I must switch to ANSI output, and I do that with OPTION //UNICODEOUTPUT=NO.

One reason, for example, is DISKPART. I've written a BTM to drive it via
"DISKPART /s Commandscript.txt"
but DISKPART cannot handle Unicode in "Commandscript.txt", so it must be ANSI.
And looking at the last month in the log, I see the disastrous mix of encodings.
It's not a big problem, but I'm looking for a way to clean up the file, so I had hoped I could do it with TCC.
 

rconn

Administrator
Staff member
May 14, 2008
12,404
152
I don't know of any way to do it automatically, as there's no obvious difference between a 2-byte Unicode character and a 2-byte pair of ANSI characters.

Provided your file didn't have any extended Unicode characters (i.e., > 255) you could pick it apart with @FILEREADB looking for characters that didn't have a 0 for the high byte, and converting those to Unicode.
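
As a rough illustration of that high-byte check, here is a sketch in Python rather than TCC. It is a byte-wise variant of the idea, not a tested TCC recipe: a non-zero byte followed by a zero byte is taken as one UTF-16LE character, and any other byte is taken as a single ANSI character. The function name `repair_mixed_utf16` and the choice of Windows-1252 as the ANSI code page are assumptions, and it only works under the condition stated above (no characters above 255, no NUL characters in the text).

```python
ANSI_CODEPAGE = "cp1252"  # assumption: the 8-bit parts are Windows-1252


def repair_mixed_utf16(data: bytes) -> str:
    """Decode UTF-16LE text that has stray 8-bit ANSI runs embedded in it.

    Hypothetical helper. Only valid if every genuine UTF-16 character is
    <= U+00FF (high byte 0) and the text contains no NUL characters.
    """
    # Skip a UTF-16LE byte order mark if the file starts with one.
    i = 2 if data[:2] == b"\xff\xfe" else 0
    n = len(data)
    out = []
    while i < n:
        if i + 1 < n and data[i] != 0 and data[i + 1] == 0:
            # Low byte followed by a zero high byte: one UTF-16LE unit,
            # whose code point equals the low byte.
            out.append(chr(data[i]))
            i += 2
        else:
            # Anything else is treated as a single ANSI byte.
            out.append(bytes([data[i]]).decode(ANSI_CODEPAGE, errors="replace"))
            i += 1
    return "".join(out)


if __name__ == "__main__":
    sample = ("\ufeffline one\r\n".encode("utf-16-le")
              + b"ansi line\r\n"
              + "line three\r\n".encode("utf-16-le"))
    print(repr(repair_mixed_utf16(sample)))
```

Walking byte by byte rather than word by word means an ANSI run of odd length does not throw off the alignment of the UTF-16 text that follows it. The result can then be written back out as clean UTF-16.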
 
I was afraid you'd say that, Rex. So I have to read the whole thing as a byte or word stream
and check whether LSB and MSB are both not equal to NUL, and so on.
Maybe the functions BALLOC, BREAD, BPEEK and so on would help or run faster?
I will try it out :joyful:
 