WAD Tee printing Chinese characters

#1
When I pass output through tee, all sorts of Chinese characters appear.

Code:
adb logcat | tee remote.logcat
ⴭⴭⴭⴭ‭敢楧湮湩⁧景⼠敤⽶潬⽧祳瑳浥഍䐊圯晩卩慴整慍档湩⡥†㐲⤵›牂慯捤獡剴卓䙉牯䵉ⱓ渠睥獲楳㴠㠭‵‬汯剤獳㵩ⴠ㌸഍ⴊⴭⴭⴭⴭ戠来湩楮杮漠⁦搯癥氯杯洯楡൮਍⽉敒畱獥䍴湯牴汯敬⡲㐠㈲⤴›㘰㈰ㄱ㌴攠瑮牥䤠䕏捸灥楴湯഍䤊刯煥敵瑳潃瑮潲汬牥
㈴㐲㨩〠〶ㄲ㐱″慣捴⁨佉硅散瑰潩൮਍⽉敒畱獥䍴湯牴汯敬⡲㐠㈲⤴›㘰㈰ㄱ㌴琠祲愠慧湩഍䤊刯煥敵瑳潃瑮潲汬牥
㈴㐲㨩〠〶ㄲ㐱″湅整⁲慷瑩㔮〰ര਍⽄楗楦瑓瑡䵥捡楨敮
㈠㔴㨩映瑥档獒楳湁
䱤湩卫数摥慎楴敶猠湥⁤卒䥓桃湡敧椠瑮湥ⱴ匠浡卥杩慮䱬癥汥潃湵⁴ㄽ഍
⽄楗楦瑓瑡䵥捡楨敮
㈠㔴㨩映瑥档獒楳湁䱤湩卫数摥慎楴敶猠湥⁤卒䥓桃湡敧椠瑮湥ⱴ匠浡卥杩慮䱬癥汥潃湵⁴㈽഍
⽉敒畱獥䍴湯牴汯敬⡲㐠㈲⤴›湥整⁲敳摮汁䵬楡൬਍⽄畃獲牯楗摮睯瑓瑡⡳㐠㈲⤴›汁潬慣整䌠牵潳⁲楗摮睯氠牡敧⁲桴湡搠晥畡瑬瘠污敵‬楳敺›ㄴ㐹〳ⰴ搠晥畡瑬㈺㤰ㄷ㈵഍
^C
The same without tee would look something like this:

Code:
>adb logcat
--------- beginning of /dev/log/system
D/WifiStateMachine(  245): BroadcastRSSIForIMS, newrssi =-85 , oldRssi= -84
D/WifiStateMachine(  245): fetchRssiAndLinkSpeedNative send RSSIChange intent, SameSignalLevelCount =1
--------- beginning of /dev/log/main
D/Dhcpcd  ( 3318): wlan0: renewing lease of 172.20.50.2
D/Dhcpcd  ( 3318): wlan0: sending REQUEST (xid 0x4a76c995), next in 3.91 seconds
D/Dhcpcd  ( 3318): acknowledged
D/WifiStateMachine(  245): fetchRssiAndLinkSpeedNative send RSSIChange intent, SameSignalLevelCount =2
If I use type to look at the saved file, it looks mostly correct. But it also contains some strange characters, and pieces of the path:

Code:
>type remote.logcat
ÿþ--------- beginning of /dev/log/system
D/WifiStateMachine(  245): BroadcastRSSIForIMS, newrssi =-85 , oldRssi= -83
--------- beginning of /dev/log/main
I/RequestController( 4224): 06021143 enter IOException
I/RequestController( 4224): 06021143 catch IOException
I/RequestController( 4224): 06021143 try again
I/RequestController( 4224): 06021143 Enter wait.5000
T;C:�鮙暿♦ⶀϽ  ᝰЃ  ion\1.8\bin;C:\Program Files (x86)\QuickTime\QTSystem\;C:\Program Files (x86)\Skype\Phone\;"C:\Work\Projects\XXX-AG";C:\Users\martin.krischik\Applications;C:\Users\martin.krischik\Applicat
ions\Utilities;C:\opt\Java\jdk\1.8.0\bin;C:\opt\android-sdk-window
And if I use view to look at the file, the Chinese characters are back:

Code:
view remote.logcat
ⴭⴭⴭⴭ‭敢楧湮湩⁧景⼠敤⽶潬⽧祳瑳浥഍䐊圯晩卩慴整慍档湩⡥†㐲⤵›牂慯捤獡剴卓䙉牯䵉ⱓ渠睥獲楳㴠㠭‵‬汯剤獳㵩ⴠ㌸഍ⴊⴭⴭⴭⴭ戠来湩楮杮漠⁦搯癥氯杯洯楡൮਍⽉敒畱獥䍴湯牴汯敬⡲㐠㈲⤴›㘰㈰ㄱ㌴攠瑮牥䤠䕏捸灥楴湯഍䤊刯煥敵瑳潃瑮潲汬牥
㈴㐲㨩〠〶ㄲ㐱″慣捴⁨佉硅散瑰潩൮਍⽉敒畱獥䍴湯牴汯敬⡲㐠㈲⤴›㘰㈰ㄱ㌴琠祲愠慧湩഍䤊刯煥敵瑳潃瑮潲汬牥
㈴㐲㨩〠〶ㄲ㐱″湅整⁲慷瑩㔮〰ര਍⽄楗楦瑓瑡䵥捡楨敮
㈠㔴㨩映瑥档獒楳湁䱤湩卫数摥慎楴敶猠湥⁤卒䥓桃湡敧椠瑮湥ⱴ匠浡卥杩慮䱬癥汥潃湵⁴ㄽ഍
⽄楗楦瑓瑡䵥捡楨敮
㈠㔴㨩映瑥档獒楳湁䱤湩卫数摥慎楴敶猠湥⁤卒䥓桃湡敧椠瑮湥ⱴ匠浡卥杩慮䱬癥汥潃湵⁴㈽഍
⽉敒畱獥䍴湯牴汯敬⡲㐠㈲⤴›湥整⁲敳摮汁䵬楡൬਍⽄畃獲牯楗摮睯瑓瑡⡳㐠㈲⤴›汁潬慣整䌠牵潳⁲楗摮睯氠牡敧⁲桴湡搠晥畡瑬瘠污敵‬楳敺›ㄴ㐹〳ⰴ搠晥畡瑬㈺㤰ㄷ㈵഍
Now the PC path should not be part of an Android log.
 


#2
What is the attached REMOTE.TXT intended to be? It seems corrupt. It has a Unicode byte order mark but does not contain Unicode. It also contains exactly six NUL characters ... three newlines of the form 0x000D000A (whereas all the other newlines in it are 0x0D0A). If that file is intended as input, then I'm not too surprised that the output is corrupt.
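
The two observations above (a BOM is present, but there are almost no NUL bytes) can be checked mechanically. A minimal sketch in Python, assuming the file fits in memory; the function name `describe_encoding`, the threshold, and the sample data are made up for illustration and are not part of TCC or adb:

```python
UTF16_LE_BOM = b"\xff\xfe"  # byte order mark for little-endian UTF-16

def describe_encoding(data: bytes) -> dict:
    """Report whether data starts with a UTF-16 LE BOM and how many NUL bytes follow."""
    has_bom = data.startswith(UTF16_LE_BOM)
    body = data[len(UTF16_LE_BOM):] if has_bom else data
    nul_bytes = body.count(b"\x00")
    # Genuine UTF-16 text in the ASCII range has one NUL byte per character,
    # i.e. about half of all bytes. The 25% cut-off is only a rough heuristic.
    looks_utf16 = len(body) > 0 and nul_bytes * 4 >= len(body)
    return {
        "has_bom": has_bom,
        "nul_bytes": nul_bytes,
        "consistent": has_bom == looks_utf16,
    }

# Hypothetical sample data, modelled on a log line from this thread.
text = "D/WifiStateMachine(  245): BroadcastRSSIForIMS\r\n"
mislabelled = UTF16_LE_BOM + text.encode("ascii")   # BOM followed by plain ASCII
genuine = UTF16_LE_BOM + text.encode("utf-16-le")   # BOM followed by real UTF-16

print(describe_encoding(mislabelled))
print(describe_encoding(genuine))
```

A file like the first sample, with a BOM but next to no NUL bytes, is the inconsistent case described here.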
 
#3
As far as I can see from the downloaded file and from TYPE, remote.logcat starts with a BOM (byte order mark), char(255) and char(254), which declares the file to be little-endian UTF-16, although the remainder actually seems to be plain ASCII. I would say the file is inconsistent.
The pipe (actually it's not TEE) and VIEW process the BOM and then assume the file is in UTF-16, so each pair of ASCII bytes is read as a single 16-bit character, and most of those happen to be Chinese ideograms. TYPE does not.
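
This effect is easy to reproduce. The following Python sketch takes the first log line from this thread as plain ASCII bytes, prepends the BOM, and reads the result both ways. It only illustrates the decoding mismatch; it makes no claim about how TCC implements the pipe or VIEW, and using Latin-1 to stand in for the console code page is an assumption:

```python
BOM = b"\xff\xfe"
ascii_line = b"--------- beginning of /dev/log/system\r\n"
data = BOM + ascii_line

# Roughly what TYPE shows: one character per byte, so the BOM turns into
# two odd-looking characters in front of otherwise readable text.
byte_wise = data.decode("latin-1")

# What a reader that trusts the BOM does: every two bytes become one
# UTF-16 LE code unit, so the ASCII text collapses into unrelated characters.
bom_trusting = data[len(BOM):].decode("utf-16-le")

# ascii() keeps the output printable on any console.
print(ascii(byte_wise))
print(ascii(bom_trusting))
print(len(ascii_line), "bytes ->", len(bom_trusting), "characters")
```

The first two dashes (0x2D 0x2D) combine into U+2D2D, and the bytes of "be" (0x62 0x65) combine into U+6562, which matches the start of the garbled output in post #1.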
 
#6
What is the attached REMOTE.TXT intended to be?
I have already given a correct example: the second code block, the one not using tee, is correct. I guess I should have made it clearer that this is the expected output.

You appear to have a Unicode / ASCII mismatch here. What is your "Unicode Output" setting in OPTION?
I am surprised that this should make a difference. TEE normally does not need to know the encoding as TEE should just take the standard input stream and push the data unchanged to the standard out and the file.

With strong emphasis on unchanged. I (and with me anybody who is used to the UNIX version of TEE) would expect the data not to be changed. In fact, TEE should be able to handle binary data.

As a UNIX user I expect that when I do this

Code:
COPY TCMD.EXE CON: | TEE TCMD1.EXE > TCMD2.EXE
both TCMD1.EXE and TCMD2.EXE are executable copies of TCMD.EXE. It would have the same effect as:

Code:
COPY TCMD.EXE TCMD1.EXE 
COPY TCMD.EXE TCMD2.EXE
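
To make the expectation concrete, here is a minimal Python sketch of a byte-transparent tee. It is not how TCC's TEE or the UNIX tee is implemented; the function name `tee_bytes` and the in-memory streams are illustrative assumptions:

```python
import io
from typing import BinaryIO

def tee_bytes(src: BinaryIO, out: BinaryIO, copy: BinaryIO, chunk_size: int = 65536) -> int:
    """Copy src to both out and copy without interpreting the bytes. Returns bytes copied."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # empty read means end of input
            break
        out.write(chunk)
        copy.write(chunk)
        total += len(chunk)
    return total

# Demonstration with in-memory streams: a payload containing every byte value,
# including NUL and the 0xFF 0xFE pair that looks like a UTF-16 BOM.
payload = b"\xff\xfe" + bytes(range(256)) * 4
stdout_side = io.BytesIO()
file_side = io.BytesIO()
copied = tee_bytes(io.BytesIO(payload), stdout_side, file_side)

print(copied, stdout_side.getvalue() == payload, file_side.getvalue() == payload)
```

In a real script the same function could be called with `sys.stdin.buffer`, `sys.stdout.buffer`, and a file opened in `"wb"` mode, so that no text decoding is involved at any point.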
 

rconn

#7
I am surprised that this should make a difference. TEE normally does not need to know the encoding as TEE should just take the standard input stream and push the data unchanged to the standard out and the file.
TEE doesn't know (or care). The issue is with STDIN & STDOUT, which in Windows can either be in ASCII (the default for CMD and thus TCC) or Unicode (if you set the startup option).

UNIX / Linux has it easy, as everything is always UTF8.