TYPE goes crazy with no-BOM Unicode file

May 20, 2008
11,400
99
Syracuse, NY, USA
I asked W32Time to produce a log file. It produced a Unicode file with no BOM. Here are the first few lines (copied/pasted from Notepad).
Code:
152593 04:23:58.7934268s - ---------- Log File Opened -----------------
152593 04:23:58.7936246s - RPC Call - Query Configuration
152593 04:23:58.7936990s - RPC Call - Query Provider Configuration
152593 04:23:58.8100194s - TimeProvCommand([NtpClient], TPC_Query) called.
152593 04:23:58.8101640s - RPC Call - Query Provider Configuration
152593 04:24:07.5284957s - RPC Caller is BB\vefatica (S-1-5-21-3764633515-3696517045-806287659-1001)

Here they are again according to CMD's TYPE.
Code:
v:\> cmd /c type w32tm.log
1 5 2 5 9 3   0 4 : 2 3 : 5 8 . 7 9 3 4 2 6 8 s   -   - - - - - - - - - -   L o g   F i l e   O p e n e d   - - -
  - - - - - - - - - - - - -
 1 5 2 5 9 3   0 4 : 2 3 : 5 8 . 7 9 3 6 2 4 6 s   -   R P C   C a l l   -   Q u e r y   C o n f i g u r a t i o
 1 5 2 5 9 3   0 4 : 2 3 : 5 8 . 7 9 3 6 9 9 0 s   -   R P C   C a l l   -   Q u e r y   P r o v i d e r   C o n
  g u r a t i o n
 1 5 2 5 9 3   0 4 : 2 3 : 5 8 . 8 1 0 0 1 9 4 s   -   T i m e P r o v C o m m a n d ( [ N t p C l i e n t ] ,
  C _ Q u e r y )   c a l l e d .
 1 5 2 5 9 3   0 4 : 2 3 : 5 8 . 8 1 0 1 6 4 0 s   -   R P C   C a l l   -   Q u e r y   P r o v i d e r   C o n
  g u r a t i o n
 1 5 2 5 9 3   0 4 : 2 4 : 0 7 . 5 2 8 4 9 5 7 s   -   R P C   C a l l e r   i s   B B \ v e f a t i c a   ( S -
  5 - 2 1 - 3 7 6 4 6 3 3 5 1 5 - 3 6 9 6 5 1 7 0 4 5 - 8 0 6 2 8 7 6 5 9 - 1 0 0 1 )
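Suppose the log is UTF-16LE, with each ASCII character stored as its byte followed by a zero byte. In that case, CMD's output above is roughly what you get by treating every byte as its own character and drawing each zero byte as a blank. Here is a small Python simulation. It assumes UTF-16LE and assumes the console shows NUL as a space. It is not CMD's actual code, and `bytes_as_single_byte_text` is a made-up helper.

```python
# Simulate reading a BOM-less UTF-16LE file one byte per character.
# Assumptions: the log is UTF-16LE, and the console draws NUL (0x00) as a blank.

def bytes_as_single_byte_text(data: bytes) -> str:
    # latin-1 maps each byte 0x00-0xFF to the code point with the same value,
    # so every byte (including the zero high bytes) becomes one character.
    text = data.decode("latin-1")
    # Render NUL characters the way the console appears to: as spaces.
    return text.replace("\x00", " ")


line = "152593 04:23:58.7934268s - RPC Call"
raw = line.encode("utf-16-le")  # the "utf-16-le" codec writes no BOM
print(raw[:8])                   # b'1\x005\x002\x005\x00'
print(bytes_as_single_byte_text(raw))
```

The same assumption would also explain the leading blank on each later line of CMD's output. A UTF-16LE line break is 0D 00 0A 00, so the zero byte after the LF lands at the start of the next line.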

Here's some of what TCC's TYPE gives ... garbage, and apparently not related to what's in the file.
Code:
v:\> type w32tm.log
1 :  ]  ၪᰀ耀ᖨ翵翼     㚉䐡V:\  $p ၯᴀ耀ᖨ翵翼     ep\ \  S  ၬḀ退耀귅翼 ✐ȱ            ၡἀ退耀귅翼 Ⳡȱ     
ၦ 退㪭堧ƻ䅤욕孵떠ⴖ            ၻ℀退耀귅翼 ⵀȱ            ၸ∀退捉酥䮩䅃꺡뱿똠백            ၽ⌀退Őȱ  ȱ  ⦐ȱ  ȱ
ၲ␀鐀ᖨ翵翼 ⶠȱ    耀        ၷ─退耀귅翼 ⪠ȱ            ၴ☀耀ᖨ翵翼     liV:\  p  ၉✀耀\??\v:\w32tm.log  s ၎⠀退耀
귅翼 ⫐ȱ                    뷟肇  㤀ȱ  룰ȱ      [4324]  v:\
ᨀ脄 ȱ  ᤀ脅 ꫐ȱ  ȱ
WS _LINES_MAXLEN=94 _LINES_MAXLOC=3         병톶脚ࠀ꭛翼  ꭙ翼    䁨ȱ  ꭛翼      ꭛翼  ꭛翼  ꭛翼  ꭛翼  ꭛翼  ꭛翼  ꭛翼
 
LIST handles the file better, but there's an interesting difference between the x64 and x86 versions of TCC. The x64 version shows
[screenshot: 1539581916033.png]

while the x86 version shows
[screenshot: 1539582127712.png]


And if I go back to v16, TYPE works better: it shows the file with spaces between the characters, like CMD.
 
Quote:
It was a really weird file that the Windows IsTextUnicode API couldn't decipher. I added a hack.

What was so weird about it? To me, it looked like what the docs refer to as:

Code:
IS_TEXT_UNICODE_ASCII16 The text is Unicode, and contains only zero-extended ASCII values/characters.
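As an illustration, here is a check in the spirit of that IS_TEXT_UNICODE_ASCII16 description, applied to little-endian data like this log. `looks_like_ascii16` is a hypothetical helper, not a Windows API, and this is not how IsTextUnicode is implemented. Treating an embedded NUL as disqualifying is an assumption of this sketch.

```python
def looks_like_ascii16(data: bytes) -> bool:
    """Hypothetical test: is `data` UTF-16LE made only of zero-extended ASCII?

    Every 2-byte code unit must have a zero high byte (the odd offsets in
    little-endian order) and a low byte in 0x01-0x7F. Assumption: a 0x00 low
    byte (an embedded NUL) disqualifies the buffer.
    """
    if len(data) < 2 or len(data) % 2:
        return False  # UTF-16 needs a whole number of 2-byte units
    for i in range(0, len(data), 2):
        low, high = data[i], data[i + 1]
        if high != 0 or not (0x01 <= low <= 0x7F):
            return False
    return True


sample = "152593 04:23:58.7934268s - RPC Call\r\n".encode("utf-16-le")
print(looks_like_ascii16(sample))               # True
print(looks_like_ascii16(b"plain ASCII text"))  # False
```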
 
Aug 23, 2010
637
9
There's a fairly easy way to "hack" encoding detection for text files. It relies on characteristics of the Unicode encodings and needs little read-ahead to detect the correct encoding with high certainty.

1. Treat input as UTF-8. (ASCII compatible.)
2. If you see a byte sequence that does not decode as UTF-8, check whether the same byte value recurs at every 4th position. If so, see whether you can decode it as UTF-32 (LE/BE).
3. If that fails, try to decode it as UTF-16.
4. If all else fails, assume ASCII/extended ASCII.

You can buffer lines as long as the encoding is uncertain.
Once the encoding is certain, you can stop buffering and send input straight to the decoder.
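The four steps above can be sketched roughly in Python. The names `guess_encoding` and `_decodes` are hypothetical, and the sketch adds two assumptions that are not in the post:

- NUL bytes count as "does not decode as UTF-8". NUL is technically valid UTF-8, so without this rule the zero-extended W32Time log would pass step 1.
- Steps 2 and 3 only run when the sample contains zero bytes. This suits mostly-ASCII text but would misclassify, for example, BOM-less CJK UTF-16.

latin-1 stands in for "ASCII/extended" because it accepts every byte value.

```python
import codecs


def _decodes(data: bytes, encoding: str) -> bool:
    """True if `data` is a valid (possibly truncated) prefix of `encoding` text.

    With final=False the incremental decoder buffers a multi-byte sequence
    cut off at the end of the sample instead of raising, so a partial
    read-ahead buffer can be tested.
    """
    decoder = codecs.getincrementaldecoder(encoding)(errors="strict")
    try:
        decoder.decode(data, final=False)
        return True
    except UnicodeDecodeError:
        return False


def guess_encoding(sample: bytes) -> str:
    """Guess the encoding of a BOM-less read-ahead sample (hypothetical helper)."""
    has_nul = b"\x00" in sample

    # Step 1: UTF-8, which also covers plain ASCII.
    # Assumption: any NUL byte counts as a failure here.
    if not has_nul and _decodes(sample, "utf-8"):
        return "utf-8"

    # Assumption: only consider 16/32-bit encodings if zero bytes appear.
    if has_nul:
        # Step 2: UTF-32. Code points stop at 0x10FFFF, so the top byte of
        # every 4-byte unit is zero: offset 3 for LE, offset 0 for BE.
        units = range(0, len(sample) - 3, 4)
        if units and all(sample[i + 3] == 0 for i in units) and _decodes(sample, "utf-32-le"):
            return "utf-32-le"
        if units and all(sample[i] == 0 for i in units) and _decodes(sample, "utf-32-be"):
            return "utf-32-be"

        # Step 3: UTF-16. Try first the byte order whose high bytes
        # (odd offsets for LE, even offsets for BE) contain more zeros.
        zeros_odd = sample[1::2].count(0)
        zeros_even = sample[0::2].count(0)
        if zeros_odd >= zeros_even:
            order = ("utf-16-le", "utf-16-be")
        else:
            order = ("utf-16-be", "utf-16-le")
        for encoding in order:
            if _decodes(sample, encoding):
                return encoding

    # Step 4: fall back to an 8-bit code page; latin-1 never fails to decode.
    return "latin-1"


log_sample = "152593 04:23:58.7934268s - RPC Call\r\n".encode("utf-16-le")
print(guess_encoding(log_sample))          # utf-16-le
print(guess_encoding(b"plain ASCII\r\n"))  # utf-8
print(guess_encoding(b"caf\xe9 au lait"))  # latin-1
```

To follow the buffering advice, a reader could collect the first chunk of the file in `sample` and call `guess_encoding` once it judges the answer settled. It would then pass the buffered bytes and the rest of the stream to a decoder for that encoding.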
 