Fileread fails on Unicode file

I'm trying to read in log files from Exact Audio Copy, but both @line and @fileread fail.

The files are in Unicode (2 bytes per char), but I think the problem is that every file starts with a single FF byte. I can see how this would desynchronise a Unicode read.

When I use @line, only the first line can be read (correctly) -- requests for higher line numbers return nothing. When I use @fileread, only the first two bytes of the file are read. Subsequent reads return EOF.

I don't know if FOR or DO would work (likely not), but either would require a major restructure of my batch file, so I'd prefer to avoid them.

Is my only other option to read byte by byte with @filereadb? Or could I perhaps discard the initial byte somehow then continue from there with @fileread? Or is the problem elsewhere?

Thanks for any help. (TCC v13 x64)
 
Okay, that's interesting. I actually misinterpreted the display slightly, then.

When I list one of the log files, the first six bytes are always FF FE 45 00 78 00 (which is 255 254 69 00 120 00 in decimal). But %@ascii[%@fileread[]] returns 160 9632 69 and nothing else (that's the whole line, and further filereads return EOF). Interpreted as three numbers, that would be A0 25A0 45 in hex. So that doesn't make a lot of sense to me. Note that the fileopen uses ,r,t as options, but ,r alone has the same result.
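One possible reading of those three numbers, offered as an assumption rather than a confirmed diagnosis of @fileread: if the bytes were treated as single-byte OEM text and the read stopped at the first zero byte, the result would be exactly 160 9632 69. The Python sketch below (not TCC code) checks that arithmetic against the six bytes listed above. Code page 437 is assumed here; the thread does not say which code page was active.

```python
# First six bytes of the log file, as listed in the post.
head = bytes([0xFF, 0xFE, 0x45, 0x00, 0x78, 0x00])

# Read as UTF-16 with a byte-order mark (FF FE = little-endian),
# the BOM is consumed and the remaining bytes decode to text.
as_utf16 = head.decode("utf-16")

# Assumption: the read treated the data as single-byte OEM text
# (code page 437 here) and stopped at the first NUL byte.
up_to_nul = head.split(b"\x00")[0]
as_oem = up_to_nul.decode("cp437")
oem_codes = [ord(c) for c in as_oem]

print(as_utf16)
print(oem_codes)
```

Under that assumption, FF and FE are not being dropped; they are being converted as if they were ordinary OEM characters, and the 00 high byte of the first real character ends the string.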

I just tried using FOR at the command line, and that does seem to work correctly (though echoing strings with pipes in them makes for a lot of errors).

I'll attach a sample file so you can try it yourself, if you want; zipped so it doesn't get modified in transit.

Failing any further options, looks like restructuring my code to use FOR is the best way forward. But if you can suggest something else, that would be awesome.

Thanks, mate.
 

Attachment: EAC-sample-log.zip (2.3 KB)
May 20, 2008
11,536
103
Syracuse, NY, USA
I don't know what the ultimate goal is, but it might be easier to achieve if you read the file into an array. Examples (I renamed your file):
Code:
v:\> setarray /f /r Unicode.log a

v:\> echo %a[0]
Exact Audio Copy V1.1 from 23. June 2015

v:\> echo %a[2]
EAC extraction logfile from 17. June 2016, 0:05

v:\> echo %a[%@dec[%@arrayinfo[a,1]]]
==== Log checksum 513B43020E430C81E1B7DC9C2E9C83F3A69D4DE7E7BA2D93EA3F23D4AECE56CC ====
 
Interesting idea. Thanks for the suggestion. Unfortunately, my version of TCC doesn't have the SETARRAY command.

As for what I'm doing, I'm trying to build a CSV table of the results of each file extraction. That's why a linear once-over the file is sufficient (though I suppose memory concerns are a bit passé). However, in my code I tried to do the track processing in a subroutine. Restructuring the code for FOR will involve setting and resetting status flags -- a bit messy.

I'm actually really curious why @FILEREAD is failing. I've had other programs also struggle, on occasion, with these files.
 
Do you have @FILEREADB? Here are some things to consider.
Code:
v:\> set h=%@fileopen[Unicode.log,r,b]

v:\> set r=%@filereadb[%h,10]

v:\> echo %r
255 254 69 0 120 0 97 0 99 0

v:\> echo %@fileseek[%h,2,0] (skip the BOM)
2 (skip the BOM)

v:\> set r=%@filereadb[%h,10]

v:\> echo %r
69 0 120 0 97 0 99 0 116 0

v:\> do i=0 to 9 ( echos %@if[%@word[%i,%r] NE 0,%@char[%@word[%i,%r]],] )
Exact
 
May 29, 2008
571
4
Groton, CT
Or maybe even
Code:
do i=0 to 8 by 2 ( echos %@char[%@EVAL[256*%@word[%@inc[%i],%r]+%@word[%i,%r]]] )
Exact
 
Nice! If he doesn't have SETARRAY (introduced in v10), then he doesn't have a command line DO ... probably no problem.

Dave, Looking at your post and the quoted version in the message composer, I realize that I never realized that the CODE tags work even in lowercase! I always type them myself ... it's faster.

I wish Rex would chime in. I would have expected @FILEREAD to handle Unicode.
 
Actually, I do have command line DO. I paid for v3, v4, v5, but since then I've found TCC LE to be sufficient for (most of) my needs. Hence, I'm using TCC LE v13 x64. I do plan to buy a more recent version of TCC at some stage in the future, but I've never found much use for all the graphical features in TCE.

OK, so what's being suggested is to read byte-by-byte, skip the 2-byte byte-order mark (BOM) using @fileseek, then convert the bytes to characters by either skipping zero bytes or doing the maths to combine each byte pair into a 16-bit value. Both great suggestions. The latter obviously has the advantage of picking up European accented characters, which do appear in some CD track and artist names (Lady Gaga has some tracks with umlauts, for example). The basic problem with both of these approaches is that they don't easily pick up line-endings, and the data I have is heavily line-oriented.
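To make the byte-pair arithmetic concrete, here is a minimal Python sketch (not TCC code) that takes a string of decimal byte values in the style @filereadb returns, skips the BOM, and combines each low/high pair into one character. The function name and the sample string are hypothetical: the first ten values match the output shown earlier in the thread, and the rest are made up to show a line ending and an accented character. Surrogate pairs are not handled.

```python
def decode_utf16le_words(byte_text):
    """Turn space-separated decimal byte values into text (UTF-16LE, BMP only)."""
    values = [int(word) for word in byte_text.split()]
    # Skip the byte-order mark FF FE (255 254) if present.
    if values[:2] == [255, 254]:
        values = values[2:]
    # Combine each pair: low byte first, then 256 * high byte.
    chars = [
        chr(values[i] + 256 * values[i + 1])
        for i in range(0, len(values) - 1, 2)
    ]
    return "".join(chars)

# "Exact", then CR LF (13 0 10 0), then u-umlaut (252 0).
sample = "255 254 69 0 120 0 97 0 99 0 116 0 13 0 10 0 252 0"
text = decode_utf16le_words(sample)
lines = text.splitlines()
print(lines)
```

In this form a line ending is just the pair sequence 13 0 10 0, so lines can be recovered after combining, though doing the same in a batch file would mean buffering bytes until that sequence turns up.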

Using FOR %line in (@file.log) ... does work, for some reason, where @fileread doesn't. So that has to be the simplest fall-back.

The trouble with using FOR is that I've structured the code to read in each line just in time for when I need it. For example, I have a subroutine that reads in the data for each track and keeps processing lines until it reaches the end of the track, at which point it returns to another line-reading section that looks for the start of a new track. If I use FOR, all the lines are being read in at the same point. So I have to track where I am using status flags (such as a variable that's set to 1 when I'm inside a track and 0 when I'm not). From a programming point of view, that's pretty clunky.

Still, unless someone can see something inherently dodgy about my Unicode files and a work-around for it, looks like I'd better move things around and use FOR.

Thanks for the ideas.
 
Well that's a bit embarrassing!

Yes, it seems @LINE[] does work. The results I was getting were because I was reading lines 1, 3, 5, which are all blank in the file, hence ECHO %@LINE["file.log",1] was returning ECHO is OFF.

So I can use @LINE with a global line counter -- that certainly makes life easier!
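The shape of that approach, sketched in Python rather than TCC: a shared counter plus a helper that hands back the next non-blank line, so any subroutine can ask for a line just when it needs one. The helper name is hypothetical, and the list stands in for the file; lines 0 and 2 are taken from the SETARRAY output earlier in the thread, with blanks at the odd positions as described above.

```python
# Stand-in for the log file: text on even lines, blanks on odd lines.
lines = [
    "Exact Audio Copy V1.1 from 23. June 2015",
    "",
    "EAC extraction logfile from 17. June 2016, 0:05",
    "",
]

line_no = 0  # shared ("global") line counter


def next_nonblank():
    """Return the next non-blank line and advance the counter, or None at EOF."""
    global line_no
    while line_no < len(lines):
        text = lines[line_no]
        line_no += 1
        if text:
            return text
    return None


first = next_nonblank()
second = next_nonblank()
third = next_nonblank()
print(first, second, third, sep="\n")
```

Because the counter lives outside the helper, a track-processing routine and the outer loop can both call it without status flags.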

It's still odd that @FILEREAD doesn't work, but I'll leave that in your capable hands. I still suspect there's something odd with the way these log files are written (other programs have occasional glitches reading them -- though it could also be code page issues). If there is a bug in @FILEREAD (or was, in the version of TCC I'm using), it's clearly not very impactful to have slipped by for so long (assuming it hasn't already been fixed in a later version). And if it is still around, I hope this has been useful. :)

I see there's a new version of TCC LE. Thank you. I'll have to upgrade.

And thanks to everyone who's responded.
 