Documentation File reading questions

#1
1) Does @FILESEEKL keep track of the current line number? I.e., if the current line is 1000, would %@FILESEEKL[n,1010] be as fast as %@FILESEEKL[n,10,1] ?

2) When DO or FOR is processing a file a line at a time using "DO line in @file" or "FOR %line in (@file)" is there an internal line counter that could be accessed, e.g., via a future internal variable?
 
#3
Steve,

While this almost certainly isn't as convenient as it would be if @FileSeekL did what you want it to do (which it clearly does not), it isn't really hard. I've been doing this for a number of years, and I really don't see any significant reason why you couldn't do it this way. A complete, fully tested, and working example:
Code:
@Echo Off
SetLocal
On ErrorMsg Goto Fini
On Break Goto Fini
Do X In /P Timer On (Set StartTime=%X)
Set FileName="D:\A Reasonably Large File.data"
Do X In /P Timer On (Set CountTime=%X)
Set Size=%@Inc[%@Lines[%FileName]]
UnSetArray /Q Data
SetArray Data[%Size]
Set Handle=%@FileOpen[%FileName, r]
Set IDX=0
Set Line=%@FileRead[%Handle]
Do While "%Line" != "**EOF**"
  Set Data[%IDX]=%Line
  Set /A IDX+=1
  Set Line=%@FileRead[%Handle]
EndDo
Echo >NUL: %@FileClose[%Handle]
Do X In /P Timer /S (Set ReadTime=%X)
Do IDX = 0 To %@Dec[%Size] By 1
  @Echo %IDX: %Data[%IDX]
  Set /A IDX+=1
EndDo
:Fini
On ErrorMsg
@Echo %Size
Do X In /P Timer Off (Set EndTime=%X)
@Echo      Size: %Size
@Echo Start Time: %StartTime
@Echo Count Time: %CountTime
@Echo  Read Time: %ReadTime
@Echo   End Time: %EndTime
UnSetArray /Q Data
EndLocal
Quit 8
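(As an aside, for anyone more at home outside TCC, the core read-until-EOF-into-an-array pattern in the script can be sketched in Python; the file name and contents below are made-up stand-ins, and an empty string from readline() plays the role of @FileRead's **EOF** sentinel:)

```python
# Create a small stand-in data file (illustrative only).
with open("sample.data", "w") as f:
    f.write("alpha\nbeta\ngamma\n")

# Read lines until EOF into a list, mirroring the Do While loop above.
data = []
with open("sample.data") as handle:
    while True:
        line = handle.readline()
        if line == "":              # "" is Python's EOF signal, like **EOF**
            break
        data.append(line.rstrip("\n"))

# Display each element with its index, like the display loop above.
for idx, text in enumerate(data):
    print(f"{idx}: {text}")
```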
I decided to leave all of the code in, including the "Timer" and "status" messages, etc. (although the "important" code is that between the "Set Size=" and "@FileClose" lines); I don't think they make the code all that much harder to understand (at least for somebody like you! ;)), and they allowed me to measure performance. And in terms of "real world" performance:

"Real world" times when writing the data out to a file with "Echo":
Code:
      Size: 3506
Start Time: Timer 1 on:  8:49:53
Count Time: Timer 1 on:  8:49:53
 Read Time: Timer 1  Elapsed: 0:00:32.44
  End Time: Timer 1 off:  8:50:31  Elapsed: 0:00:38.25
That works out to 0.0037 seconds to read each record, which ain't bad by my estimation, and, for what it's worth, 0.00166 seconds to write each record out to the file.

And, for writing it directly to the console:
Code:
      Size: 3506
Start Time: Timer 1 on:  8:49:53
Count Time: Timer 1 on:  8:49:53
 Read Time: Timer 1  Elapsed: 0:00:32.44
  End Time: Timer 1 off:  8:50:31  Elapsed: 0:00:38.25
Which is 0.012 seconds per record, which also isn't bad, although it does show that processing times can vary quite a lot depending on what else is going on in the system. For reference, it took 12.62 seconds to write the data to the console, or .0036 seconds per record; and the difference between writing to the console and writing to the file can't be determined in any reliable way given the variation in the file-read times. (I will note that when I ran the second test, writing the data out to the console, the data was presumably already in the disk cache, which makes it rather surprising that it took so much longer the second time around. Bottom line: elapsed times are not all that meaningful except over a fairly large number of "samples". Also, my machine is hardly what one would call fast, particularly the disk drives.)

And this code is simple enough, even with my semi-blindness and bad memory, that I have no "issues" with it whatsoever.

I will also note that reading the file one extra time (for "%@Lines") was not at all a significant performance issue, so trying to guess the maximum size of the file in order to preset the size of the array is probably not a worthwhile thing to do.

- Dan
 
#4
Since the amount of memory required by an array depends only on the elements actually populated, not on its allocated size, instead of your procedure I have often allocated an excessively large array, populated it with the @FILEARRAY function, and used the function's return value (the number of lines read into the array) for iteration control. Here are some other techniques you might find useful:
- for entities which use offset referencing (i.e., starting with 0), use one variable for the element count and another for the largest offset (they differ by 1, but I hate to recalculate it multiple times)
- when reading lines of a file of known line count, use DO %LINECOUNT, not a special test for EOF
- when using @FILEREAD, another method to avoid reading past EOF:
do while %@fileseek[%handle,0,1] LT %filesize
note that I saved the file size in a variable so I don't need to access the file system each time, and that this form of @FILESEEK just returns the current file position without affecting the file buffers in any way
- your loop reading the data from the file into the array (which could have been done with the @FILEARRAY function, without a loop) could have used DO IDX = 0 To LASTOFFSET, avoiding the extra statement to increment IDX
- in the loop that displays the array elements, the SET /A command increments IDX, but the ENDDO loop control immediately following overrides it, making it superfluous.
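A minimal sketch of that seek-based EOF test, in Python for illustration (the file name and contents are stand-ins, not from the post): f.tell() plays the role of %@fileseek[%handle,0,1], and the file size is saved in a variable once up front.

```python
import os

# Stand-in data file (illustrative only).
with open("sample.data", "w") as f:
    f.write("one\ntwo\nthree\n")

filesize = os.path.getsize("sample.data")   # saved once, not re-queried
records = []
with open("sample.data", "rb") as handle:   # binary mode: tell() is a byte offset
    while handle.tell() < filesize:         # stop before reading past EOF
        records.append(handle.readline().decode().rstrip("\n"))

print(records)
```

Because the loop condition checks the current offset against the saved size, no sentinel value ever has to be compared against the data itself.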
 
#5
Steve,

Good points, all. But number one, I just showed something I've been doing for years pretty much without modification (or even really thinking about it much); and number two, there's far too much in TCC for me (or practically anyone else?) to know it all (although you seem to come close!). However, in the future I'll try to remember to try some of your suggestions, although it's not a high priority, since what I've been doing works just fine and is also quite fast; as I think I indicated, %@Lines takes essentially no time at all for the files I have to deal with.

And where that came from initially is pretty simple: just the common technique of reading a file until EOF that you'd use in pretty much any other circumstance.

As far as your comment about the increment being superfluous in the loop: you are, of course, correct. However, that was just a stupid mistake (as I am so fond of making!) on my part. You see, my habit is to use an explicit "Do While" in such situations (I'm not defending it, it's just a habit; maybe it comes from the fact that it works pretty much the same way in all of the high-level languages I've used over the years). And after I made the initial entry of my sample code in the web page, I thought, "You know, most people would probably automatically use a 'Do' in that format" (so I was totally aware of the "Do" form you suggested), and I changed it at the last minute without really checking it. That was the stupid mistake; I mean not checking it, not the coding error itself.

- Dan