Unless I am missing something, there seems to be a bug (or maybe it's WAD, ugh!) in @replace when the file ends with CRLF or CRLFCRLF. Or maybe I don't understand how variables are handled internally.
The many text files I want to process have a CRLF (Carriage Return Line Feed) at the end of every line. A new paragraph has CRLFCRLF at its beginning. Some files end with CRLF; some end with normal characters, such as "." or " ".
The processing I want is to replace every single CRLF with a space and keep every CRLFCRLF.
I am using @REPLACE. My method is:
1. Replace every CRLFCRLF with Hex 01 01 01 01.
2. Replace every CRLF with " ".
3. Replace every Hex 01 01 01 01 with CRLFCRLF.
4. Write out the new file.
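For reference, here is a minimal sketch of the same placeholder technique operating on raw bytes. It is Python, not TCC, so it says nothing about how @replace or @xreplace handle variables internally; it only shows what the four steps should produce, including for a file that ends in CRLF. The function name `unwrap_lines` and the check that 0x01 never appears in the input are assumptions made for the illustration.

```python
PARA = b"\r\n\r\n"                  # paragraph break: keep
LINE = b"\r\n"                      # line break: replace with a space
SENTINEL = b"\x01\x01\x01\x01"      # assumed placeholder, as in step 1

def unwrap_lines(data: bytes) -> bytes:
    """Hypothetical helper: join wrapped lines, keep paragraph breaks."""
    # Assumption: 0x01 never occurs in the text, otherwise step 3
    # could turn real data into a CRLFCRLF.
    if b"\x01" in data:
        raise ValueError("input already contains 0x01 bytes")
    step1 = data.replace(PARA, SENTINEL)    # 1. protect CRLFCRLF
    step2 = step1.replace(LINE, b" ")       # 2. lone CRLF -> space
    return step2.replace(SENTINEL, PARA)    # 3. restore CRLFCRLF

text = b"line one\r\nline two.\r\n\r\nNew para\r\nend\r\n"
print(unwrap_lines(text))
```

With a trailing CRLF the last step simply leaves a trailing space; no NUL bytes appear, because a Python `bytes` object carries its length explicitly instead of relying on a terminator.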
All of this works fine except when the original file ends with CRLF. Then: Step 1 only partially works. The CRLFCRLF sequences are replaced properly, but the final CRLF is replaced with 00 00. (If the original file ends in CRLFCRLF, that is replaced with 00 00 00 00.) Step 2 works fine, except that the final 00 00 (or 00 00 00 00) left by step 1 is now a single 00. Step 3 works fine and doesn't change any of the 00 bytes. Step 4 changes the final 00 in the output file to Hex 00 00.
After testing on a few files, I tested on 20 files. In one of them, the output was missing the last 2 characters of the original file; that original file ended in CRLF. It seems that TCC treats a variable as having a length that can vary depending on how it is processed. This implicit length is hidden and unclear. I am frustrated!
Update: I have gotten the program to work reliably using a combination of @replace and @xreplace. I can find no logic to why one works in a particular step and the other doesn't.
Functions used: knowing the file's length, I read the file with @saferead (from the safechars plugin). That works fine.
I write the file out with @filewriteb, using for the final write a length reduced by the number of CRLFs I have replaced with spaces.
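The length bookkeeping can be checked independently. Each lone CRLF (2 bytes) becomes one space (1 byte), while every CRLFCRLF goes out and comes back unchanged, so the output length should be the input length minus the number of lone CRLFs. The Python sketch below (not TCC; the names `lone_crlf_count`, `expected_length` and `write_checked` are hypothetical) counts lone CRLFs with the same left-to-right pairing the replacements use and refuses to write if the converted data has a different length. Counting this way, a trailing CRLF still costs exactly one byte, not two.

```python
import os

PARA = b"\r\n\r\n"
LINE = b"\r\n"
SENTINEL = b"\x01\x01\x01\x01"  # assumption: 0x01 never occurs in the input

def lone_crlf_count(data: bytes) -> int:
    """CRLFs that are not part of a CRLFCRLF, paired left to right."""
    return data.replace(PARA, SENTINEL).count(LINE)

def expected_length(data: bytes) -> int:
    # Lone CRLF (2 bytes) -> " " (1 byte); CRLFCRLF is restored as-is.
    return len(data) - lone_crlf_count(data)

def write_checked(path: str, original: bytes, converted: bytes) -> int:
    """Write converted bytes only if the length matches the arithmetic."""
    want = expected_length(original)
    if len(converted) != want:
        raise ValueError(f"expected {want} bytes, got {len(converted)}")
    with open(path, "wb") as f:
        f.write(converted)
    return os.path.getsize(path)  # bytes actually on disk
```

For example, `b"line one\r\nline two.\r\n\r\nNew para\r\nend\r\n"` is 38 bytes with 3 lone CRLFs, so the converted file should be 35 bytes.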