@replace and @xreplace problems

Jan 16, 2009
45
0
#1
Unless I am missing something, there seems to be a bug (or maybe it's WAD, ugh!) in @replace when the end of the file has CRLF or CRLFCRLF. Or maybe I don't understand how variables are handled internally.

The many text files I want to process have a CRLF (Carriage Return Line Feed) at the end of every line. A new paragraph has CRLFCRLF at its beginning. Some files end with CRLF; some end with normal characters, such as "." or " ".
The processing I want to do is to remove every single CRLF and retain every CRLFCRLF.
I am using @REPLACE. My method is to:
1. Replace every CRLFCRLF with hex 01 01 01 01.
2. Replace every CRLF with " ".
3. Replace every hex 01 01 01 01 with CRLFCRLF.
4. Write out the new file.
All works fine except when the original file ends with CRLF. Then:
Step 1 partially works. Each CRLFCRLF is replaced properly, but the final CRLF is replaced with 00 00. (If the original file ends in CRLFCRLF, that is replaced with 00 00 00 00.)
Step 2 works fine, except that the final 00 00 (or 00 00 00 00) from step 1 is now 00.
Step 3 works fine and doesn't change any 00 in the file.
Step 4 changes the final 00 in the output file to hex 00 00.
After testing on a few files, I tested on 20 files. For one test file, which ended in CRLF, the output was missing the last 2 chars of the original. It seems that TCC treats variables as having a length that may vary depending on how they are processed. This implicit length is hidden and unclear. I am frustrated!
Update: I have gotten the program to work reliably using a combination of @replace and @xreplace. I can find no logic as to why one works in one particular step and the other doesn't.
Functions used: Knowing the file's length, I read the file using @saferead (from the safechars plugin). Works fine.
I write the file out using @filewriteb, using for the final write a length reduced by the number of CRLFs I have replaced with spaces.
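For comparison, the replace steps above can be sketched outside TCC. This is a minimal Python sketch, not TCC code, and it does not explain the @replace behavior described above. It works on raw bytes, so a trailing CRLF is handled like any other data. The function name `unwrap_paragraphs` and the `PLACEHOLDER` constant are illustrative assumptions.

```python
PLACEHOLDER = b"\x01\x01\x01\x01"  # assumed not to occur in the input text


def unwrap_paragraphs(data: bytes) -> bytes:
    """Replace each single CRLF with a space and keep each CRLFCRLF."""
    if PLACEHOLDER in data:
        raise ValueError("placeholder already present in input")
    data = data.replace(b"\r\n\r\n", PLACEHOLDER)  # step 1: protect paragraph breaks
    data = data.replace(b"\r\n", b" ")             # step 2: join wrapped lines
    return data.replace(PLACEHOLDER, b"\r\n\r\n")  # step 3: restore paragraph breaks


if __name__ == "__main__":
    sample = b"one\r\ntwo\r\n\r\nthree\r\n"
    print(unwrap_paragraphs(sample))
```

For step 4, the files would need to be opened in binary mode ("rb" and "wb") so that Python does not translate the line endings itself.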
 

samintz

Scott Mintz
May 20, 2008
1,312
11
Solon, OH, USA
#2
Since you are only doing CRLF processing, you could just read every line and write it back out again without its EOL, and replace blank lines (double EOL) with the EOL sequence.
Code:
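setlocal
rem Disable special-character processing so text read from the file is taken literally
setdos /x-45678
set in=%@fileopen["%1",r,t]
set out=%@fileopen["%2",w,b]
rem Truncate the output file at the current position (the start)
set r=%@truncate[%out]
set r=%@fileread[%in]
rem @fileread returns **EOF** at end of file
do while %r. != **EOF**.
  set len=%@len[%r]
  iff %len == 0 then
    rem Blank line: a length of -1 writes the listed ASCII values (CR LF)
    set s=%@filewriteb[%out,-1,13 10]
  else
    rem Non-blank line: write the text without its EOL
    set s=%@filewriteb[%out,%len,%r]
  endiff
  set r=%@fileread[%in]
enddo
set in=%@fileclose[%in]
set out=%@fileclose[%out]
endlocal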
I'm sure there is a TPIPE filter you could use too.

I created a test.txt file:
Code:
line 1
line 2


line 5
line 6

in binary:
00000000  6C 69 6E 65 20 31 0D 0A 6C 69 6E 65 20 32 0D 0A    line 1..line 2..
00000010  0D 0A 0D 0A 6C 69 6E 65 20 35 0D 0A 6C 69 6E 65    ....line 5..line
00000020  20 36 0D 0A                                         6..
And after running the script:
Code:
test test.txt test1.txt

The binary of test1.txt is:
00000000  6C 69 6E 65 20 31 6C 69 6E 65 20 32 0D 0A 0D 0A    line 1line 2....
00000010  6C 69 6E 65 20 35 6C 69 6E 65 20 36                line 5line 6
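The same line-by-line logic can be sketched in Python for comparison. This is only an illustration, not a translation of the TCC functions. The function name `join_lines` and its `blank` parameter are assumptions introduced here; the sample bytes are the test.txt contents shown above.

```python
def join_lines(data: bytes, blank: bytes = b"\r\n") -> bytes:
    """Drop the CRLF after each non-empty line; emit `blank` for each empty line."""
    lines = data.split(b"\r\n")
    if lines and lines[-1] == b"":
        lines.pop()  # the data ended with CRLF, so the last piece is not a line
    out = bytearray()
    for line in lines:
        out += line if line else blank
    return bytes(out)


if __name__ == "__main__":
    test_txt = b"line 1\r\nline 2\r\n\r\n\r\nline 5\r\nline 6\r\n"
    print(join_lines(test_txt))
```

Like the script, this joins adjacent lines with nothing between them; it does not insert a space where a CRLF was removed.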
 

samintz

#3
One other thing I wanted to point out is that the sequence of EOL chars in your description wasn't exactly clear.
For example:
Code:
line 1<EOL>
line 2<EOL>
<EOL>
<EOL>
line 3<EOL>
In that example every line ends with <EOL> and there are 2 blank lines. But that sequence is actually 3 <EOL>'s in a row.
Code:
line 1<EOL>
line 2<EOL>
<EOL>
line 3<EOL>
In this example, there is a sequence of 2 <EOL>'s but one of them is the EOL marker for line 2.

The test script I wrote works based on the assumption that every line ends with EOL and blank lines keep their EOL. Either way, I think it does what you wanted.
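The difference between the two layouts can be checked by counting consecutive EOLs. Here is a small Python sketch, assuming CRLF as the EOL sequence; the helper name `eol_runs` is made up for this illustration.

```python
import re


def eol_runs(text: str) -> list[int]:
    """Return the length, in EOLs, of each run of consecutive CRLF pairs."""
    return [len(m.group(0)) // 2 for m in re.finditer(r"(?:\r\n)+", text)]


if __name__ == "__main__":
    two_blank_lines = "line 1\r\nline 2\r\n\r\n\r\nline 3\r\n"
    one_blank_line = "line 1\r\nline 2\r\n\r\nline 3\r\n"
    print(eol_runs(two_blank_lines))  # [1, 3, 1]
    print(eol_runs(one_blank_line))   # [1, 2, 1]
```

In other words, N blank lines show up as a run of N+1 EOLs, because the first EOL in the run belongs to the preceding line.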
 

samintz

#4
Actually, I take that back. If your input data is structured with the 2 EOL's, the test script will reduce that to 1 EOL. If you always want 2, then change the line
Code:
set s=%@filewriteb[%out,-1,13 10]
to
set s=%@filewriteb[%out,-1,13 10 13 10]
 
#5
How about something more suited to the task?
Code:
v:\> type dups.txt
[email protected]
[email protected]

[email protected]
[email protected]

[email protected]
[email protected]

[email protected]
[email protected]

[email protected]
[email protected]

v:\> tpipe /input=dups.txt /output=dups2.txt /replace=4,0,0,0,0,0,0,0,0,"\r\n\r\n","\001" /replace=4,0,0,0,0,0,0,0,0,"\r\n","" /replace=4,0,0,0,0,0,0,0,0,"\001","\r\n\r\n"

v:\> type dups2.txt
[email protected]@foo.com

[email protected]@bar.com

[email protected]@bar.com

[email protected]@xyz.com

[email protected]@foo.com