Fixed Copying with regex (several issues)

Jan 19, 2011
581
10
Norman, OK
#1
I have gone in circles trying to figure this out, but here's what I've got...

Create ten files with eight characters each and a CRLF.

HTML:
[Z:\temp] dir
 
 Volume in drive Z is unlabeled      Serial number is 58b8:a853
 Directory of  Z:\temp\*
 
 6/27/2013  13:08         <DIR>    .
 6/27/2013  13:08         <DIR>    ..
                 0 bytes in 0 files and 2 dirs
   108,091,977,728 bytes free
 
[Z:\temp] echo %@repeat[0,8] > test.txt
 
[Z:\temp] do i = 1 to 9 (echo %@repeat[%i,8] > %i.test.txt)
 
[Z:\temp] dir /kmh
 6/27/2013  13:08              10  1.test.txt
 6/27/2013  13:08              10  2.test.txt
 6/27/2013  13:08              10  3.test.txt
 6/27/2013  13:08              10  4.test.txt
 6/27/2013  13:08              10  5.test.txt
 6/27/2013  13:08              10  6.test.txt
 6/27/2013  13:08              10  7.test.txt
 6/27/2013  13:08              10  8.test.txt
 6/27/2013  13:08              10  9.test.txt
 6/27/2013  13:08              10  test.txt
 
[Z:\temp] type *.txt
11111111
22222222
33333333
44444444
55555555
66666666
77777777
88888888
99999999
00000000
Copy the files using a regex.

HTML:
[Z:\temp] *copy ::(test)\.txt ::oops.\1.txt
Z:\temp\1.test.txt => Z:\temp\1.oops.test.txt
Z:\temp\2.test.txt =>> Z:\temp\2.oops.test.txt
Z:\temp\3.test.txt =>> Z:\temp\3.oops.test.txt
Z:\temp\4.test.txt =>> Z:\temp\4.oops.test.txt
Z:\temp\5.test.txt =>> Z:\temp\5.oops.test.txt
Z:\temp\6.test.txt =>> Z:\temp\6.oops.test.txt
Z:\temp\7.test.txt =>> Z:\temp\7.oops.test.txt
Z:\temp\8.test.txt =>> Z:\temp\8.oops.test.txt
Z:\temp\9.test.txt =>> Z:\temp\9.oops.test.txt
Z:\temp\test.txt =>> Z:\temp\oops.test.txt
    10 files copied
 
[Z:\temp] dir /kmh *oops*
 6/27/2013  13:08              10  1.oops.test.txt
 6/27/2013  13:10              10  2.oops.test.txt
 6/27/2013  13:10              20  3.oops.test.txt
 6/27/2013  13:10              30  4.oops.test.txt
 6/27/2013  13:10              40  5.oops.test.txt
 6/27/2013  13:10              50  6.oops.test.txt
 6/27/2013  13:10              60  7.oops.test.txt
 6/27/2013  13:10              70  8.oops.test.txt
 6/27/2013  13:10              80  9.oops.test.txt
 6/27/2013  13:10              90  oops.test.txt
The first issue...
I originally thought the problem was that all the files were copied instead of just "test.txt". Then I realized that "(test)\.txt" is a substring of each filename, so every file should be included. That is when I saw the real issue: the destination name should be "oops.test.txt", and all the listed files should be concatenated into that single file. Instead, 10 destination files were created.
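
To make the substring-matching point concrete, here is a minimal Python sketch. It uses Python's `re` module rather than TCC's regex engine, so it only approximates what COPY does. It assumes that `\1` in the target is filled from the capture group and that the unmatched part of each filename is left alone, which is what the output above suggests.

```python
import re

# The ten filenames created above.
names = [f"{i}.test.txt" for i in range(1, 10)] + ["test.txt"]

source = re.compile(r"(test)\.txt")  # the source regex, without the leading "::"
target = r"oops.\1.txt"              # the target, without the leading "::"

# Unanchored search: the pattern may match anywhere inside the name.
matched = [n for n in names if source.search(n)]

# Replace only the matched substring; the unmatched prefix ("1." etc.) is kept.
renamed = {n: source.sub(target, n, count=1) for n in matched}

for old, new in renamed.items():
    print(f"{old} => {new}")
```

Under those assumptions every one of the ten names matches, and each gets its own destination name, so ten separate targets are what a per-file reading of the command would produce.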

Next issue...
It appears to be attempting the concatenation anyway: each erroneously created file is prefixed with as many NUL bytes as the combined size of the preceding files, but skipping one.

HTML:
[Z:\temp] type *oops*
11111111
22222222
          33333333
                    44444444
                              55555555
                                        66666666
                                                  77777777
                                                            88888888
                                                                      99999999
                                                                                00000000
 
[Z:\temp] for %i in (*oops*) do (type %i | tpipe /simple=30 & echo.)
00000000  31 31 31 31  31 31 31 31  0D 0A                     11111111..
00000000  32 32 32 32  32 32 32 32  0D 0A                     22222222..
00000000  00 00 00 00  00 00 00 00  00 00 33 33  33 33 33 33  ..........333333
00000010  33 33 0D 0A                                         33..
00000000  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00  ................
00000010  00 00 00 00  34 34 34 34  34 34 34 34  0D 0A        ....44444444..
00000000  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00  ................
00000010  00 00 00 00  00 00 00 00  00 00 00 00  00 00 35 35  ..............55
00000020  35 35 35 35  35 35 0D 0A                            555555..
00000000  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00  ................
00000010  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00  ................
00000020  00 00 00 00  00 00 00 00  36 36 36 36  36 36 36 36  ........66666666
00000030  0D 0A                                               ..
00000000  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00  ................
00000010  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00  ................
00000020  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00  ................
00000030  00 00 37 37  37 37 37 37  37 37 0D 0A               ..77777777..
00000000  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00  ................
00000010  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00  ................
00000020  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00  ................
00000030  00 00 00 00  00 00 00 00  00 00 00 00  38 38 38 38  ............8888
00000040  38 38 38 38  0D 0A                                  8888..
00000000  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00  ................
00000010  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00  ................
00000020  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00  ................
00000030  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00  ................
00000040  00 00 00 00  00 00 39 39  39 39 39 39  39 39 0D 0A  ......99999999..
 
00000000  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00  ................
00000010  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00  ................
00000020  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00  ................
00000030  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00  ................
00000040  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00  ................
00000050  30 30 30 30  30 30 30 30  0D 0A                     00000000..
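
The sizes and hex dumps above fit a simple model: the first copy (`=>`) is written normally, and each later copy (`=>>`) is written at the offset an append would have used if all the appends had gone to one file, with the gap zero-filled. The Python sketch below only reproduces that observed pattern. The offset rule is an assumption inferred from the output, not TCC's actual implementation, and `simulate` is a hypothetical helper.

```python
import io

def simulate(sources):
    """Model the observed output; `sources` is a list of bytes, one per file.

    Assumption (inferred from the dumps, not from TCC's code): the first file
    is a plain copy, and each later file is written at a running offset that
    only the "=>>" copies advance.
    """
    results = []
    offset = 0
    for index, data in enumerate(sources):
        buf = io.BytesIO()
        buf.seek(offset)   # seek past the (empty) end of the buffer
        buf.write(data)    # BytesIO pads the gap with NUL bytes on write
        results.append(buf.getvalue())
        if index > 0:      # the first, plain "=>" copy does not advance the offset
            offset += len(data)
    return results

# Ten 10-byte files (eight digits plus CRLF) in the order COPY processed them.
sources = [str(d).encode() * 8 + b"\r\n" for d in [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]]
sizes = [len(r) for r in simulate(sources)]
print(sizes)
```

The same rule also gives 12, 12, 24 for three 12-byte files, so it is at least consistent with both listings in this thread.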
Final issue kind of related to the previous two issues...
I modified the regex so that it included the entire filename of all the files by adding a ".*" to the beginning of the capture block, but it is still trying to do the concatenation with the prefixed NUL bytes.

HTML:
[Z:\temp] *copy ::(.*test)\.txt ::oops.\1.txt
Z:\temp\1.test.txt => Z:\temp\oops.1.test.txt
Z:\temp\2.test.txt =>> Z:\temp\oops.2.test.txt
Z:\temp\3.test.txt =>> Z:\temp\oops.3.test.txt
Z:\temp\4.test.txt =>> Z:\temp\oops.4.test.txt
Z:\temp\5.test.txt =>> Z:\temp\oops.5.test.txt
Z:\temp\6.test.txt =>> Z:\temp\oops.6.test.txt
Z:\temp\7.test.txt =>> Z:\temp\oops.7.test.txt
Z:\temp\8.test.txt =>> Z:\temp\oops.8.test.txt
Z:\temp\9.test.txt =>> Z:\temp\oops.9.test.txt
Z:\temp\test.txt =>> Z:\temp\oops.test.txt
    10 files copied
 
[Z:\temp] dir /kmh *oops*
 6/27/2013  13:08              10  oops.1.test.txt
 6/27/2013  13:32              10  oops.2.test.txt
 6/27/2013  13:32              20  oops.3.test.txt
 6/27/2013  13:32              30  oops.4.test.txt
 6/27/2013  13:32              40  oops.5.test.txt
 6/27/2013  13:32              50  oops.6.test.txt
 6/27/2013  13:32              60  oops.7.test.txt
 6/27/2013  13:32              70  oops.8.test.txt
 6/27/2013  13:32              80  oops.9.test.txt
 6/27/2013  13:32              90  oops.test.txt
 
#3
I think Rex means code for my plugin @XREPLACE which I sent to him. That seems to be doing the right thing.
Code:
v:\> echo %@xreplace[(test)\.txt,oops.\1.txt,1.test.txt]
1.oops.test.txt
v:\> echo %@xreplace[(.*test)\.txt,oops.\1.txt,1.test.txt]
oops.1.test.txt
I don't know if there's any problem with COPY.
 
#4
I think "copy ::(.*test)\.txt ::oops.\1.txt" is malformed. The destination filename matches the source regex. In simpler tests, I got "TCC: Contents lost before copy ... " which might give someone a clue as to where the NUL characters are coming from (I haven't figured it out).

Was any concatenation intended? What was intended?
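
The point about the destination matching the source regex can be checked mechanically. This is a rough Python sketch with Python's `re` standing in for TCC's regex engine. Whether TCC actually rescans the directory and picks up freshly written targets is an assumption, not something established in this thread.

```python
import re

source = re.compile(r"(.*test)\.txt")  # source regex, without the leading "::"
target = r"oops.\1.txt"                # target, without the leading "::"

names = ["1.test.txt", "2.test.txt", "test.txt"]

# Destination names produced by back-substitution.
destinations = [source.sub(target, n, count=1) for n in names]

# Any destination the source regex also matches could be picked up again as a
# source if the directory were rescanned during the copy (an assumption).
collisions = [d for d in destinations if source.search(d)]
print(collisions)
```

Here every destination name is also a valid source name, which is the overlap described above.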
 
#5
Maybe I've boiled this down, maybe not ...
We are accustomed to, and appreciate, this behavior:
Code:
v:\wtest> d
2013-06-29  02:02              12  1.test.txt
2013-06-29  02:02              12  2.test.txt
2013-06-29  02:02              12  3.test.txt
 
v:\wtest> copy *.test.txt *.oops.txt
V:\wtest\1.test.txt => V:\wtest\1.oops.txt
V:\wtest\2.test.txt => V:\wtest\2.oops.txt
V:\wtest\3.test.txt => V:\wtest\3.oops.txt
    3 files copied
 
v:\wtest> d
2013-06-29  02:02              12  1.oops.txt
2013-06-29  02:02              12  1.test.txt
2013-06-29  02:02              12  2.oops.txt
2013-06-29  02:02              12  2.test.txt
2013-06-29  02:02              12  3.oops.txt
2013-06-29  02:02              12  3.test.txt
What happened? The behavior we have come to expect happened ... **EACH** file matching the source wildcard was copied to another file where the text matched by "*" was duplicated in the destination name (as if in a DO or FOR loop).
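
For reference, that per-file mapping can be sketched in Python, first with a one-star wildcard and then with a capture group and back-substitution. Both helpers are hypothetical illustrations (the wildcard one handles exactly one `*` and is case-sensitive, unlike Windows); neither is how TCC implements COPY.

```python
import fnmatch
import re

def wildcard_plan(names, source, target):
    """Map each name matching a one-star wildcard to its target name."""
    prefix, suffix = source.split("*")
    plan = {}
    for name in names:
        if fnmatch.fnmatchcase(name, source):
            # The text matched by "*" is whatever lies between prefix and suffix.
            star = name[len(prefix):len(name) - len(suffix)]
            plan[name] = target.replace("*", star)
    return plan

def regex_plan(names, source, target):
    """Same mapping via a whole-name regex match and back-substitution."""
    pattern = re.compile(source)
    return {n: pattern.sub(target, n) for n in names if pattern.fullmatch(n)}

names = ["1.test.txt", "2.test.txt", "3.test.txt"]
by_wildcard = wildcard_plan(names, "*.test.txt", "*.oops.txt")
by_regex = regex_plan(names, r"(\d)\.test\.txt", r"\1.oops.txt")
print(by_wildcard == by_regex)
```

Both plans pair each source with its own target, one target per source, with no appending.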

Question 1: Can that be done with regexes and back-substitution?

I suppose I'd try:
Code:
v:\wtest> copy ::(\d)\.test\.txt ::\1.oops.txt
V:\wtest\1.test.txt => V:\wtest\1.oops.txt
V:\wtest\2.test.txt =>> V:\wtest\2.oops.txt
V:\wtest\3.test.txt =>> V:\wtest\3.oops.txt
    3 files copied
 
v:\wtest> d
2013-06-29  02:02              12  1.oops.txt
2013-06-29  02:02              12  1.test.txt
2013-06-29  02:58              12  2.oops.txt
2013-06-29  02:02              12  2.test.txt
2013-06-29  02:58              24  3.oops.txt
2013-06-29  02:02              12  3.test.txt
 
v:\wtest> type *oops*
1111111111
2222222222
            3333333333 (NUL chars in front)
But that does something different. I can't figure out what it did.

Question 2: What did it do?
 

rconn

Administrator
Staff member
May 14, 2008
10,788
97
#6
The problem with the erroneous attempted concatenation has been fixed in 15.01.52. The parser couldn't find a * or ? in the target, so it assumed subsequent files were attempted copies. I've changed it to treat *any* regex in the target as a wildcard as well.

That leaves the question of what the backref should do with something like a "::(test)\.txt" source, though I'm inclined to believe the current behavior is correct.
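
As a reading aid, the rule described in this post can be modelled in a few lines of Python. This is a hypothetical model of the decision only; the function names, the `treat_regex_as_wildcard` switch, and the test on the `::` prefix are assumptions for illustration, not TCC's parser.

```python
def target_is_per_file(target, treat_regex_as_wildcard):
    """Return True if each source should get its own target file.

    Hypothetical model: a target is "per file" when it contains * or ?,
    and (with the fix enabled) also when it is a "::" regex target.
    """
    if any(ch in target for ch in "*?"):
        return True
    return treat_regex_as_wildcard and target.startswith("::")

def copy_mode(target, treat_regex_as_wildcard):
    """Describe what happens to the second and later source files."""
    if target_is_per_file(target, treat_regex_as_wildcard):
        return "copy each"
    return "append to one"

for fixed in (False, True):
    print(fixed, copy_mode(r"::oops.\1.txt", fixed), copy_mode("*.oops.txt", fixed))
```

In this model the regex target falls into the append branch before the change and into the per-file branch after it, while the plain wildcard target is per-file either way.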
 
#7
rconn said: "That leaves the question of what the backref should do with something like a "::(test)\.txt" source, though I'm inclined to believe the current behavior is correct."

I don't see a question. It would seem the backref should do what it always does. If I'm missing something important, please elaborate.

Maybe there's another question. Should *filename* matching with regexes be substring matching (status quo) or whole-string matching (as with wildcards)? Should the file "1.test.txt" be processed if the source regex is ::test\.* (parens are irrelevant here)? The user can circumvent substring matching with '^' and '$' (a little extra work, plus remembering to do it).

Boil it down to ... is this the desired result?
Code:
v:\wtest> d ::"test.*"
2013-06-29  02:02              12  1.test.txt
2013-06-29  02:02              12  2.test.txt
2013-06-29  02:02              12  3.test.txt
2013-06-29  11:10              0  test.txt
My personal opinion is "no" but others may legitimately disagree.
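
The difference between the two matching styles is easy to see outside TCC. In this Python sketch the same pattern text is read three ways: as an unanchored regex, as a whole-string regex, and as a wildcard. Python's `re` and `fnmatch` modules are stand-ins here, so this illustrates the semantics in question rather than TCC's own matcher.

```python
import fnmatch
import re

names = ["1.test.txt", "2.test.txt", "3.test.txt", "test.txt"]
pattern = "test.*"  # a regex for re, a wildcard for fnmatch

# Substring matching: the regex may match anywhere in the name.
substring = [n for n in names if re.search(pattern, n)]

# Whole-string matching: the regex must cover the entire name,
# equivalent to writing ^test.*$ with search.
whole = [n for n in names if re.fullmatch(pattern, n)]

# Wildcard matching is always whole-string.
wildcard = [n for n in names if fnmatch.fnmatchcase(n, pattern)]

print(substring)
print(whole)
print(wildcard)
```

Substring matching selects all four names; the other two select only test.txt.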
 
#8
Back from the weekend...

Quoting #4: "Was any concatenation intended? What was intended?"
Concatenation was not intended.

For what was intended, I'm going to have to piece it together from memory. Details got confused while I was creating the examples in post #1.

I have a bunch of files. Mainly there are 5 files, but there are also historical copies. For ease of demonstration, let's call them abcd.txt, efg.txt, hijk.txt, lmnop.txt, qrstuv.txt, and wxyz.txt. Each one is a list of files, each from a different directory. The historical copies are named like YYYYMMDD.?????.txt (e.g. 20130630.abcd.txt).

Here's where it presented itself. I was manually copying abcd.txt to 20130630.abcd.txt with a regular expression, like this:

copy ::(abcd)\.txt ::20130630.\1.txt

That way, for the other files, I could just press the up arrow, replace the name in the parentheses with the next filename, and hit Enter, without typing the name twice. When I hit Enter, it copied all the files containing "abcd" (including the historical copies) instead of just the one, which is when I discovered the weird NUL-byte concatenation thing.
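
For what it's worth, the selection that was intended here corresponds to anchoring the pattern, as suggested in #7. The Python sketch below shows the effect on a made-up directory listing. The filename 20130629.abcd.txt and the helper `dated_copy_plan` are hypothetical, and whether `^` and `$` behave the same way after TCC's `::` prefix is not tested here.

```python
import re

# Hypothetical listing: current files plus older dated copies.
names = ["abcd.txt", "efg.txt", "20130629.abcd.txt", "20130629.efg.txt"]

def dated_copy_plan(names, base, stamp):
    """Map source -> dated target for files named exactly "<base>.txt"."""
    anchored = re.compile(rf"^({re.escape(base)})\.txt$")
    return {n: anchored.sub(rf"{stamp}.\1.txt", n)
            for n in names if anchored.search(n)}

# Unanchored, as in the original command: historical copies match too.
unanchored = [n for n in names if re.search(r"(abcd)\.txt", n)]

# Anchored: only the one intended file is selected.
plan = dated_copy_plan(names, "abcd", "20130630")

print(unanchored)
print(plan)
```

Only the parenthesized name changes between runs, so the up-arrow workflow described above would still apply.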