Welcome!

By registering with us, you'll be able to discuss, share and private message with other members of our community.

SignUp Now!

Interesting TPIPE sort algorithm artifact

May
62
1
The TPIPE documentation states, "You can specify multiple filters, which will be processed in the order they appear on the command line." Oh, good: I can specify multiple sort fields. The only thing to look out for is that, since they're executed in order, the most minor sort field should be specified first; and the major sort field last.

So I tried it, and it works -- sort of. Each invocation of the sort does preserve the effect of the preceding sorts -- EXCEPT that the line order is exactly reversed each time.. So if, for example, I want to sort ascending on three different fields, I need to specify DESCENDING for the second sort. (Ascending is okay for the first sort, since its order will be swapped twice by the two subsequent sorts.)

Here's an example. You can experiment with additional passes, swapping the order of the sort fields and flipping the ascending/descending bit.
Code:
[D:\tmp]type foo.txt
2bX
1bX
2aX
1aX
2bX
1bX
2aX
1aX
2bO
1bO
2aO
1aO
2bO
1bO
2aO
1aO

[D:\tmp]tpipe /input=foo.txt /sort=0,0,0,1,1 /sort=0,1,0,2,1 /sort=0,0,0,3,1

Or am I perhaps making a grossly incorrect assumption that the sort algorithm is actually preserving anything from previous passes, and it just happens that all the cases I've tried so far seem to conform to the pattern I described? (My testing has been limited to files of under 1000 lines.)
 
Even for two forward ASCII sorts the results of the previous sort are not preserved.

Code:
v:\> type numbers.txt
195
194
193
192
191
190
189
188
187
186

v:\> tpipe /input=numbers.txt /sort=0,0,0,3,1
190
191
192
193
194
195
186
187
188
189

v:\> tpipe /input=numbers.txt /sort=0,0,0,3,1 /sort=0,0,0,2,1
189
188
187
186
195
194
193
192
191
190

If all the sorts are in the same direction (forward/reversed) you could make each subsequent sort have length one more than the previous sort. But then you might as well use one sort starting at the first character. If you're mixing forward and reverse searches, I think you can forget it.
 
Well, my actual use case was separate fields within the record, with each field having length > 1. And they were all intended to be ascending sorts. So, given this file of sales quantity, drummer, and city:
Code:
[D:\TMP]type foo.txt
111       Alice     Pittsburgh
222       Alice     Pittsburgh
333       Alice     Pittsburgh
111       Bob       Pittsburgh
222       Bob       Pittsburgh
333       Bob       Pittsburgh
111       Charlie   Pittsburgh
222       Charlie   Pittsburgh
333       Charlie   Pittsburgh
111       Alice     Chicago
222       Alice     Chicago
333       Alice     Chicago
111       Bob       Chicago
222       Bob       Chicago
333       Bob       Chicago
111       Charlie   Chicago
222       Charlie   Chicago
333       Charlie   Chicago
111       Alice     St. Louis
222       Alice     St. Louis
333       Alice     St. Louis
111       Bob       St. Louis
222       Bob       St. Louis
333       Bob       St. Louis
111       Charlie   St. Louis
222       Charlie   St. Louis
333       Charlie   St. Louis
I can apparently reliably sort ascending by city, sales qty, and name by doing this:
Code:
[D:\TMP]tpipe /input=foo.txt /sort=0,0,0,11,9 /sort=0,1,0,1,3 /sort=0,0,0,21,10
(Note descending sort on qty.)

Or descending on qty, ascending by name:
Code:
[D:\TMP]tpipe /input=foo.txt /sort=0,1,0,11,10 /sort=0,1,0,1,3
It appears to be consistent that, to get the ascending/descending results I want, I need to reverse the indicated sort-order in every other sort field, starting with the next-to-last sort field and working backwards. In other words, each pass reverses the order of the records put out by the prior pass before sorting on the new field.
 
I generated this file randomly and tried your method on it (sort by second column, then first (reversed), then third). Is the result what you expect? [I don't really what you are expecting].

[v\code]
v:\> type foo.txt
1 a y
2 a x
3 c y
3 b y
2 b z
1 b y
2 b x
2 b z
2 a z
2 b x
1 c y
3 b z
2 c z
3 c x
1 a z
1 c z
3 a z
3 b x
2 b x
1 a x

v:\> tpipe /input=foo.txt /sort=0,0,0,3,1 /sort=0,1,0,1,1 /sort=0,0,0,5,1
1 a x
2 a x
2 b x
2 b x
2 b x
3 b x
3 c x
1 a y
1 b y
1 c y
3 b y
3 c y
1 a z
1 c z
2 a z
2 b z
2 b z
2 c z
3 a z
3 b z[/code
 
Is the result what you expect?

Yes. In this case, ascending on all fields, from major through minor.

Depending on the algorithm used to implement a sort, it will usually either intentionally preserve the order of records that are sort-field equivalent or do something else that could appear random, but can usually be logically explained by studying the algorithm -- which, in this case, appears to be preservation of the reverse order. If you make a table in Excel, then sort it by, say, name, and then by city, the entries will be ordered ascending by city, then by name within city. The prior sort order is preserved within the later sort order.

My current use case within TCC was to be able to sort a list of files on user-specified criteria (date, name, size, location, etc.), retaining, say, order by name within the entries for a given date. My flopping of every other sort-order bit permits me to do that.
 
Got what you're looking for.

You need to sort from minor to major to keep the correct orders.
Also expand your "length of comparison" otherwise you're only sorting a local value and tpipe has no point of reference for the rest of the line.

EDIT: Looks like if you set the "length" of comparison to "0", it sorts to end of line and you don't have to keep track of how much to sort.

So... given input of foo.txt
Code:
D C E
E B F
G B D
C F C
E G D
A F E
G G F
B G A
C C G
C E C
A F D
C A D
B D B
C D B
F A B
A D A
E G B
D B C
C D A
E A C
F A E
A C D
D F E
G D A
G G D
E B D
A C F
C C F
A E F
B D A
D A G
G E F
A D G
A A A
B G F
B C E
E F B
G B B
B G B
B G C
E E E
G A C
F A B
E E C
A C D
D B A
G A C
G B E
D D F
C F C
A D A
G F A
B G B
C F C
B F D
C A D
C C C
A F F
A F F
C A E
G G F
A D E
F A B
E A G
E E C
G C A
A B A
A A C
A G F
B C C
E G F
F F F
F D B
C C F
D D E
C A F
G E G
D C A
A F E
E F D
E B A
G F D
E C F
G G A
D D F
D F B
D G B
G G E
G A C
D E G
E G E
E G F
E A A
C C D
C C C
C A C
B C A
F C E
C G E
B D G
Then sorting (col 5, col 3-5, col 1-5)...
Code:
13:56:01 $ tpipe /input=foo.txt /sort=0,0,0,5,1 /sort=0,0,0,3,3 /sort=0,0,0,1,5   <<< EDIT: ORIGINAL LINE
13:56:01 $ tpipe /input=foo.txt /sort=0,0,0,5,0 /sort=0,0,0,3,0 /sort=0,0,0,1,0   <<< EDIT: UPDATED LINE
A A A
A A C
A B A
A C D
A C D
A C F
A D A
A D A
A D E
A D G
A E F
A F D
A F E
A F E
A F F
A F F
A G F
B C A
B C C
B C E
B D A
B D B
B D G
B F D
B G A
B G B
B G B
B G C
B G F
C A C
C A D
C A D
C A E
C A F
C C C
C C C
C C D
C C F
C C F
C C G
C D A
C D B
C E C
C F C
C F C
C F C
C G E
D A G
D B A
D B C
D C A
D C E
D D E
D D F
D D F
D E G
D F B
D F E
D G B
E A A
E A C
E A G
E B A
E B D
E B F
E C F
E E C
E E C
E E E
E F B
E F D
E G B
E G D
E G E
E G F
E G F
F A B
F A B
F A B
F A E
F C E
F D B
F F F
G A C
G A C
G A C
G B B
G B D
G B E
G C A
G D A
G E F
G E G
G F A
G F D
G G A
G G D
G G E
G G F
G G F
Gives you what you want.
 
Last edited:
@rconn

Please update documentation for tpipe /sort to indicate that if the Length parameter is zero, then it continues to EOL.

e.g. Length - The length of comparison; if 0 compare to end of line
 
Gives you what you want.
Hmmm! I think @old coot wanted this kind of sort (third column most significant).

Code:
v:\> do i=0 to 7 ( echo %@format[03,%@eval[%i=b]])
000
001
010
011
100
101
110
111

v:\> (do i=0 to 7 ( echo %@format[03,%@eval[%i=b]])) | tpipe /sort=0,0,0,1,1 /sort=0,1,0,2,1 /sort=0,0,0,3,1
000
100
010
110
001
101
011
111
 
third column most significant

Well that's just bass ackwards.

That would put TCMD at version... hmm... lemme think...

28.02.18 - third column = 18
28.02.18 - second column = 02
28.02.18 - first column = 28

So that's 18.02.28... Hey! We're back in 2015!

Little endianness kinda works for memory management; not much else.

Rearrange the data columns and try again.

Looking at the "111 Alice Pittsburgh" data post where he says

I can apparently reliably sort ascending by city, sales qty, and name by doing this:

Then that puts the sort order as Column3, Column1, Column2.

*sigh* Rearrange the data columns and try again.
 
Well that's just bass ackwards.
:smile:
Not necessarily.
Say, for example, that I am given a mailing list.
I don't want to completely reformat it (if I did, I'd probably use AWK).
But the order of the entries is entirely random.
And the field order is:
Firstname,Mid,Lastname,Address,City,State,Country,PostalCode

So, by your logic, I should want these in a strict ASCII order?

Nope, it's not the logic that's bass ackwards, it's the input data format, based on the fact that the column ordering is the order things would go on an envelope.

Perhaps I want the same order of fields but I just want the file in the following order, major to minor:
  • Country
  • State
  • City
  • Lastname
  • Firstname
  • Mid
Doesn't seem bass ackwards to me at all.

Just my $0.02 worth.
 
@JohnQSmith:
From post #7 above:
Please update documentation for tpipe /sort to indicate that if the Length parameter is zero, then it continues to EOL.

That is not true. Go back to the data I gave in post #3. Now sort it these two ways:
Code:
tpipe /input=foo.txt /sort=0,0,0,1,30
tpipe /input=foo.txt /sort=0,0,0,1,0
The first one, as expected, sorts the lines in order based on the entire line contents. The second one (zero length, your proposal) changes the order of the lines, but in what seems to be an unpredictable way (try it yourself):
  • Major, by city (columns 21-30), but in a strange order.
  • Intermediate, by name (columns 11-20), descending.
  • Minor, by sales (columns 1-10), descending.
Code:
333       Charlie   St. Louis
222       Charlie   St. Louis
111       Charlie   St. Louis
333       Bob       St. Louis
222       Bob       St. Louis
111       Bob       St. Louis
333       Alice     St. Louis
222       Alice     St. Louis
111       Alice     St. Louis
333       Charlie   Chicago
222       Charlie   Chicago
111       Charlie   Chicago
333       Bob       Chicago
222       Bob       Chicago
111       Bob       Chicago
333       Alice     Chicago
222       Alice     Chicago
111       Alice     Chicago
333       Charlie   Pittsburgh
222       Charlie   Pittsburgh
111       Charlie   Pittsburgh
333       Bob       Pittsburgh
222       Bob       Pittsburgh
111       Bob       Pittsburgh
333       Alice     Pittsburgh
222       Alice     Pittsburgh
111       Alice     Pittsburgh
@rconn, I would love to see an explanation for that.

From your post #6 above:
13:56:01 $ tpipe /input=foo.txt /sort=0,0,0,5,0 /sort=0,0,0,3,0 /sort=0,0,0,1,0 <<< EDIT: UPDATED LINE
If that worked (but it probably doesn't; see above), I could accomplish the same thing with a single sort: tpipe /input=foo.txt /sort=0,0,0,1,5 but, as Vince and @forbin said, that's not what I want. The data is (are?) in multiple columns, and I want to be able to sort it by user-requestable columns without rearranging the order of the data on each line -- which sometimes (most times) implies a sort not in the order the fields occur in the records.
 
As for length = 0 ... the first 0 chars of string1 are always identical to the first 0 chars of string2. What comes out (compared to what goes in) will depend on the sort algorithm itself.

Experiment shows that for TPIPE's sort, using length 0 (regardless of start position) will always (?) reverse the order of the lines (a "feature" of the algorithm that was observed earlier).

1640543050310.png
 
Last edited:
Duh. Didn't think to compare the output to the original input file -- I was too caught up in the apparent weird sort order. Thanks, Vince.
 

Similar threads

Back
Top