Text sort suggestion for TPIPE

#1
VER: TCMD v5.01.52 x64

I have a file greater than 2 billion bytes that I would like to sort. I tried DOS's SORT and it errored out, and TPIPE doesn't have a /SORT option. I'd love to stay with JP Software products but also need to sort this huge file. Hence my suggestion: add a /SORT option to TPIPE, or perhaps a separate command.

Thank you ....
 
#2
IIRC, TPIPE's "sort" option was removed because it was buggy. But that was a long time ago, and since sorting is rather important, it would be nice if that option came back (in good working order).
 
#4
TCMD's MEMORY returns:


65 % Memory load
4,196,941,824 bytes total physical RAM
1,442,693,120 bytes available physical RAM
8,391,979,008 bytes total page file
4,308,668,416 bytes available page file
8,796,092,891,136 bytes total virtual RAM
8,795,816,751,104 bytes available virtual RAM
262,144 characters total alias
262,143 characters free
131,072 characters total function
131,071 characters free
500,000 characters total history

The file in question is:
10/07/2013 21:03 2,772,599,380 tcmd.all
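For comparison, here is a small Python sketch (Python is my own choice; nobody in the thread used it) that puts the file size next to the MEMORY figures above. The file is roughly 1.9 times the available physical RAM, so any tool that reads the whole file into memory would likely have to lean on the page file. It is also larger than 2^31 bytes; whether that limit matters for a particular tool is an assumption to check, not something established here.

```python
# Figures copied from the MEMORY output and the DIR listing above.
FILE_SIZE = 2_772_599_380        # bytes, tcmd.all
AVAIL_PHYS_RAM = 1_442_693_120   # bytes available physical RAM
TOTAL_PHYS_RAM = 4_196_941_824   # bytes total physical RAM
INT32_MAX = 2**31 - 1            # largest signed 32-bit value (assumed reference limit)


def compare(size: int) -> dict:
    """Return how a file size relates to a few reference limits."""
    return {
        "exceeds_int32": size > INT32_MAX,
        "ratio_to_available_ram": round(size / AVAIL_PHYS_RAM, 2),
        "ratio_to_total_ram": round(size / TOTAL_PHYS_RAM, 2),
    }


if __name__ == "__main__":
    print(compare(FILE_SIZE))
    # {'exceeds_int32': True, 'ratio_to_available_ram': 1.92, 'ratio_to_total_ram': 0.66}
```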
 
#5
I just don't know enough about it to say whether that should be sufficient. Have you found a program that will sort it? If it's text, can you open it with WinWord?
 

Ugo

Aug 22, 2013
#6
VER: TCMD v5.01.52 x64
I have a file greater than 2 billion bytes that I would like to sort.
...
I assume that you have a text file and want to sort its lines in ASCIIbetical order (ordered by character code: digits before uppercase letters, uppercase before lowercase, and other ASCII symbols wherever their codes fall).
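To illustrate what ASCIIbetical means, a small Python sketch (my choice of language; the sample words are made up): plain string comparison orders by character code, so uppercase letters sort before lowercase, "_" falls between them, and "10" comes before "9". A case-insensitive sort is shown only for contrast.

```python
def asciibetical(lines):
    """Sort strings by raw character code (ASCIIbetical order).

    Python compares str values code point by code point, which for
    plain ASCII text is the same as comparing ASCII codes.
    """
    return sorted(lines)


def dictionary_like(lines):
    """Case-insensitive order, for contrast (not what the perl one-liner does)."""
    return sorted(lines, key=str.casefold)


if __name__ == "__main__":
    sample = ["banana", "Apple", "_tmp", "apple", "Zebra", "10", "9"]
    print(asciibetical(sample))     # ['10', '9', 'Apple', 'Zebra', '_tmp', 'apple', 'banana']
    print(dictionary_like(sample))  # ['10', '9', '_tmp', 'Apple', 'apple', 'banana', 'Zebra']
```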

One viable solution is to get Perl (for instance, the portable, no-installation Strawberry Perl from strawberryperl.com) and use this command:
perl -e "print sort <>" original_file.txt > sorted_file.txt

Note that this code reads the whole file into memory as a list of lines (that is what <> does in list context), sorts the list, and then prints every line to the output, so it needs enough memory to hold the entire file plus Perl's per-line overhead.
The > redirection sends that output to a file.
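For readers more at home in Python, here is a rough analogue of the perl one-liner (a sketch, not a drop-in replacement; the function name is hypothetical). It likewise reads everything into memory, then sorts the lines byte-wise, which matches the ASCIIbetical order described above.

```python
import sys


def sort_file_in_memory(src: str, dst: str) -> None:
    """Rough Python analogue of: perl -e "print sort <>" src > dst

    Reads every line into a list (the whole file must fit in memory),
    sorts the lines byte-wise, and writes them out in order.
    """
    with open(src, "rb") as f:
        lines = f.readlines()  # whole file in memory, like <> in list context
    # Unlike the perl one-liner, make sure the final line ends with a newline (LF)
    # so it cannot be glued to whichever line is sorted after it.
    if lines and not lines[-1].endswith(b"\n"):
        lines[-1] += b"\n"
    lines.sort()  # byte-wise (ASCIIbetical) comparison
    with open(dst, "wb") as f:
        f.writelines(lines)


if __name__ == "__main__" and len(sys.argv) == 3:
    sort_file_in_memory(sys.argv[1], sys.argv[2])
```

Usage would be along the lines of: python sort_in_memory.py original_file.txt sorted_file.txt (the script name is hypothetical).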

I do not know whether there is any limit on the amount of data that can go through a DOS pipe or redirection, nor whether piping is efficient with large data (does it use a buffer?).
The bottleneck of this command is likely to be disk input/output.
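The perl one-liner needs the whole file in memory, and the MEMORY output earlier in the thread shows less physical RAM available than the file's size. A common way around that (not something anyone in the thread proposed; Python and every name below are my own choices) is an external merge sort: sort chunks that fit in memory, write each sorted chunk to a temporary file, then merge the chunks. It also reads and writes through large buffers. lines_per_chunk is an arbitrary tuning parameter, not a recommended value.

```python
import heapq
import os
import tempfile
from itertools import islice


def _ensure_newline(line: bytes) -> bytes:
    # A final line without a newline would otherwise be glued to the next one.
    return line if line.endswith(b"\n") else line + b"\n"


def external_sort(src: str, dst: str, lines_per_chunk: int = 1_000_000) -> None:
    """Sort a text file byte-wise without holding all of it in memory.

    Phase 1: read at most `lines_per_chunk` lines at a time, sort them in
    memory, and write each sorted run to its own temporary file.
    Phase 2: k-way merge the sorted runs with heapq.merge, which keeps only
    one pending line per run in memory.
    """
    run_paths = []
    try:
        with open(src, "rb") as f:
            while True:
                chunk = list(islice(f, lines_per_chunk))
                if not chunk:
                    break
                chunk = [_ensure_newline(line) for line in chunk]
                chunk.sort()
                fd, path = tempfile.mkstemp(suffix=".run")
                with os.fdopen(fd, "wb") as run:
                    run.writelines(chunk)
                run_paths.append(path)
        # 1 MiB buffers so the merge reads and writes in large blocks.
        runs = [open(p, "rb", buffering=1 << 20) for p in run_paths]
        try:
            with open(dst, "wb", buffering=1 << 20) as out:
                out.writelines(heapq.merge(*runs))
        finally:
            for r in runs:
                r.close()
    finally:
        for p in run_paths:
            os.remove(p)
```

Keep in mind that the temporary runs need roughly as much free disk space as the input file, and that lines_per_chunk should be small enough that one chunk fits comfortably in RAM.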

If you try the command and are happy with it, you can easily create an alias for it and use it like any other TCMD feature!

As a lightweight alternative, you can download the free tool "Swiss File Knife" (sfk) and use its "sort" command; see http://stahlworks.com/dev/swiss-file-knife.html
Again, you can easily wrap this tool in an alias or a batch file.
 
#7
I don't understand how to use SFK. Does it work primarily with pipes? How would I sort input.txt and produce input_sorted.txt?
 

Ugo

#8
I don't understand how to use SFK. Does it work primarily with pipes? How would I sort input.txt and produce input_sorted.txt?
Try
sfk sort
to get its help text.
For your specific case, just use
sfk filter input.txt +sort > input_sorted.txt
Beware that sfk assumes that every line is no longer than 4000 bytes, while the perl code above does not have this restriction.
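Since that 4000-byte limit matters for sfk, it may be worth checking the longest line before choosing a tool. A minimal Python sketch (the function name is hypothetical, and whether sfk's limit counts the line terminator is an assumption I have not verified) that streams the file, so it also works on files larger than RAM:

```python
from typing import Tuple

SFK_LINE_LIMIT = 4000  # limit as stated in the post above; not verified against sfk itself


def longest_line(path: str) -> Tuple[int, int]:
    """Return (line_number, length_in_bytes) of the longest line in a file.

    Streams the file line by line; the length excludes the trailing CR/LF.
    """
    best_no, best_len = 0, 0
    with open(path, "rb") as f:
        for no, line in enumerate(f, start=1):
            length = len(line.rstrip(b"\r\n"))
            if length > best_len:
                best_no, best_len = no, length
    return best_no, best_len
```

If longest_line("input.txt")[1] exceeds SFK_LINE_LIMIT, the Perl approach is the safer choice.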

You can also use PowerShell, if you like:
Get-Content input.txt | Sort-Object > input_sorted.txt
You can then include the PowerShell command in your batch files like this:
PowerShell -Command "& {Get-Content input.txt | Sort-Object > input_sorted.txt}"
 

rconn

Administrator
Staff member
May 14, 2008
#9
IIRC, TPIPE's "sort" option was removed because it was buggy. But that was a long time ago, and since sorting is rather important, it would be nice if that option came back (in good working order).
It wasn't buggy; it was just slow with large files. That generated enough complaints that I removed the option.