How to? TPIPE questions

Mar 5, 2022
3
0
Here's my version of "Dr. Strangepipe, or, How I'm Learning To Stop Worrying and Love TPIPE."

I've used JP Software's products since the 4DOS days. But my last employer's policy towards non-corporate software was strictly "George Orwell": anything not specifically permitted was forbidden. So I was stuck with CMD BAT files at my last job, and I've been away from Take Command for about 20 years. Now I'm back, and I'm gradually remembering things. And then I encountered... TPIPE.

I've written a .BTM that processes two big .CSV files, eliminates unnecessary fields and arranges them so the resulting files' formats are consistent, then concatenates them, sorts them and eliminates duplicates.

I started out using FOR /f loops, which go through each file line by line, writing only the needed fields, in the correct order, to another file. That worked, but it was very slow. So I bit the bullet and decided to learn TPIPE.

After scratching my head over the TPIPE syntax for a couple of days, I printed out the help file and numbered the parameters for the actions I needed in red pencil, and things started to make sense. What I've come up with for my BTM works, but I think I might be able to do things better or more efficiently.

Here's a snip of the code in question:

Echo * Processing %MaxLines Global "Last Heard" records...
:: Add underscores to blank fields
tpipe /input=user_by_lh.csv /replace=0,0,0,0,0,0,0,0,0,";;",";__;" >temp$11.txt
:: remove field 5
tpipe /input=temp$11.txt /selection=7,0,5,5,0,2,"",1 >temp$12.txt
:: remove field 1
tpipe /input=temp$12.txt /selection=7,0,1,1,0,2,"",1 >temp$13.txt
:: Move Field 1 to position 2, exchanging ID with callsign to match user.csv format
tpipe /input=temp$13.txt /selection=10,0,1,1,2,2,"",1 >temp$14.txt
:: Change semicolons to commas to match user.csv format
tpipe /input=temp$14.txt /replace=0,0,0,0,0,0,0,0,0,";","," >temp$15.txt
:: Remove header line, include only %MaxLines lines
:: We didn't remove header earlier because the /selection statements use it!
tpipe /input=temp$15.txt /head=1,0,1 >temp$16.txt
tpipe /input=temp$16.txt /head=0,0,%MaxLines >temp$17.txt

The rest of the BTM concatenates the 2 files, sorts, eliminates dupes and adds header.
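For cross-checking that last stage, here is a rough Python sketch. It is not TPIPE or TCC code, and it rests on a few assumptions: both files have already been converted to RADIO_ID,CALLSIGN,FIRST_NAME lines, "duplicates" means identical lines, and sorting is by numeric RADIO_ID. The name merge_records is made up for illustration.

```python
# Sketch only: merge two lists of already-converted records, drop exact
# duplicate lines, sort by numeric RADIO_ID, and put the header on top.
HEADER = "RADIO_ID,CALLSIGN,FIRST_NAME"


def merge_records(state_records, recent_records):
    """Concatenate both lists, remove identical lines, sort by the first field."""
    unique = set(state_records) | set(recent_records)
    # Secondary key (the whole line) keeps the order stable when IDs tie.
    body = sorted(unique, key=lambda line: (int(line.split(",", 1)[0]), line))
    return [HEADER] + body


if __name__ == "__main__":
    state = ["1023008,VE3JMR,Mark", "1023001,VE3THW,Wayne"]
    recent = ["1023003,VE3QC,Guy", "1023001,VE3THW,Wayne"]
    for line in merge_records(state, recent):
        print(line)
```

Using a set keeps only one copy of each identical line. Two lines with the same ID but different names would both survive, so deduplicating by ID alone would need a different rule.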

Here's a sample of the file I'm processing above (the ellipses just show where I skipped many lines, to show the progression):

num;callsign;dmrid;name;age
1;DG5NEK;2628601;Heribert;0
2;F6ASS;2080924;Pascal;0
3;T77CD;2920016;Giovanni;0
...
16481;W6JNG;3167861;Jing;6
16482;KC5FM;3140620;Lloyd A;6
...
81999;ZS1TR;6551006;Trevor;365
82000;IU5JJV;2225251;Cafarro70;365
82001;BG7FCK;4607087;Jianyu;365

When all is done, the result looks like this:

RADIO_ID,CALLSIGN,FIRST_NAME
1023001,VE3THW,Wayne
1023003,VE3QC,Guy
1023008,VE3JMR,Mark
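To spell out the intended per-record mapping, here is a minimal Python sketch (not TPIPE) of what the TPIPE steps above do to one line: drop fields 1 and 5, put dmrid before callsign, and switch to commas. It assumes every line has exactly five ';'-separated fields with no quoting. convert_record is a hypothetical helper name.

```python
# Sketch: num;callsign;dmrid;name;age  ->  dmrid,callsign,name
# Assumes exactly five ';'-separated fields per line and no quoting.

def convert_record(line):
    # str.split keeps empty fields, so "a;;b" gives ["a", "", "b"].
    _num, callsign, dmrid, name, _age = line.rstrip("\r\n").split(";")
    return ",".join([dmrid, callsign, name])


for sample in ["1;DG5NEK;2628601;Heribert;0", "65837;KG7DSB;3116414;;192"]:
    print(convert_record(sample))
# 2628601,DG5NEK,Heribert
# 3116414,KG7DSB,
```

A line with more or fewer than five fields raises a ValueError here. That may be handy for spotting malformed records during a one-off check.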


Here are several things I don't yet understand:

1. For some reason, I have not been able to combine multiple TPIPE actions in one statement, or to use TPIPE in a pipe. Each time I've tried, I get nothing in the output. So I've written one "filter" or action per line, writing each result to a temporary file, and I end up with a lot of temp files (which, of course, I delete when finished). My question: are there rules I've missed about what you can combine in a single TPIPE command, and how? Maybe show me an example or two?

2. I don't quite get how subfilters work and how to specify them. Again, an example or two would help.

3. TPIPE doesn't seem to handle empty fields in a comma- or semicolon-delimited file properly. When I attempt to delete a field and a preceding field is empty, TPIPE appears to get confused and deletes nothing. In the first file snippet above, for example, if the name field is blank, I end up with the age value as the third field rather than a blank field. So, for example:

65837;KG7DSB;3116414;;192
...would end up, after deleting fields 1 and 5, as this:
KG7DSB;3116414;192
...instead of this:
KG7DSB;3116414;;

I've gotten around this by first doing a /replace and adding a couple of underscores to the empty fields, replacing ";;" with ";__;". Is there a better way to deal with this?
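Here is one caveat about the ";;" to ";__;" workaround, shown in a small Python sketch. Assume TPIPE's /replace works like most replace functions: it scans left to right and never re-examines text it just inserted. Under that assumption, two empty fields in a row leave one still empty, and an empty first or last field isn't touched at all. Splitting on ';' and filling each empty field avoids both cases. Both function names are hypothetical, and the sample lines other than the KG7DSB one are made up.

```python
# Sketch: compare a plain ";;" -> ";__;" replace with filling each empty field.
# Assumption: TPIPE's /replace, like str.replace, scans left to right and does
# not re-examine text it has just inserted.

def fill_blanks_naive(line):
    return line.replace(";;", ";__;")


def fill_blanks(line, placeholder="__"):
    # Split on every ';' (empty fields survive), then substitute the empty ones.
    return ";".join(field if field else placeholder for field in line.split(";"))


for sample in ["65837;KG7DSB;3116414;;192", "1;CALL;;;5", "1;CALL;123;"]:
    print(fill_blanks_naive(sample), "|", fill_blanks(sample))
# 65837;KG7DSB;3116414;__;192 | 65837;KG7DSB;3116414;__;192
# 1;CALL;__;;5 | 1;CALL;__;__;5
# 1;CALL;123; | 1;CALL;123;__
```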

Further background, if you care: The input files are lists of amateur radio callsigns, their ID numbers for the DMR (Digital Mobile Radio) system, names, locations, and, in one case, how long since the station was last heard on a particular network. We need such a list in our radios, with only IDs, callsigns, and first names. The problem is that there are about 215,000 users worldwide, and no radio can fit the entire global database anymore; mine can only handle about 48,000 records. So I'm pulling all the records for my state and its neighbors from the complete list, then adding as many records as will fit into the radio from the other file, which lists stations in order of "days since last heard." I end up with a smaller database that has a reasonable chance of containing most people I contact on the radio.

Thanks for any insights!
--Peter
 
May 20, 2008
11,845
120
Syracuse, NY, USA
As for combining actions, you just string them together in one TPIPE command.

Code:
v:\> type data.csv
num;callsign;dmrid;name;age;;foo
1;DG5NEK;2628601;Heribert;0;;foo
2;F6ASS;2080924;Pascal;0;;foo
3;T77CD;2920016;Giovanni;0;;foo

v:\> tpipe /input=data.csv /replace=0,0,0,0,0,0,0,0,0,";;",";__;"
num;callsign;dmrid;name;age;__;foo
1;DG5NEK;2628601;Heribert;0;__;foo
2;F6ASS;2080924;Pascal;0;__;foo
3;T77CD;2920016;Giovanni;0;__;foo

v:\> tpipe /input=data.csv /replace=0,0,0,0,0,0,0,0,0,";;",";__;" /selection=7,0,5,5,0,2,"",1
num;callsign;dmrid;name;__;foo
1;DG5NEK;2628601;Heribert;__;foo
2;F6ASS;2080924;Pascal;__;foo
3;T77CD;2920016;Giovanni;__;foo

v:\> tpipe /input=data.csv /replace=0,0,0,0,0,0,0,0,0,";;",";__;" /selection=7,0,5,5,0,2,"",1 /selection=7,0,1,1,0,2,"",1
callsign;dmrid;name;__;foo
DG5NEK;2628601;Heribert;__;foo
F6ASS;2080924;Pascal;__;foo
T77CD;2920016;Giovanni;__;foo

Instead of redirecting (>) the output, you can specify /output=... . For example,

Code:
v:\> tpipe /input=data.csv /output=processed.csv /replace=0,0,0,0,0,0,0,0,0,";;",";__;" /selection=7,0,5,5,0,2,"",1 /selection=7,0,1,1,0,2,"",1

v:\> type processed.csv
callsign;dmrid;name;__;foo
DG5NEK;2628601;Heribert;__;foo
F6ASS;2080924;Pascal;__;foo
T77CD;2920016;Giovanni;__;foo
 
65837;KG7DSB;3116414;;192
...would end up, after deleting fields 1 and 5, as this:
KG7DSB;3116414;192
...instead of this:
KG7DSB;3116414;;
It looks a little different here.

Code:
v:\> echo 65837;KG7DSB;3116414;;192 | tpipe /selection=7,0,5,5,0,2,"",1 /selection=7,0,1,1,0,2,"",1
KG7DSB;3116414;

And it seems correct.

1 = 65837
2 = KG7DSB
3 = 3116414
4 = [empty]
5 = 192

Fields 1 and 5 were removed, and since 5 was the last field, its separating semicolon was removed as well. The result has 3 fields, the third being empty.
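The field arithmetic above can be cross-checked with a few lines of Python. This illustrates the expected result only, not how TPIPE works internally: split on every ';' so empty fields keep their positions, drop the unwanted 1-based positions, and rejoin. remove_fields is a hypothetical name.

```python
# Sketch: delete fields by 1-based position, treating every ';' as a separator
# so that empty fields keep their place. Illustration only, not TPIPE's code.

def remove_fields(line, positions, sep=";"):
    fields = line.split(sep)
    kept = [f for i, f in enumerate(fields, start=1) if i not in positions]
    return sep.join(kept)


print(remove_fields("65837;KG7DSB;3116414;;192", {1, 5}))  # KG7DSB;3116414;
```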
 
Thanks for all this, Vince. Wading through all those integer parameters, I lost sight of /output.

Re. the last example: with the big input file, if I removed field 5 first, I ended up with what I showed in my example, with the number from field 5 ending up in the (final) field 3 instead of a blank field. If I removed field 1 first and then field 4 (formerly 5), things came out as they should.
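When empty fields are counted as fields, the two orders should agree: deleting field 5 and then field 1 gives the same result as deleting field 1 and then field 4. Below is a hedged Python sketch of that check. delete_field is a hypothetical helper, and the sketch says nothing about what TPIPE does internally. If TPIPE gives different results for the two orders on the same line, that line is worth isolating and testing on its own.

```python
# Sketch: delete one field at a time; later fields shift left after each delete.

def delete_field(line, position, sep=";"):
    fields = line.split(sep)
    del fields[position - 1]  # position is 1-based
    return sep.join(fields)


line = "65837;KG7DSB;3116414;;192"
five_then_one = delete_field(delete_field(line, 5), 1)
one_then_four = delete_field(delete_field(line, 1), 4)
print(five_then_one, one_then_four, five_then_one == one_then_four)
# KG7DSB;3116414; KG7DSB;3116414; True
```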

If you want to fool with the actual file I use, it's here:
https://ham-digital.org/user_by_lh.php

This is what I came up with. I did end up putting an underscore in blank fields anyway, partly for reassurance and partly because it's truly needed in the CMD version I wrote using a FOR /f loop. The first five /replace actions change characters that might confuse the shell into harmless equivalents. %MaxLines is 42301.

Code:
tpipe /input=user_by_lh.csv /output=Temp$Age.txt /head=0,0,%MaxLines^
/replace=0,0,0,0,0,0,0,0,0,"&","*"^
/replace=0,0,0,0,0,0,0,0,0,"+","*"^
/replace=0,0,0,0,0,0,0,0,0,"|",":"^
/replace=0,0,0,0,0,0,0,0,0,"<","("^
/replace=0,0,0,0,0,0,0,0,0,">",")"^
/replace=0,0,0,0,0,0,0,0,0,";;",";_;"^
/selection=7,0,1,1,0,2,"",1^
/selection=10,0,1,1,2,2,"",1^
/selection=7,0,4,4,0,2,"",1^
/replace=0,0,0,0,0,0,0,0,0,";",","^
/head=1,0,1
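The first five /replace actions amount to a one-character substitution table. The Python sketch below uses str.maketrans and str.translate to show the same mapping. The names SHELL_SAFE and make_shell_safe are made up, and the sketch illustrates the mapping only; it doesn't replace the TPIPE command.

```python
# Sketch of the mapping performed by the first five /replace actions.
SHELL_SAFE = str.maketrans({"&": "*", "+": "*", "|": ":", "<": "(", ">": ")"})


def make_shell_safe(text):
    return text.translate(SHELL_SAFE)


print(make_shell_safe("A&B+C|D<E>F"))  # A*B*C:D(E)F
```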
 
I don't know, Peter. Using your BTM and your data, both unaltered, I get these first and last five lines. They look OK to me. The output file contains no semicolons and no double commas.

Code:
v:\> head /n5 Temp$Age.txt & tail /n5 Temp$Age.txt
2623643,DL1OAD,Herbert
2222339,IZ2NBD,Luciano
3146368,W4WET,Ramiro G
2348248,G0BAK,Bill
2222004,IW2JXY,Mario
2221593,IZ1BAN,IZ1BAN
4601801,BG3LWV,hongwei
2288004,HB9WOF,Reto
4600718,BI4WPR,HUANG
2301145,OK1VBR,Radek

Several output lines (357, to be exact) look like this:

Code:
2503255,RM3MM,_

but they seem to be expected, since they come from input lines like this one:

Code:
31188;RM3MM;2503255;;25

I get exactly three output lines that have an oddball field 3:

Code:
2343257,G0RVU,2 FANS
3020982,VA3TTQ,3020982
2347553,2E1XDJ,21 bush street

They come from these input lines.

Code:
4769;G0RVU;2343257;2 FANS;0
36693;VA3TTQ;3020982;3020982;38
37357;2E1XDJ;2347553;21 bush street;39

I don't know what you're seeing, but what I'm seeing could be a case of GIGO (garbage in, garbage out).
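The check described above (no semicolons, no double commas) can be extended to flag any line that doesn't have exactly three comma-separated fields. Here is a minimal Python sketch, assuming plain, unquoted output lines. check_output is a hypothetical name. It would not catch the oddball third fields above, since those lines are structurally well formed.

```python
# Sketch: flag output lines that contain ';', contain ',,', or do not have
# exactly three comma-separated fields. Assumes plain, unquoted lines.

def check_output(lines):
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if ";" in line:
            problems.append((number, "semicolon"))
        if ",," in line:
            problems.append((number, "double comma"))
        if line.count(",") != 2:
            problems.append((number, "not three fields"))
    return problems


# Usage (file name from the thread; the encoding is a guess):
# with open("Temp$Age.txt", encoding="utf-8", errors="replace") as f:
#     for number, problem in check_output(f):
#         print(number, problem)
```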
 
One thing is for sure: "/head=0,0,%MaxLines" is screwing things up (at least it's screwing me up). Here are the line counts with your BTM unaltered. [And why 42300?]

Code:
v:\> peter.btm

v:\> wc -l user_by_lh.csv
141646 user_by_lh.csv

v:\> wc -l Temp$Age.txt
42300 Temp$Age.txt

That's obviously wrong. Don't you want to process the whole input file? Apart from that, things look pretty good (see my previous post).

Here they are with "/head=0,0,%MaxLines" removed.

Code:
v:\> peter.btm

v:\> wc -l user_by_lh.csv
141646 user_by_lh.csv

v:\> wc -l Temp$Age.txt
141645 Temp$Age.txt

That looks better.
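The counts fit the following reading of the two /head actions, which is inferred from the comments in the original BTM rather than from the TPIPE documentation. On that reading, /head=0,0,%MaxLines keeps the first %MaxLines lines (header included), and /head=1,0,1 then drops the header. Here is a Python sketch of that reading; limit_records is a made-up name.

```python
# Sketch of the inferred /head behaviour (an assumption, not documented TPIPE):
# keep the first max_lines lines including the header, then drop the header.

def limit_records(lines, max_lines=None):
    kept = lines if max_lines is None else lines[:max_lines]
    return kept[1:]  # drop the header line


lines = ["num;callsign;dmrid;name;age"] + ["record"] * 141645  # 141,646 lines
print(len(limit_records(lines, 42301)))  # 42300
print(len(limit_records(lines)))         # 141645
```

With MaxLines = 42301 and a 141,646-line input, that gives 42,300 records. With no limit it gives 141,645, matching the wc -l figures above.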
 
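These too. Their name fields contain commas, which become extra field separators once the semicolons are changed to commas. I don't know whether that will cause trouble when the CSV output file is processed.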

Code:
107309;OZ1DIS;2384278;OZ1DIS, Allan;717
128030;WX4ID;3147124;Evans,;1250
130671;WB3GXW;3124146;Creel,;1384
135646;KD9BCM;3117360;Callery,;1718
138792;N2LRR;3136330;Pavone,;2016
138999;KK6CJL;3107141;Mitchell,;2046
140509;W8VF;3139072;Haverstick,;2391
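The Python sketch below shows why these lines could matter. Once ';' becomes ',', a comma inside the name field turns into an extra field for any CSV reader. Quoting the field, as Python's csv module does by default, keeps it intact. Whether the radio's import software accepts quoted fields is an assumption worth checking; simply removing commas from names may be the safer option. The sample line is the OZ1DIS record after field removal and reordering, and both function names are hypothetical.

```python
import csv
import io


def fields_after_naive_conversion(semicolon_line):
    """Turn ';' into ',' and return the fields a CSV reader would see."""
    return next(csv.reader([semicolon_line.replace(";", ",")]))


def quoted_csv_line(fields):
    """Write one row with the csv module, which quotes fields containing commas."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(fields)
    return buf.getvalue().rstrip("\n")


print(fields_after_naive_conversion("2384278;OZ1DIS;OZ1DIS, Allan"))
# ['2384278', 'OZ1DIS', 'OZ1DIS', ' Allan']  (four fields instead of three)
print(quoted_csv_line(["2384278", "OZ1DIS", "OZ1DIS, Allan"]))
# 2384278,OZ1DIS,"OZ1DIS, Allan"
```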
 
For what it's worth, it's a good idea to get into the habit of preceding a line-continuing '^' with a space. TCC simply removes the '^', the following CRLF, and any following whitespace. Leaving out the space works in your BTM only because each continued line starts with '/', the parameter separator. In general, it doesn't work:

Code:
v:\> echo^
More?       foo
TCC: Unknown command "echofoo"
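Here is a rough Python model of the rule described above: remove the '^', the line break, and the next line's leading whitespace. It approximates the behaviour for illustration and isn't TCC's actual parser. join_continuation is a made-up name.

```python
import re


def join_continuation(text):
    # Remove '^', the line break after it, and leading spaces/tabs on the next line.
    return re.sub(r"\^\r?\n[ \t]*", "", text)


print(join_continuation("echo^\r\n      foo"))   # echofoo
print(join_continuation("echo ^\r\n      foo"))  # echo foo
```

Joining "/input=a.csv^" with a next line that starts "/head=1,0,1" yields "/input=a.csv/head=1,0,1". That is why the BTM above still parses: the '/' itself marks the start of the next parameter.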
 
Thanks for all this, Vince. I was silent for the last few days because I was in the hospital (scheduled procedure), so I had other things to think about.

As for "Why 42300?": the radio can only hold about 48,000 records. I load data from two databases. The first part of the script loads all digitally registered ham radio callsigns from my state and adjacent states. The second part uses a database where the calls are listed in order of "last heard on the network." 42300 is about how many additional records I can fit in, taking those callsigns and some overlap into account. So I get everyone who uses DMR (Digital Mobile Radio) in my state, plus everyone who has been heard on the network fairly recently, stopping when I would run out of space in the radio.

The databases I'm getting all this from hold imperfect user registrations from the dirty real world. There doesn't appear to be much (if any) validation of the data. Some hams from former Soviet republics don't enter their names, which means blank fields or a callsign duplicated in the name field. Some people from countries that don't use the Latin alphabet enter their names in their own script, which becomes random nonsense in the code page the database expects. None of this worries me, as long as fields from one bad record don't mess up the next. GIGO, as you say. This is about making a hobby convenient, not life-or-death.

I do want to make sure that my beginner knowledge of TPIPE isn't contributing to the mess.

Thanks for the tip regarding putting a space before the carets-as-line-extenders.

--Peter
 
