TPipe /dup

#1
/dup=Type,MatchCase,StartColumn,Length,IncludeOne,Format

Remove or show duplicate lines. The arguments are:

Type:
0 - Remove duplicate lines
1 - Show duplicate lines

MatchCase - If 1, do case-sensitive comparisons

StartColumn - The starting column for comparisons

Length - The length of the comparison (the number of characters compared)

IncludeOne - Include lines with a count of 1

Format - How the output should be formatted for Type=1. For example, "%d %s" shows the count followed by the line.
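To make the argument list concrete, here is a small Python sketch of what Type=0 appears to do. It is not TPipe's code, and `comparison_key` / `remove_duplicates` are hypothetical names. It assumes three things: StartColumn is 1-based (as inferred later in this thread), only adjacent lines are compared (which is why the input is sorted first), and MatchCase=0 means the compared slice is matched case-insensitively.

```python
from itertools import groupby


def comparison_key(line, match_case, start_column, length):
    """Return the slice of a line that /dup is assumed to compare.

    Assumption: start_column is 1-based and length counts characters.
    """
    key = line[start_column - 1:start_column - 1 + length]
    return key if match_case else key.casefold()


def remove_duplicates(lines, match_case=0, start_column=1, length=4096):
    """Hypothetical emulation of /dup=0,...: keep the first line of each
    run of ADJACENT lines whose comparison keys are equal."""
    result = []
    for _, group in groupby(
        lines, key=lambda s: comparison_key(s, match_case, start_column, length)
    ):
        result.append(next(group))  # first line of the run
    return result


if __name__ == "__main__":
    addresses = ["bob@example.com", "Alice@example.com",
                 "alice@example.com", "bob@example.com"]
    # Sort case-insensitively so duplicates become adjacent.
    ordered = sorted(addresses, key=str.casefold)
    print(remove_duplicates(ordered))  # ['Alice@example.com', 'bob@example.com']
```

Because only neighbours are compared, unsorted input such as `["a", "b", "a"]` keeps both `a` lines under this model.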

=========================

1) Is the starting column 0- or 1-based?

2) What is IncludeOne used for?

I am using "Type=0", so Format is not needed.

========================
TCC 19.00.15 x64 Windows 7 [Version 6.1.7601]
 
#2
/dup=Type,MatchCase,StartColumn,Length,IncludeOne,Format

1) Is the starting column 0- or 1-based?

2) What is IncludeOne used for?
TPipe gives me the creeps, but FWIW, from the online TextPipe manual:
Start column
The comparison can also ignore leading characters if desired by setting the start column higher than 1. This can be used to skip line numbers, which can be used to find duplicates that are not adjacent. To skip line numbers, set the Start Column to 6 (or so), and set the length to 4096, or a length greater than your maximum line length.
. . .
. . .
Include counts of 1
Normally this filter only outputs lines with counts of 2 or more (ie, they are duplicates). When this box is checked,
From this I'd infer that:
  1. TPipe columns start at 1.
  2. If IncludeOne is 1, TPipe will also output lines that have no duplicates (remembering that what gets output for any line is determined solely by the Format you cleverly choose).
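Both inferences can be modelled with a short Python sketch of Type=1. `show_duplicates` is a hypothetical name, not TPipe's implementation; it assumes 1-based columns, adjacent-only comparison, a case-sensitive match (MatchCase=1, to keep it short), and that `%s` in Format receives the first line of each run. The example uses StartColumn=6 to skip five-digit line numbers, as the manual suggests, so numbered lines can be sorted on their text and then counted.

```python
from itertools import groupby


def show_duplicates(lines, start_column=1, length=4096, include_one=0, fmt="%d %s"):
    """Hypothetical emulation of /dup=1,...

    Emits one formatted line per run of adjacent lines whose compared slice
    (1-based start_column, `length` characters) is equal. Runs of a single
    line are emitted only when include_one is 1. Comparison is case-sensitive.
    """
    def key(line):
        return line[start_column - 1:start_column - 1 + length]

    output = []
    for _, group in groupby(lines, key=key):
        run = list(group)
        if len(run) >= 2 or include_one:
            output.append(fmt % (len(run), run[0]))
    return output


if __name__ == "__main__":
    # The line number occupies columns 1-5, so StartColumn=6 skips it.
    numbered = ["00001 x@a.com", "00002 y@b.com", "00003 x@a.com"]
    # Sort on the compared slice so matching lines become adjacent.
    ordered = sorted(numbered, key=lambda s: s[5:])
    print(show_duplicates(ordered, start_column=6))
    # ['2 00001 x@a.com']
    print(show_duplicates(ordered, start_column=6, include_one=1))
    # ['2 00001 x@a.com', '1 00002 y@b.com']
```

Under this model IncludeOne changes only which runs are reported; the Format string alone decides what each reported line looks like.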
 
#3
@Bob Chapman -

I want to remove duplicate lines from a file that contains email addresses, one per line.

So, for lines that:
- occur only once, output the line;
- occur more than once, output the line only once.
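For comparison, here is a Python sketch of the same goal done in one pass. It uses a different technique from SORT followed by /dup (a set of already-seen keys instead of comparing neighbours), so the first occurrence of each address is kept in its original order. `unique_addresses` is a hypothetical name, and the sketch assumes one address per line and that addresses differing only in case are duplicates (comparable to MatchCase=0).

```python
def unique_addresses(lines):
    """Keep the first occurrence of each address, preserving input order.

    Assumptions: one address per line; surrounding whitespace is ignored;
    case is ignored when comparing; blank lines are dropped.
    """
    seen = set()
    result = []
    for line in lines:
        address = line.strip()
        key = address.casefold()
        if address and key not in seen:
            seen.add(key)
            result.append(address)
    return result
```

A file object can be passed directly, e.g. `unique_addresses(open("EmailOut.lst", encoding="utf-8"))`, where the file name and encoding are only illustrative.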

The following is the BTM so far (not long...!):

Code:
goto :here

:here
  setlocal
    set fldr=c:\Users\Galloway\Desktop\EMailAddrs\
    rem next set is output file name for SORT /Output...
    set cSrtOut=SortOut.lst
    rem next is work file for email addresses
    set cOut=EmailOut.lst
    if not isdir "%fldr" md /s "%fldr"
    global /h /i /n /q GoSub DoFldr
  endlocal
  quit

:DoFldr
  echo In: %_CWD
  rem before processing current folder
  set nOutSize=%@filesize[%fldr%%cOut]
  rem extract email addresses
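  rem skip our own output files: compare full paths (@FULL), and join the tests with .and. (TCC's AND operator)
  for %fn in (*.eml *.lst *.txt) if "%@full[%fn]" NE "%fldr%%cSrtOut" .and. "%@full[%fn]" NE "%fldr%%cOut" tpipe /input="%fn",0,1 /simple=28 /outputappend=1 /output=%fldr%%cOut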
  rem if file has changed = means more email addresses found
  iff %nOutSize != %@filesize[%fldr%%cOut] then
    rem make sure file exists
    iff isfile "%fldr%%cOut" then
      rem sort email addresses
      rem sort email addresses (Windows SORT takes the output file as a separate /O argument)
      sort /rec 65535 "%fldr%%cOut" /o "%fldr%%cSrtOut"
      rem remove duplicate email addresses
      tpipe /input=%fldr%%cSrtOut /dup=Type,MatchCase,StartColumn,Length,IncludeOne,Format /output=%fldr%%cOut
      rem planned values: Type=0, MatchCase=0, StartColumn=1 (1-based, per #2), Length=65535, IncludeOne=?, Format=? (unused for Type=0)
    endiff
  endiff
  return
 
#4
I want to remove duplicate lines from a file that contains email addresses, one per line.
I can't get "Type=1" to output anything.

(Edit) The Format string needs its percent signs doubled (%%), e.g. "%%d %%s".
 
#5
#6
Seems that an example of specifying /input or /output when they are, for example:

tpipe /input="%fn" /simple=28 /outputappend=1 /output=%fldr%%cOut