TPIPE command - Text filtering, search, substitution, and conversion
Purpose: Text filtering, search, substitution, conversion, and sorting

 

Format: See Options below.

 

Usage:

 

TPIPE performs text filtering, search, substitution, conversion, and sorting on text files. If you don't specify an input filename, TPIPE will read from standard input if it has been redirected. If you don't specify an output filename, TPIPE will write to standard output. Standard input and output are substantially slower than reading from and writing to files, but they allow you to use TPIPE with pipes.

 

You can specify multiple filters, which will be processed in the order they appear on the command line. Do not insert any unquoted whitespace or switch characters in the arguments to an option. If you do need to pass whitespace, switch characters, or double quotes in an argument, you can quote the entire option in single back quotes.

 

Row and column positions start at 1.

 

If you need to process a Windows Unicode UTF-16 file, unless the filter supports Unicode directly (for example, /simple) you'll need to convert it to UTF-8 first (see /unicode=...).

 

Options:

 

 /input=filename[,subfolders[,action]]

 

filename - Filename or folder to read (including wildcards). This can be either a disk file, file list (@filename), or CLIP:. If it is not specified, TPIPE will read from standard input.

subfolders - How many subfolders to include (default 0):

0 - no subfolders

1 to 254 - include that many levels of subfolders

255 - all subfolders

action - the action to take (default 1):

1 - include the files

2 - exclude the files

3 - ignore the files

 

You can specify multiple /input statements.

 

/output=filename

 

Filename to write. This can be either a disk file or CLIP:. If it is not specified, TPIPE will write to standard output.

 

/outputfolder=directory

 

Sets the output folder for the filtered files.

/inputbinary=n[,size]

Determines how binary files are processed. The options are:

0 - Binary files are processed (default)

1 - Binary files are skipped

2 - Binary files are confirmed before processing

size - The sample size in bytes to use for identifying binary files (default 255)

/inputdelete=n

If 1, the input files will be deleted after processing. USE WITH CAUTION!

/inputprompt=n

If 1, TPIPE will prompt before processing each input file.

/inputpromptRO=n

If 1, TPIPE will prompt before processing read-only input files.

/inputstring=...

Process the string as if it were a file and return the result.

/outputappend=n

If n is 1, append to the output file.

/outputchanged=n

Sets the output changed mode. The options are:

0 - Always output

1 - Only output modified files

2 - Delete original if modified

/outputmode=n

Sets the output mode. The options are:

0 - Output to clipboard (all files are merged)

1 - Output to files

2 - Output to a single merged file

/outputopen=n

If 1, TPIPE will open each output file in its associated program upon completion.

/outputretaindate=n

 

If n is 1, retain the existing file date for the output file.

 

/clipboard

 

Runs the current filter with input from and output to the clipboard.

 

/filter=filename

 

Name of filter file to load (see /save=filename)

 

/save=filename

 

Saves the filter settings defined on the command line to the specified filename, and returns without executing any filters.

 

/startsubfilters

 

The following filters are created as sub filters, until the closing /ENDSUBFILTERS. Sub filters allow a restricted part of the entire text to be operated on by a group of filters without affecting the entire text. For example, a "Restrict to delimited fields" (CSV, Tab, Pipe, etc.) filter can pick out a range of CSV fields, and a search/replace sub filter will then operate only on the restricted text.

 

/endsubfilters

 

End the sub filters defined by the preceding /STARTSUBFILTERS.

 

/buffersize=n

 

Sets the buffer size for the preceding search/replace filter. (The default is 4096.)

 

/editdistance=n

 

Sets the edit distance threshold for the preceding search/replace filter. (The default is 2.)

 

/comment=text

 

Add a comment to a filter file.

 

 text - the comment to add

 

/database=Mode,GenerateHeader,Timeout,Connection,InsertTable,FieldDelimiter,Qualifier

 

Adds a database-type filter.

 

Mode

0 Delimited output

1 Fixed width

2 XML

3 Insert script

 

GenerateHeader - If 1, generates header information.

 

Timeout - SQL command timeout in seconds.

 

Connection - The database connection string.

 

InsertTable - The name of the insert table.

 

FieldDelimiter - The string to use between columns.

 

Qualifier - The string to use around string column values.

 

/dup=Type,MatchCase,StartColumn,Length,IncludeOne,Format

 

Remove or show duplicate lines. The arguments are:

 

Type:

0 - Remove duplicate lines

1 - Show duplicate lines

 

MatchCase - If 1, do case-sensitive comparisons.

 

StartColumn - The starting column for comparisons (the first column is 1).

 

Length - The Length of the comparison.

 

IncludeOne - Include lines with a count of 1 (only for Type 1).

 

Format - how the output should be formatted for Type=1. Format strings are composed of plain text and format specifiers. Plain text characters are copied as-is to the resulting string.

Format specifiers have the following form:

%[index ":"][-][width][.precision]type

An optional argument index specifier

An optional left justification indicator, ["-"]

An optional width specifier, [width] (an integer). If the width of the number is less than the width specifier, it will be padded with spaces.

An optional precision specifier [precision] (an integer). If the width of the number is less than the precision, it will be left padded with 0's.

The conversion type character:

d - decimal

s - string

Percent signs in the format string should be doubled (unless you back quote the /dup=`...` argument), and the count argument must appear before the string (unless you use the index specifier). For example, "%%d %%s" shows the count followed by the string. Or, with the index specifier, "string %%1:s count %%0:d".
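The width and precision rules above follow the usual printf-style conventions, which can be illustrated with Python's % operator (an illustration of the padding semantics only, not TPIPE itself):

```python
# Illustrative equivalents of the /dup format specifiers.
count, text = 7, "apple"

print("%5d %s" % (count, text))   # width 5: number space-padded to "    7"
print("%.3d %s" % (count, text))  # precision 3: number zero-padded to "007"
print("%-5d|%s" % (count, text))  # "-": left-justified within width 5
```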

 

/eol=input,output,length

 

Add an EOL (end of line) conversion filter. The arguments are:

 

input:

0 - Unix (LF)

1 - Mac (CR)

2 - Windows (CR/LF)

3 - Auto

If you are unsure of the source, select Auto. The Auto option can detect and modify text files containing a variety of line endings.

4 - Fixed (use the length parameter to specify the length)

If you are converting a mainframe file that contains fixed length records, select "Fixed length" and enter the record length. The maximum record length is 2,147,483,647 characters. Note: If you are converting 132 column mainframe reports, you should set the fixed length to 133, because each line has a prefix character.

 

output:

0 - Unix

1 - Mac

2 - Windows

3 - None

 

length - The line length to use if input=4
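For the fixed-length case (input=4), the record logic amounts to slicing the file into fixed-size records and joining them with the chosen line ending, as in this Python sketch (an illustration of the 133-byte mainframe example above, not TPIPE's implementation):

```python
# Sketch: convert fixed-length records (e.g. 132 data columns plus a
# 1-byte prefix character = 133 bytes) into CR/LF-terminated lines.
def fixed_to_crlf(data: bytes, length: int) -> bytes:
    records = [data[i:i + length] for i in range(0, len(data), length)]
    return b"\r\n".join(records) + b"\r\n"

raw = b"A" * 133 + b"B" * 133   # two 133-byte records
print(fixed_to_crlf(raw, 133)[:10])
```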

 

/file=type,MatchCase,filename

 

Add a file-type filter. The arguments are:

 

type:

17 Restrict to filenames matching the Perl pattern

18 Restrict to filenames NOT matching the Perl pattern

 

MatchCase - If 1, do a case sensitive match (where appropriate)

 

filename - the filename to use

 

/grep=Type,IncludeLineNumbers,IncludeFilename,MatchCase,CountMatches,PatternType,UTF8,IgnoreEmpty,Pattern

 

Adds a Grep type line based filter. The arguments are:

 

Type:

0 Restrict lines matching (for subfilters)

1 Restrict lines NOT matching (for subfilters)

2 Extract matches

3 Extract matching lines (grep)

4 Extract non-matching lines (inverse grep)

5 Remove matching lines

6 Remove non-matching lines

 

IncludeLineNumbers - 1 to include the line number where the pattern was found

 

IncludeFilename - 1 to include the filename where the pattern was found

 

MatchCase - 1 to do a case-sensitive comparison when matching the pattern

 

CountMatches - 1 to output a count of the number of matches

 

PatternType

0 Perl pattern

1 Egrep pattern

2 Brief pattern

3 MS Word pattern

 

UTF8 - 1 to allow matching Unicode UTF8 characters

 

IgnoreEmpty - 1 to ignore empty matches

 

Pattern - the (regular expression) pattern to match

 

/head=Exclude,LinesOrBytes,Count

 

Add a head type filter (includes or excludes text at the beginning of the file). The arguments are:

 

Exclude:

0 - Include the text

1 - Exclude the text

 

LinesOrBytes:

0 - Measure in lines

1 - Measure in bytes

 

Count - the number of lines or bytes to include or exclude

 

/insert=type,position,string

 

Add an insert type filter. The arguments are:

 

type:

0 - Insert column

Inserts a new column of text. The position the text is inserted is determined by a column count. The leftmost column is column 1 – inserting in this column displaces all other text to the right. If the insert column given is 0, the text is inserted at the end of the line. If the insert column is negative, the text is inserted at the given position relative to the end of the line. If the insert column given is before the start of the line, or beyond the end of the line, then the text is prepended or appended to the line respectively. Note - this filter is designed for ANSI or Unicode UTF-8 data - it will not handle UTF-16 data. If you need to process UTF-16 files, convert them to UTF-8 first and then convert back to UTF-16 after doing the insertion.

1 - Insert bytes

Insert bytes at the given offset (from 0 to the size of the file).

 

position - the position to insert the string

 

string - the string to insert

 

/line=StartNumber,Increment,SkipBlank,DontNumberBlank,NumberFormat[,DontReset]

 

Adds a Line Number filter. The arguments are:

 

StartNumber - the starting line number

 

Increment - the amount to add for each new line number

 

SkipBlank - if 1, don't increase the line number for blank lines

 

DontNumberBlank - if 1, don't put a line number on blank lines

 

NumberFormat - The format to use for the line number. The format syntax is:

[-][width][.precision]d

An optional left justification indicator, ["-"]

An optional width specifier, [width] (an integer). If the width of the number is less than the width specifier, it will be padded with spaces.

An optional precision specifier [precision] (an integer). If the width of the number is less than the precision, it will be left padded with 0's.

The conversion type character:

d - decimal

 

DontReset - if 1, do not reset the line count at the end of the file. The default is 0.

 

/log=Filename

 

Log the TPIPE actions.

 

 Filename - Name of log file

 

/logappend=n

 

If n is 1, append to the log file.

 

/maths=operation,operand

 

Adds a maths type filter.

 

operation - the operation to perform

0        +

1        -

2        *

3        div (the remainder is ignored)

4        mod (the remainder after division)

5        xor

6        and

7        or

8        not

9        shift left (0 inserted)

10        shift right (0 inserted)

11        rotate left

12        rotate right

 

operand - the operand to use
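The operation is applied byte by byte; for instance, operation 5 (xor) behaves like this Python sketch (an illustration of the semantics, not TPIPE's code):

```python
def xor_bytes(data: bytes, operand: int) -> bytes:
    # Operation 5 (xor): each input byte is XORed with the operand.
    return bytes(b ^ operand for b in data)

scrambled = xor_bytes(b"Secret", 0x20)   # 0x20 flips ASCII letter case
print(scrambled)
print(xor_bytes(scrambled, 0x20))        # applying it again restores the input
```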

 

/merge=type,filename

 

Adds a merge type filter (merge into single output filename). The arguments are:

 

type:

0 Merge into filename

1 Retain lines found in filename

2 Remove lines found in filename

3 Link filter filename

 

filename - the filename to use

 

/number=type,value

 

Add a number-type filter. The arguments are:

 

type:

0 Convert Tabs to Spaces

1 Convert Spaces to Tabs

2 Word wrap (value column width)

3 Pad to width of value

4 Center in width of value

5 Right justify in width of value

6 Restrict CSV field to value

7 Restrict tab-delimited field to value

8 Truncate to width value

9 Force to width value

10 Repeat file value times

11 Restrict to blocks of length

12 Expand packed decimal (with implied decimals)

13 Expand zoned decimal (with implied decimals)

14 Expand unsigned (even-length) packed decimal

15 Expand unsigned (odd-length) packed decimal

 

Value - the numeric value to use

 

/perl=BufferSize,Greedy,AllowComments,DotMatchesNewLines

 

Sets the Perl matching options for the immediately preceding search/replace filter.

 

BufferSize - The maximum buffer size to use for matches. Any match must fit into this buffer, so if you want to match larger pieces of text, increase the size of this buffer to suit. Default is 4096.

 

Greedy - If the pattern finds the longest match (greedy) or the shortest match. Default is false.

 

AllowComments - Allow comments in the Perl pattern. Default is false.

 

DotMatchesNewLines - Allow the '.' operator to match all characters, including new lines. Default is true.

 

/replace=Type,MatchCase,WholeWord,CaseReplace,PromptOnReplace,Extract,FirstOnly,SkipPromptIdentical,Action,SearchStr,ReplaceStr

 

Adds a search and replace (find and replace) filter. Search / Replace lists discard blank search terms and terms where the replacement is identical to the search. Search / Replace lists can generate log entries (useful for debugging). Logs can optionally be output only for where replacements occurred.

 

The arguments are:

 

Type:

0 Replace

1 Pattern (old style)

2 Sounds like

3 Edit distance

4 Perl pattern

5 Brief pattern

6 Word pattern

 

MatchCase - Matches case when set to 1, ignores case when set to 0

 

WholeWord - Matches whole words only when set to 1

 

CaseReplace - Replaces with matching case when set to 1

 

PromptOnReplace - Prompts before replacing when set to 1

 

Extract - If 1, all non-matching text is discarded

 

FirstOnly - If 1, only replace the first occurrence

 

SkipPromptIdentical - If 1, don't bother prompting if the replacement text is identical to the original.

 

Action - the action to perform when found:

0 replace

1 remove

2 send to subfilter

3 send non-matching to subfilter

4 send subpattern 1 to subfilter, etc.

 

SearchStr - the string to search for

 

ReplaceStr - the string to replace it with

 

/replacelist=Type,MatchCase,WholeWord,CaseReplace,PromptOnReplace,FirstOnly,SkipPromptIdentical,Simultaneous,LongestFirst,Filename

 

Add a search and replace list, using search and replace pairs from the specified file.

 

Type:

0 Replace

1 Pattern (old style)

2 Sounds like

3 Edit distance

4 Perl pattern

5 Brief pattern

6 Word pattern

MatchCase - Matches case when set to 1, ignores case when set to 0

 

WholeWord - Matches whole words only when set to 1

 

CaseReplace - Replaces with matching case when set to 1

 

PromptOnReplace - Prompts before replacing when set to 1

 

FirstOnly - If 1, only replace the first occurrence

 

SkipPromptIdentical - If 1, don't bother prompting if the replacement text is identical to the original.

 

Simultaneous - If 1, all search strings are scanned for simultaneously instead of consecutively. (This is useful if the search strings and results strings overlap.)

 

LongestFirst - If 1, searches for long phrases (most specific) before short phrases (least specific) - this is generally used for translations.

 

Filename - The file to load search/replace pairs from. If the file extension is .XLS or .XLSX, the file is assumed to be Excel format; if the extension is .TAB, the file is assumed to have tab-delimited values; any other extension (including .CSV) is assumed to have comma-separated values. The filename can contain environment variables enclosed in % signs, e.g. %TEMP%\myfile.txt. TPIPE corrects any doubled backslashes.
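The effect of Simultaneous and LongestFirst can be approximated with a single regular-expression alternation, as in this Python sketch (the pairs here are hypothetical, and this illustrates the matching semantics only):

```python
import re

# Hypothetical search/replace pairs, as they might be loaded from a
# tab-delimited .TAB file.
pairs = {"cat": "chat", "catalog": "catalogue"}

# LongestFirst: sort keys longest-first so "catalog" wins over "cat".
# Simultaneous: one alternation pattern scans for all keys in a single pass.
pattern = re.compile("|".join(
    re.escape(k) for k in sorted(pairs, key=len, reverse=True)))

print(pattern.sub(lambda m: pairs[m.group(0)], "cat catalog"))
```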

 

/run=InputFileName,OutputFileName,"CommandLine"

 

Adds a Run External Program filter. The arguments are:

 

InputFilename - the filename that TPIPE should read from after the external program writes to it.

OutputFilename - the filename that TPIPE should write to for the external program to read in.

CommandLine - the command line of the program to run. Should include double quotes around the entire command line.

 

/script=language,timeout,code

 

Adds an ActiveX script filter.

 

language: The language of the script

 

timeout:  The command timeout in seconds

 

code: The script code to run

 

/selection=Type,Locate,Param1,Param2,MoveTo,Delimiter,CustomDelimiter,HasHeader[,ProcessIndividually[,ExcludeDelimiter[,ExcludeQuotes]]]

 

Type - The type of filter to add:

0 – Remove column:

This filter is used to remove columns of text, given a column specification that describes the position of the column relative to the start or end of the line, and the width of the column. There are several ways to specify the columns (Locate,Param1,Param2) to remove:

0 - Start column, End column. This removes all text including and between the specified columns. Useful for removing columns in fixed width data files.

1 - Start column, width. Removes Width characters starting from (and including) column Start.

2 - End column, width. Removes Width characters backwards starting from (and including) column End.

3 - Start column to end of line. Removes all characters from the Start column to the very end of the line. Useful for making a file a uniform width.

4 - Width to end of line. Removes Width characters backwards starting from (and including) the last column.

Note - if you are removing more than one column range, it is easiest to remove ranges from right-to-left so that the position of the columns doesn't change.

1 - Restrict lines (restriction filters require sub filters to have any effect)

2 - Restrict columns (restriction filters require sub filters to have any effect)

3 - Restrict to bytes (restriction filters require sub filters to have any effect)

4 - Restrict to delimited fields (CSV, Tab, Pipe, etc.)

6 – Remove lines:

This filter removes a range of lines. There are several ways to specify the lines (Locate,Param1,Param2) to remove:

0 - Start line, End line. This removes all lines including and between the specified lines.

1 - Start line, width. Removes Width lines starting from (and including) line Start.

2 - End line, width. Removes Width lines backwards starting from (and including) line End.

3 - Start line to end of file. Removes all lines from the Start line to the very end of the file.

4 - Width to end of file. Removes Width lines backwards starting from (and including) the last line.

7 – Remove delimited fields (CSV, Tab, Pipe, etc.):

This filter is used to remove fields delimited by a given character. You can choose a predefined delimiter character (Delimiter), or select your own (CustomDelimiter). The trailing delimiter (if any) is also removed. When Comma (.csv) is chosen, TPIPE automatically handles single and double quoted strings, with embedded line feeds.

First Row Contains Field Names

If the first line of the file contains Field Names, set HasHeader to 1 so that TPIPE can count how many fields are expected. It can then determine if a field has embedded CR/LF characters and spans multiple lines. TPIPE can also determine this without a header if the fields are properly double-quoted - TPIPE will notice the missing double quote and continue reading the record from the following line.

Remove Fields

There are several ways to specify the fields (Locate,Param1,Param2) to remove:

0 - Start field, end field. This removes all text including and between the specified fields.

1 - Start field, width. Removes Width fields starting from (and including) field Start.

2 - End field, width. Removes Width fields backwards starting from (and including) field End.

3 - Start field to end of line. Removes all fields from the Start field to the very end of the line.

4 - Width to end of line. Removes Width fields backwards starting from (and including) the last field.

Note - if you are removing more than one field range, it is easiest to remove ranges from right-to-left so that the position of the fields doesn't change.
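The Locate=0 case (start field, end field) can be sketched with Python's csv module, which also copes with quoted fields (an illustration of the field numbering, not TPIPE itself):

```python
import csv
import io

def remove_fields(line: str, start: int, end: int) -> str:
    # Locate 0: remove all fields including and between start..end.
    # Fields are numbered from 1, matching TPIPE's convention.
    fields = next(csv.reader(io.StringIO(line)))
    kept = [f for i, f in enumerate(fields, 1) if not start <= i <= end]
    return ",".join(kept)

print(remove_fields("a,b,c,d,e", 2, 3))
```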

9 – Move columns:

TPIPE will move columns to a new position on the line. The new position (MoveTo) is specified assuming that the moved columns have been removed from the line.

10 – Move delimited fields (CSV, Tab, Pipe, etc.):

TPIPE will move CSV-delimited fields to a new position on the line. The new position (MoveTo) is specified assuming that the moved fields have been removed from the line. TPIPE ensures that all the delimiters on the line are correctly maintained, both at the end of the line and where the moved fields are inserted. Note - this filter is designed for ANSI or Unicode UTF-8 data - it will not handle UTF-16 data. You will need to convert UTF-16 files to UTF-8 first, do the selection, and then convert back to UTF-16.

12 – Copy columns:

TPIPE will copy columns to a new position (MoveTo) on the line. Note - this filter is designed for ANSI or Unicode UTF-8 data - it will not handle UTF-16 data. You will need to convert UTF-16 files to UTF-8 first, do the selection, and then convert back to UTF-16.

13 – Copy delimited fields (CSV, Tab, Pipe, etc.):

TPIPE will copy CSV-delimited fields to a new position (MoveTo) on the line. TPIPE ensures that all the delimiters on the line are correctly maintained, both at the end of the line and where the copied fields are inserted. Note - this filter is designed for ANSI or Unicode UTF-8 data - it will not handle UTF-16 data. You will need to convert UTF-16 files to UTF-8 first, do the selection, and then convert back to UTF-16.

17 – Remove byte range:

This filter is used to remove a range of bytes. There are several different ways to specify the bytes (Locate,Param1,Param2) to remove:

0 - Start byte, end byte. This removes all text including and between the specified bytes.

1 - Start byte, width. Removes Width bytes starting from (and including) the start byte.

2 - End byte, width. Removes Width bytes backwards starting from (and including) byte End.

3 - Start byte to end of file. Removes all bytes from the Start byte to the very end of the file.

4 - Width to end of file. Removes Width bytes backwards starting from (and including) the last byte.

Note - if you are removing more than one byte range, it is easiest to remove ranges from right-to-left so that the position of the bytes doesn't change.

 

Locate - How to determine which areas to affect (Param1 and Param2 fill the %d placeholders, in order):

0 - Restrict %d .. %d

1 - Restrict %1:d starting at %0:d

2 - Restrict %1:d starting at END - %0:d

3 - Restrict %d .. END - %d

4 - Restrict END - %d .. END - %d

 

Param1, Param2 - The integer values for the Locate method.

 

MoveTo - The integer value where to move or copy the columns or fields to (first columns or field is 1).

 

Delimiter - The index of the standard delimiter to use:

0 - Comma

1 - Tab

2 - Semicolon

3 - Pipe (|)

4 - Space

5 - Custom

 

CustomDelimiter - The custom delimiter to use (if Delimiter == 5). This should be a quoted string; if you are not using a custom delimiter then set this field to "".

 

HasHeader - 1 if the file's first row is a header row, 0 if not.

 

ProcessIndividually - Whether to apply sub filters to each CSV or Tab field individually (1), or to the fields as one string value (0). The default is 0.

 

ExcludeDelimiter - Whether to exclude (1) or include (0) the field delimiter when passing each field to the sub filters. Defaults to 0.

 

ExcludeQuotes - Whether or not to include the CSV quotes that may surround the field when passing the field to the subfilter. Defaults to 1.

 

/simple=n[u]

 

Adds a simple filter type. n is the type of filter to add, and for those filters that support it, u indicates that the filter will be dealing with Unicode data.

 

1 – Convert ASCII to EBCDIC

EBCDIC is the character collating sequence commonly used on mainframes. Some characters cannot be converted because they exist in one character set but not the other.

2 – Convert EBCDIC to ASCII

3 – Convert ANSI to OEM

Converts from ANSI to ASCII/OEM. ANSI is an 8-bit character set used by Windows, and it includes all accented Roman characters used by non-English languages like French, German and Spanish. (Windows uses UTF-16LE for all of its internal APIs, and converts to ANSI if the user is using raster fonts or ANSI files.) ASCII/OEM is an extension of the original IBM character set where various non-essential characters are replaced by language-specific accented characters. Different ASCII/OEM character sets are not compatible with each other; text must be converted to ANSI and then back to the correct ASCII/OEM character set to be readable.

4 – Convert OEM to ANSI

5 – Convert to UPPERCASE

Forces all text to UPPERCASE. To make the conversion, the function uses the current language selected by the user in the system Control Panel. If no language has been selected, TPIPE uses the Windows internal default mapping.

6 – Convert to lowercase

Forces all text to lowercase. To make the conversion, the function uses the current language selected by the user in the system Control Panel. If no language has been selected, TPIPE uses the Windows internal default mapping.

7 – Convert to Title Case

Converts all text to Title Case -- i.e., the first letter of every word is capitalized, and all other letters are forced to lower case. This routine calculates a table of upper and lower case letters on TPIPE startup, and this determination is based on the semantics of the language selected in Control Panel.

8 – Convert to Sentence Case

Converts all text to Sentence case, i.e., the first word in every sentence is capitalized; all other letters are left as-is. Sentences start after periods, exclamation marks, colons, question marks, quotes, parentheses and angle brackets (.!:?'"<().

9 – Convert to tOGGLE cASE

tOGGLES tHE cASE of all text -- i.e., all UPPERCASE characters are converted to lowercase and vice-versa.

10 – Remove blank lines

Removes blank lines. Note: lines containing only spaces or tabs are not removed; use the Remove blanks from Start/End of Line filters (types 11 and 12) first to rectify this.

11 – Remove blanks from End of Line

Removes spaces and tabs from the end of every line.

12 – Remove blanks from Start of Line

Removes spaces and tabs from the start of every line.

13 – Remove binary characters

Removes binary characters such as those higher than ASCII code 127, and those less than ASCII code 32 except for carriage returns (ASCII code 13) and line feeds (ASCII code 10).

This filter is very useful if you have a corrupted text file, or if you just want to see what text is inside a binary file. The binary information is removed, leaving you with just the text.

14 – Remove ANSI codes

ANSI (American National Standards Institute) codes are included in various streams of information, to provide a remote computer with control over cursor positioning, text attributes, etc. They are also used in connections between minicomputers and mainframe computers and the terminals connected to them.

The need to use an ANSI filter can be recognized when something like the following example shows up in a file viewed in a text editor:

<[0;1;4mas<[m - MC88000 assembler

In this example, the "as" near the beginning is displayed in a different color than the rest of the line when the ANSI codes are properly processed. The Escape (ASCII 27) codes above have been replaced by the < symbol to make this line printable.

The Remove ANSI Escape Sequences filter can be used to filter out these codes and "clean up" the text so that it can be used in standard fashions such as copying and pasting into a word processor. On Unix machines the man (manual) help utility will only allow page-by-page browsing through a file in a forward direction. By piping the man output to a text file, transferring it to a DOS machine, and running it through the Remove ANSI Escape Sequences filter (and the Convert EOL filter - Unix to DOS if desired), a standard DOS editor can be used for browsing through the file, quoting from it, etc.

15 – Convert IBM drawing characters

IBM drawing characters in the upper ASCII range (128-255) are commonly used to draw lines and boxes, single and double line borders, shaded characters etc. Many devices (such as printers, non-IBM computers etc.) do not support the display of these characters.

This filter converts them to standard ASCII characters (+, - and |) that all computers can display.

16 – Remove HTML and SGML

Use this filter to convert HTML documents to a readable format. This filter removes HTML and XML markup tags i.e. everything including and between <> brackets.

17 – Remove backspaces

Remove backspaces, i.e. all ASCII code 8's.

18 – Resolve backspaces

Resolve backspaces -- i.e., remove both the backspaces and the characters prior to the backspaces that would have been deleted.
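The resolution rule can be modeled with a simple stack, as in this Python sketch (an illustration of the behavior, not TPIPE's code):

```python
def resolve_backspaces(text: str) -> str:
    # Each backspace (ASCII 8) deletes itself and the character before it.
    out = []
    for ch in text:
        if ch == "\b":
            if out:
                out.pop()
        else:
            out.append(ch)
    return "".join(out)

print(resolve_backspaces("helq\blo"))   # the backspace deletes the stray "q"
```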

19 – Remove multiple whitespace

Removes sequences of multiple spaces or tabs and replaces them with a single space.

20 – UUEncode

Usually used for transmitting binary files inside an email. Files of this type are usually given an extension of .uue. Warning – UUencoded text may be corrupted when passing over a mainframe mail gateway. To avoid corruption, use Mime Base 64 or XXEncode.

21 – Hex Encode

A very simple encoding of a file. Usually used for small files, because it uses a large amount of space. The benefit is that the file is very easy to encode/decode, and the file cannot be corrupted passing through mail gateways.

22 – Hex Decode

Converts a file from its hex representation back to binary. The file to be decoded MUST NOT have any extra characters at the start or end if it is to be successfully processed.

23 – MIME Encode (Base 64)

Used for binary data. Files of this type are usually given an extension of .b64.

24 – MIME Decode (Base 64)

Used for binary data. Files of this type are usually given an extension of .b64. The file to be decoded MUST NOT have any extra characters at the start or end if it is to be successfully processed.

25 – MIME Encode (Quoted printable)

Quoted printable is used for text that is mainly readable, but may contain special characters with accents etc.

26 – MIME Decode (Quoted printable)

The inverse of the above encoding.

27 – UUDecode

Mail attachments can be uuencoded, use this filter to convert the file back to its correct form. Files of this type are usually given an extension of .uue.

28 – Extract email addresses

Extract email addresses. This filter searches for email addresses of the form user@server.domain, and writes them out one per line (using a DOS line feed, CR/LF). Usually this filter is followed by a filter to remove duplicate lines, and then by a Search and Replace filter, searching for \013\010 and replacing with a comma or semi-colon, depending on the email address separator used by your email software.
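The extraction is equivalent to a regular-expression scan; this Python sketch uses a simplified user@server.domain pattern (not TPIPE's exact pattern) and joins the results with CR/LF:

```python
import re

# Simplified user@server.domain pattern; real address grammar is broader.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

text = "Contact alice@example.com or bob@mail.example.org today."
print("\r\n".join(EMAIL.findall(text)))
```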

29 – Unscramble (ROT13)

This is a simple email encoding usually used to disguise text that some people may find offensive. The encoding is totally reversible (applying it twice removes the encoding). Only alpha characters are affected (A..Z and a..z).

30 – Hex dump

This changes the text to lines consisting of 16 bytes each. Each line has an 8 hex digit file index, 16 bytes (in hex) and the ASCII representation:

00000000 65 67 69 6E 0D 0A 20 20 20 20 20 20 61 64 64 72 egin........addr

00000010 65 73 73 20 3A 3D 20 0D 0A 20 20 20 20 20 20 20 ess.:=..........

00000020 20 64 65 63 54 6F 48 65 78 53 74 72 28 20 28 66 .decToHexStr(.(f

This filter is very useful for identifying special characters to search and replace.
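The layout above (8-hex-digit offset, 16 hex bytes, then the character column with non-printables shown as dots) can be reproduced in Python (an illustration of the format, not TPIPE's implementation):

```python
def hex_dump(data: bytes) -> str:
    # One line per 16 bytes: 8-hex-digit offset, the bytes in hex, and the
    # character column with non-printables (and spaces) shown as '.'.
    lines = []
    for off in range(0, len(data), 16):
        chunk = data[off:off + 16]
        hexes = " ".join(f"{b:02X}" for b in chunk).ljust(47)
        text = "".join(chr(b) if 33 <= b < 127 else "." for b in chunk)
        lines.append(f"{off:08X} {hexes} {text}")
    return "\n".join(lines)

print(hex_dump(b"egin\r\n      addr"))
```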

32 – XXEncode

Essentially identical to UUEncode except that the character set used is different to allow it to pass through EBCDIC gateways without corruption. The XXencoding implemented by TPIPE uses the following characters:

+-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz

33 – XXDecode

Essentially identical to UUDecode except that the character set used is different to allow it to pass through EBCDIC gateways without corruption.

34 – Reverse line order

The order of the input lines is reversed i.e. the last line comes out first and the first line comes out last. A file is read entirely into RAM before being reversed, so be wary of reversing files that are larger than your machine's RAM size.

35 – Remove email headers

This filter removes the email headers that accompany emails exported to a text format. The email headers are the lines such as To:, From:, Subject: and various other message headers added by all the servers through which your email passes before it gets to its destination.

36 – Decimal dump

This changes the text to lines consisting of 10 bytes each. Each line has a 10 decimal digit file index, 10 bytes (in decimal) and the ASCII representation:

0000000000 080 108 101 097 115 101 032 102 101 101 Please fee

0000000010 108 032 102 114 101 101 032 116 111 032 l free to

0000000020 099 111 109 109 101 110 116 013 010 111 comment..o

This filter is very useful for identifying special characters to search and replace.

37 – HTTP Encode

This filter is used to encode text for use in an HTTP header – a (usually) small piece of text that accompanies a web page request to a web server. This filter is very useful for debugging CGI scripts because it can create HTTP requests in the correct form. HTTP encoded text usually looks like the following:

a+%28usually%29+small+piece+of+text+that+accompanies+a+web+page+request+to+a+web+server.+This+filter+is+very+

38 – HTTP Decode

This filter is used to decode text from an HTTP header – a (usually) small piece of text that accompanies a web page request to a web server.
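
Both directions correspond to standard URL form-encoding; in Python this can be sketched as follows (an illustrative equivalent, not TPIPE itself):

```python
from urllib.parse import quote_plus, unquote_plus

# Encode: spaces become '+', parentheses become %28/%29
encoded = quote_plus("a (usually) small piece of text")
assert encoded == "a+%28usually%29+small+piece+of+text"

# Decode reverses the transformation
assert unquote_plus(encoded) == "a (usually) small piece of text"
```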

39 – Randomize lines

This filter puts lines into random order. This is useful when a random sample of data is required for statistical purposes - just follow this filter with a head/tail of file filter (/head or /tail). The lines output will differ from one run to the next; the order is determined by a pseudo-random number generator.

40 – Create word list

This filter takes all the incoming words and outputs them one per line. This can be used to generate word lists for Indexes, encryption programs etc. Hyphenated words are recognized as single words, provided that they aren't broken across lines. To get around this limitation, use a Search and Replace filter to replace hyphens followed by line feeds with just a hyphen. Normally you would follow this filter with a remove duplicates filter, or alternatively, a Count Duplicate Lines filter (with Include counts of 1).

catch22 – a word

24-7 – a word

twenty-four – a word

5th – a word

ice cream – two words

Commas or periods after words are treated as word separators.

41 – Reverse each line

Each line is output reversed from left to right. This can be useful to extract domain names from web site log files - use this filter to reverse each line, use an extract matches filter of [\w\d]+\.[\w\d]+ to extract each domain name, then reverse each line again. Note: This filter will NOT work on Unicode or UTF-8 data. It will only work on single-byte data such as ASCII or ANSI.
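
The reverse, extract, reverse-again trick can be sketched in Python (the sample line here is hypothetical):

```python
import re

# Reversing the line puts the domain's last components first, so the
# pattern grabs the TLD and domain; reversing the match restores them.
line = "GET /index.html HTTP/1.1 host=www.example.com"
reversed_line = line[::-1]
match = re.search(r"[\w\d]+\.[\w\d]+", reversed_line)
domain = match.group(0)[::-1]
assert domain == "example.com"
```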

42 – Convert to RanDOm case

This filter randomly changes the case of characters. This routine calculates a table of upper and lower case letters on TPIPE startup, and this determination is based on the semantics of the language selected in the Windows Control Panel.

Running this filter again will generate different results; for example:

1. ranDoMIze cASe

2. RanDOmIZE case

3. randOMIZE casE

43 – Extract URLs

This filter extracts http://, https://, ftp:// and gopher:// URLs and lists them one per line.

44 – ANSI to Unicode

Converts single byte ANSI characters to double byte Unicode characters. This filter can be useful if you want to send a text file to someone using a language other than your own. This filter is often followed by an Add Header filter, to add a Unicode byte order mark (BOM), \xFF\xFE.

45 – Unicode to ANSI

Converts double byte Unicode characters to single byte ANSI characters. This filter can be useful if you want to send a text file to someone using a language other than your own. This filter is often followed by a Remove start or end of file filter, to either remove the first two bytes of Unicode (before the conversion) or the first byte of ANSI (after the conversion), to remove the leading Unicode byte order mark (BOM).

46 – Display debug window

A debug filter is very handy for debugging filters. When text is passed through this filter, it places the output into a window so that you can see what the text looks like at that stage of the filtering process.

47 – Word concordance

This filter generates a word concordance. A word concordance shows the context or surrounding words for a given set of words in a dictionary.

48 – Remove all

This filter removes all text. Unlike a pattern match filter that matches everything and then throws it away, this filter is far more efficient, especially for large files, because it signals completion back to the input filter; only the first chunk of a multi-gigabyte file will ever get processed.

It is useful in two main situations:

1. Inside a subfilter, it prevents any of the subfiltered text from re-entering the text stream. So you could restrict to lines matching a pattern, output the matching lines to a new file, and then remove them.

2. To remove all the text of a file, then add an Add Header filter with the @fullInputFilename macro to output just the name of the file.

Note: An Add Left Margin or Add Right Margin filter will not work after a Remove All filter, as they require an actual line to trigger them. Instead, use an Add Header or Add Footer filter.

49 – Restrict to each line in turn

This filter restricts sub filters to operate on each line in turn. This filter is used for its side effect of limiting the matched text to a single line at most.

50 – Convert CSV to Tab-delimited

Converts CSV data (quoted or unquoted) to tab-delimited form. It's preferable to use a file with column headers, because then TPIPE can easily determine if the fields have embedded CR/LFs in them. If the data is properly quoted then TPIPE will determine this automatically.

51 – Convert CSV to XML

Converts CSV data (quoted or unquoted) to XML form. It's preferable to use a file with column headers, because then TPIPE can easily determine if the fields have embedded CR/LFs in them. If the data is properly quoted then TPIPE will determine this automatically. TPIPE correctly escapes < > " ' and & in the data to the corresponding XML entity. If your data contains invalid XML characters such as ASCII 26 (End-of-file, hex \x1A), follow this filter with a search/replace filter to remove \x1A and replace with nothing.

52 – Convert Tab-delimited to CSV

Converts Tab-delimited data to CSV data. It's preferable to use a file with column headers, because then TPIPE can easily determine if the fields have embedded CR/LFs in them. TPIPE cannot determine this without column headers.

53 – Convert Tab-delimited to XML

Converts Tab-delimited data to XML data. It's preferable to use a file with column headers (/simple=55), because then TPIPE can easily determine if the fields have embedded CR/LFs in them. TPIPE cannot determine this without column headers. TPIPE correctly escapes < > " ' and & in the data to the corresponding XML entity. If your data contains invalid XML characters such as ASCII 26 (End-of-file, hex \x1A), follow this filter with a search/replace filter to remove \x1A and replace with nothing.

54 – Convert CSV (with column headers) to XML

See description for 51 – Convert CSV to XML.

55 – Convert Tab-delimited (with column headers) to XML

See description for 53 – Convert Tab-delimited to XML.

56 – Convert CSV (with column headers) to Tab-delimited

See description for 50 – Convert CSV to Tab-delimited.

57 – Convert Tab-delimited (with column headers) to CSV

See description for 52 – Convert Tab-delimited to CSV.

58 – Restrict to file name

This filter applies its subfilters only to files with filenames (i.e. drive + path + filename) matching or not matching a pattern or list of patterns. This is very handy for applying a Convert Word Documents to Text filter only to files matching the pattern

\.DOC$

With the appropriate pattern, this filter can also be used to control subfilters based on filename, folder and drive. Note that this filter uses case-insensitive Perl regular expressions, not Windows wildcards.

59 – Convert Word documents to text

This filter takes ALL incoming documents, opens them with Microsoft Word, and outputs them as text files. This can be used to process a set of Word Documents to text file format. After this filter you can add search and replace filters or any other filters you choose.

This filter requires Microsoft Word 98 or higher to be installed. If you wish to convert documents other than the default .DOC files, you may also need to install Word's conversion filters. If Word cannot be started automatically TPIPE will prompt you to start it manually before continuing.

Unless you know that all documents being processed are Word documents (e.g. by using a wildcard of *.doc in the Files to Process tab), you should restrict this filter to only files matching the pattern:

\.DOC$

60 – Swap UTF-16 word order

This filter swaps each pair of bytes, e.g.:

Byte number    1  2  3  4  5  6  7  8

Input File     FF FE 00 20 00 31 00 32

Output File    FE FF 20 00 31 00 32 00

This is commonly used to transform big-endian or little-endian Unicode files so that other programs can use them.
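
In Python, the pair swap can be sketched as follows (assumes an even-length input):

```python
def swap_utf16_bytes(data: bytes) -> bytes:
    # Swap each adjacent pair of bytes: b0 b1 b2 b3 -> b1 b0 b3 b2
    swapped = bytearray(len(data))
    swapped[0::2] = data[1::2]
    swapped[1::2] = data[0::2]
    return bytes(swapped)

assert swap_utf16_bytes(bytes.fromhex("FFFE0020")) == bytes.fromhex("FEFF2000")
```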

61 – Swap UTF-32 word order

This filter swaps groups of 2-byte words, e.g.:

Byte number    1  2  3  4  5  6  7  8

Input File     FF FE 00 00 00 31 00 00

Output File    00 00 FE FF 00 00 31 00

This is commonly used to transform big-endian or little-endian Unicode files so that other programs can use them.
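
Matching the byte-level example above, this can be sketched in Python by reversing each 4-byte group (assumes the input length is a multiple of 4):

```python
def swap_utf32_words(data: bytes) -> bytes:
    # Reverse the byte order within each 4-byte unit,
    # as in the Input File / Output File example above.
    out = bytearray()
    for i in range(0, len(data), 4):
        out += data[i:i + 4][::-1]
    return bytes(out)

assert swap_utf32_words(bytes.fromhex("FFFE000000310000")) == bytes.fromhex("0000FEFF00003100")
```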

62 – Remove BOM (Byte Order Mark)

This filter removes the Unicode Byte Order Mark from the start of Unicode files, if it is present.

Bytes removed    Description

00 00 FE FF      UTF-32, big-endian

FF FE 00 00      UTF-32, little-endian

FE FF            UTF-16, big-endian

FF FE            UTF-16, little-endian

EF BB BF         UTF-8
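
The same BOM-stripping logic can be sketched in Python. Note that the UTF-32 BOMs must be tested before the UTF-16 BOMs, because FF FE 00 00 starts with the UTF-16 LE BOM:

```python
import codecs

# Longest BOMs first, so UTF-32 LE is not misread as UTF-16 LE
BOMS = [codecs.BOM_UTF32_BE, codecs.BOM_UTF32_LE, codecs.BOM_UTF8,
        codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE]

def remove_bom(data: bytes) -> bytes:
    for bom in BOMS:
        if data.startswith(bom):
            return data[len(bom):]
    return data  # no BOM present; leave the data unchanged

assert remove_bom(b"\xef\xbb\xbfhello") == b"hello"
assert remove_bom(b"\xff\xfe\x00\x00rest") == b"rest"
```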

63 – Make Big Endian

Converts a Little Endian Unicode file into a Big Endian Unicode file, e.g.:

Input file                   Output file

00 00 FE FF 00 00 00 4D      Unchanged

FE FF 4E 8C                  Unchanged

FF FE 00 00 4D 00 00 00      00 00 FE FF 00 00 00 4D

FF FE 8C 4E                  FE FF 4E 8C

Note - the file MUST start with a Byte Order Mark (BOM) for it to be correctly identified.

64 – Make Little Endian

Converts a Big Endian Unicode file into a Little Endian Unicode file, e.g.:

Input file                   Output file

00 00 FE FF 00 00 00 4D      FF FE 00 00 4D 00 00 00

FE FF 4E 8C                  FF FE 8C 4E

FF FE 00 00 4D 00 00 00      Unchanged

FF FE 8C 4E                  Unchanged

Note - the file MUST start with a Byte Order Mark (BOM) for it to be correctly identified.

65 – Compress to Packed Decimal

This filter compresses EBCDIC numeric data (optional leading sign, numbers and periods) to an EBCDIC packed decimal field (also known as Comp-3).

There are several notes to keep in mind when using this filter:

1. You MUST use this filter inside a Restrict to Byte Range filter. The field WIDTH is then set by the containing filter.

2. Compressing a field will decrease your output record length, so ensure you allow for this. A good strategy to avoid problems is to first compress the rightmost field, then work your way back to the leftmost field. This prevents the field column positions from changing and makes the file easier to work with.

This filter will add hex 'B' to negative fields, hex 'C' to positive fields and hex 'F' to unsigned fields. If these codes don't match what your target needs, use a column or CSV restriction to apply a search/replace.
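
The packed-decimal (Comp-3) nibble layout can be sketched in Python. This is illustrative only, not TPIPE's implementation (TPIPE's EBCDIC handling is more involved); the sign nibbles follow the description above:

```python
def to_packed_decimal(value: str) -> bytes:
    # Sign nibble per the description above: B negative, C positive, F unsigned
    if value.startswith("-"):
        sign, digits = 0xB, value[1:]
    elif value.startswith("+"):
        sign, digits = 0xC, value[1:]
    else:
        sign, digits = 0xF, value
    nibbles = [int(d) for d in digits] + [sign]
    if len(nibbles) % 2:
        nibbles.insert(0, 0)  # pad with a leading zero to fill whole bytes
    return bytes((nibbles[i] << 4) | nibbles[i + 1]
                 for i in range(0, len(nibbles), 2))

assert to_packed_decimal("-123").hex().upper() == "123B"
```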

66 – Expand Zoned Decimal

This filter expands an EBCDIC zoned decimal field to a raw EBCDIC number with a sign. Typically this filter is then followed by a Convert EBCDIC to ASCII filter - after all other fields have been expanded as well.

There are several notes to keep in mind when using this filter:

1. You MUST use this filter inside a Restrict to Byte Range filter. The field WIDTH is then set by the containing filter.

2. Expanding a field will increase your output record length, so ensure you allow for this. A good strategy to avoid problems is to first expand the rightmost field, then work your way back to the leftmost field. This prevents the field column positions from changing and makes the file easier to work with.

67 – Expand Binary Number to EBCDIC

This filter expands a series of digits stored in binary (BIG ENDIAN) form. The maximum width is 8 bytes.

There are several notes to keep in mind when using this filter:

1. You MUST use this filter inside a Restrict to Byte Range filter. The field WIDTH is then set by the containing filter.

2. Expanding a field will increase your output record length, so ensure you allow for this. A good strategy to avoid problems is to first expand the rightmost field, then work your way back to the leftmost field. This prevents the field column positions from changing and makes the file easier to work with.

3. If the data is stored in LITTLE ENDIAN order, use a Reverse filter inside the Restriction prior to the Expand Binary Numbers filter.

68 – Expand Binary Number to ASCII

This filter expands a series of digits stored in binary (BIG ENDIAN) form. The maximum width is 8 bytes.

There are several notes to keep in mind when using this filter:

1. You MUST use this filter inside a Restrict to Byte Range filter. The field WIDTH is then set by the containing filter.

2. Expanding a field will increase your output record length, so ensure you allow for this. A good strategy to avoid problems is to first expand the rightmost field, then work your way back to the leftmost field. This prevents the field column positions from changing and makes the file easier to work with.

3. If the data is stored in LITTLE ENDIAN order, use a Reverse filter inside the Restriction prior to the Expand Binary Numbers filter.

69 – NFC - Canonical Decomposition, followed by Canonical Composition

Applies a Unicode NFC - Canonical Decomposition, followed by Canonical Composition transformation to incoming Unicode text (UTF16-LE). Output is also Unicode UTF16-LE.

70 – NFD - Canonical Decomposition

Applies a Unicode NFD - Canonical Decomposition transformation to incoming Unicode text (UTF16-LE). Output is also Unicode UTF16-LE.

71 – NFKD - Compatibility Decomposition

Applies a Unicode NFKD - Compatibility Decomposition transformation to incoming Unicode text (UTF16-LE). Output is also Unicode UTF16-LE.

72 – NFKC - Compatibility Decomposition, followed by Canonical Composition

Applies a Unicode NFKC - Compatibility Decomposition, followed by Canonical Composition transformation to incoming Unicode text (UTF16-LE). Output is also Unicode UTF16-LE.

73 – Decompose

Applies a Unicode Decompose transformation to incoming Unicode text (UTF16-LE). The output is also Unicode UTF16-LE.

74 – Compose

Applies a Unicode Compose transformation to incoming Unicode text (UTF16-LE). The output is also Unicode UTF16-LE.

75 – Convert numeric HTML Entities to text

This filter converts decimal/hex numeric HTML/XML entities to plain text. For example:

&#174; → ®

&#xAE; → ®

Typically, the input file is ANSI (single byte) format. This filter will output UTF-8 characters for high-value entities (e.g. &#6144;). The best approach is to first convert the file from ANSI to UTF-8 (/unicode), then apply this filter.
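
Python's html.unescape performs an equivalent conversion (it also handles named entities, as filter 85 does):

```python
from html import unescape

# Decimal and hex numeric entities both resolve to the same character
assert unescape("&#174;") == "\u00ae"   # registered sign, (R)
assert unescape("&#xAE;") == "\u00ae"
```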

76 – Convert PDF documents to text

This filter takes ALL incoming documents and converts them from PDF to text. Most of the formatting will be lost.

77 – Restrict to ANSI files

78 – Restrict to Unicode UTF16 files

79 – Restrict to Unicode UTF32 files

80 – Convert Excel spreadsheets to text

This filter takes ALL incoming documents, opens them with Microsoft Excel, and outputs them as CSV (comma-delimited) files (hidden worksheets will be ignored). After running this filter, you can add search and replace filters or any other filters you choose, such as convert the data to Tab-delimited or XML.

This filter requires Microsoft Excel 98 or higher to be installed. If you wish to convert documents other than the default .XLS files, you may also need to install Excel's conversion filters.

Unless you know that all documents being processed are Excel documents (e.g. by using a wildcard of *.xls in the Files to Process tab), you should restrict (/simple=58) this filter to only files matching the pattern

\.XLS$

81 - Shred file

82 - Unicode to escaped ASCII

83 - Restrict to Unicode files

84 - T-filter

The T-Filter allows you to process the same output in multiple ways. You can create a subfilter, and add filters to create the desired output. When this side of the T has finished processing, the data is discarded and the original text continues processing as though the T-filter did not exist.

85 - Convert decimal/hex numeric HTML/XML entities and entity names to text (e.g., &#174 -> ®, or &reg; -> ®). This filter outputs UTF-8 characters for high-value entities.

 

/sort=Type,Reverse,RemoveDuplicates,StartColumn,Length

Sort text files.

Type - the sort type

0 - ANSI sort

1 - ANSI sort (case sensitive)

2 - ASCII sort

3 - ASCII sort (case sensitive)

4 - Numeric sort

5 - Sort by length of line

6 - Sort by date and time

7 - Sort by date

8 - Sort by time

Reverse - If 1, sort in descending order; if 0, sort in ascending order

RemoveDuplicates - If 1, remove duplicate lines; if 0 keep duplicate lines

StartColumn - The column in the line to begin the comparisons

Length - The length of the comparison
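
The StartColumn and Length behavior can be illustrated in Python (hypothetical data; as noted earlier, column positions are 1-based):

```python
# Sort lines using only columns 4..6 as the key (StartColumn=4, Length=3)
lines = ["id=042 apple", "id=007 pear", "id=100 plum"]
start, length = 4, 3
sorted_lines = sorted(lines, key=lambda s: s[start - 1:start - 1 + length])
assert sorted_lines == ["id=007 pear", "id=042 apple", "id=100 plum"]
```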

 

 /split=type,SplitSize,SplitChar,SplitCharPos,SplitCharCount,SplitLines,SplitFilename[,FirstFileNumber[,PreventOverload]]

 

Adds a split type filter. The arguments are:

 

type:

0 Split at a given size

1 Split at a given character

2 Split at a given number of lines

 

splitSize - the file size at which to split

 

splitChar - the character to split at

 

splitCharPos

0 Split before the character (it goes into the next file)

1 Split after the character (it remains in the first file)

2 Split on top of the character (remove it)

 

SplitCharCount - the number of times to see SplitChar before splitting

 

SplitLines - (optional) split after a given number of lines, default 60

 

SplitFilename - (optional) the name to give to each output split file. /split will append a "%3.3d" format specifier to the name; i.e. SplitFilename of "foo.txt" will generate output files named "foo.txt.000", "foo.txt.001", etc. If you don't specify a SplitFilename, /split will use the input filename as the base.

 

FirstFileNumber - the number of the first file. Defaults to 0.

 

PreventOverload - if 1, don't create more than 10,000 files in one folder. Defaults to 0.

 

 /string=type,MatchCase,string

 

Add a string-type filter. The arguments are:

 

type:

0 - Add left margin

1 - Add header

2 - Add footer

3 - Add right margin

4 - Remove lines that match exactly

5 - Retain lines that match exactly

6 - Remove lines matching the Perl pattern

7 - Retain lines matching the Perl pattern

8 - Add text side by side

9 - Add repeating text side by side

10 - Not Used

11 - Not Used

12 - XSLT transform

13 - Restrict to lines from list

14 - Restrict to lines NOT in list

15 - Restrict to lines matching the Perl pattern

16 - Restrict to lines NOT matching the Perl pattern

17 - Restrict to filenames matching the Perl pattern

18 - Restrict to filenames NOT matching the Perl pattern

 

matchCase - case sensitive or not (where appropriate)

 

string - the string to use

 

 /tail=Exclude,LinesOrBytes,Count

 

Add a tail type filter (includes or excludes text at the end of the file). The arguments are:

 

Exclude:

0 - Include the text

1 - Exclude the text

 

LinesOrBytes:

0 - Measure in lines

1 - Measure in bytes

 

Count - the number of lines or bytes to include or exclude

 

/unicode=input,output

 

Convert the file to or from Unicode. input is the encoding for the input file; output is the encoding for the output file. The possible values are:

 

UTF-16LE

UTF-16BE

UTF-32LE

UTF-32BE

UTF-8

ANSI

ASCII

CPnnn, where nnn is a Windows code page (for example, CP437 or CP1251).

 

TPIPE handles files internally as UTF-8, so if you want to process a Windows UTF-16LE file, you'll need to convert it to UTF-8 first, then apply the desired filters, and convert it back to UTF-16LE. For example, to wrap a Unicode file at column 80:

 

tpipe /input=inputname /output=outputname /unicode=UTF-16LE,UTF-8 /number=2,80 /unicode=UTF-8,UTF-16LE

 

 /xml=Type,IncludeText,IncludeQuotes,MatchCase,BufferSize,Tag,Attribute,EndTag

 

Adds an HTML / XML filter. The arguments are:

 

Type - the operation to perform:

0 restrict to an element

1 restrict to an attribute

2 restrict to between tags

 

IncludeText - whether to include the find string in the restriction result (default false)

 

IncludeQuotes - whether to include surrounding quotes in the attribute result or not (default false)

 

MatchCase - match case exactly or not (default false)

 

BufferSize - the maximum expected size of the match (default 32768)

 

Tag - the element or start tag to find

 

Attribute - the attribute to find

 

EndTag - the endTag to find