TPIPE command - Text filtering, search, substitution, and conversion |
Purpose: | Text filtering, search, substitution, conversion, and sorting |
Format: | See Options below. |
Usage:
TPIPE does text filtering, substitution, conversion, and sorting on text files. If you don't specify an input filename, TPIPE will read from standard input if it has been redirected. If you don't specify an output filename, TPIPE will write to standard output. This is substantially slower than reading from and writing to files, but allows you to use TPIPE with pipes.
You can specify multiple filters, which will be processed in the order they appear on the command line. Do not insert any unquoted whitespace or switch characters in the arguments to an option. If you do need to pass whitespace, switch characters, or double quotes in an argument, you can quote the entire option in single back quotes.
Row and column positions start at 1.
TPIPE defaults to UTF8 encoding when loading or saving files.
If you need to process a Windows Unicode UTF-16 file, unless the filter supports Unicode directly (for example, /simple) you'll need to convert it to UTF-8 first (see /unicode=...).
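For example, a minimal sketch (sample text illustrative) that uppercases piped input using the /simple filter described below:
echo alpha beta | tpipe /simple=5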
Options:
filename - Filename or folder to read. This can be either a disk file, a file list (@filename), or CLIP:. If it is not specified, TPIPE will read from standard input.
subfolders - How many levels of subfolders to include (default 0):
0 - no subfolders
1 to 254 - that many levels of subfolders
255 - all subfolders
action - the action to take (default 1):
1 - include the files
2 - exclude the files
3 - ignore the files
You can specify multiple /input statements. |
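For example (filenames illustrative), to strip trailing blanks from one file and write the result to another:
tpipe /input=in.txt /output=out.txt /simple=11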
Filename to write. This can be either a disk file or CLIP:. If it is not specified, TPIPE will write to standard output. |
Set the output filter directory. |
Determines how binary files are processed. The options are:
0 - Binary files are processed (default)
1 - Binary files are skipped
2 - Binary files are confirmed before processing
size - The sample size in bytes to use for identifying binary files (default 255) |
In clipboard mode, determines whether the input is ASCII (0) or Unicode (1). The default is 0. |
If 1, the input files will be deleted after processing. USE WITH CAUTION! |
If 1, TPIPE will prompt before processing each input file. |
If 1, TPIPE will prompt before processing read-only input files. |
Process the string as if it were a file and return the result. This option will write the return value to STDOUT; you cannot specify an /output argument. |
If n is 1, append to the output file. |
Sets the output changed mode. The options are:
0 - Always output
1 - Only output modified files
2 - Delete original if modified |
Sets the output mode. The options are:
0 - Output to the clipboard (all files are merged)
1 - Output to individual files
2 - Output to a single merged file |
If 1, TPIPE will open each output file in its associated program upon completion. If there is no association for a file, it will be opened in the default editor. |
If n is 1, retain the existing file date for the output file. |
Runs the current filter with input from and output to the clipboard. |
Name of filter file to load (see /save=filename) |
Saves the filter settings defined on the command line to the specified filename, and returns without executing any filters. |
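For example, a sketch (filenames illustrative; the search/replace switch and its argument order are as documented below) that saves a filter definition and reuses it later:
tpipe /replace=0,0,0,0,0,0,0,0,0,"colour","color" /save=spelling.flt
tpipe /input=doc.txt /output=doc2.txt /filter=spelling.flt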
The following filters are created as sub filters, until the closing /ENDSUBFILTERS. Sub filters allow a restricted part of the entire text to be operated on by a group of filters without affecting the rest of the text. For example, a "Restrict to delimited fields" (CSV, Tab, Pipe, etc.) filter can pick out a range of CSV fields, and a search/replace filter can then operate only on the restricted text; see the sketch after the /ENDSUBFILTERS entry below.
|
End the sub filters defined by the preceding /STARTSUBFILTERS. |
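For example, a sketch (filenames illustrative; the restriction switch name and argument order follow TCC's TPIPE help) that uppercases only the second comma-delimited field of each line:
tpipe /input=in.csv /output=out.csv /restrict=4,"2",0,1,0,1 /startsubfilters /simple=5 /endsubfilters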
Sets the buffer size for the preceding search/replace filter. (The default is 4096.) |
Sets the edit distance threshold for the preceding search/replace filter. (The default is 2.) |
Add a comment to a filter file.
Text - Comment to add |
Adds a database-type filter. Database filters will change the output extension to match the format.
Mode - the output format:
0 - Delimited output
1 - Fixed width
2 - XML
3 - Insert script
4 - JSON output
GenerateHeader - Generates header information when True.
Timeout - SQL command timeout in seconds.
ConnectionStr - The database connection string.
InsertTable - The name of the insert table.
FieldDelimiter - The string to use between columns.
Qualifier - The string to use around string column values. |
Remove or show duplicate lines. The arguments are:
Type: 0 - Remove duplicate lines 1 - Show duplicate lines
MatchCase - If 1, do case-sensitive comparisons.
StartColumn - The starting column for comparisons (the first column is 1).
Length - The Length of the comparison.
IncludeOne - If 1, include lines with a count of 1 (only for Type 1).
Format - how the output should be formatted for Type=1. Format strings are composed of plain text and format specifiers. Plain text characters are copied as-is to the resulting string. Format specifiers have the following form:
%[index ":"][-][width][.precision]type
An optional argument index specifier
An optional left justification indicator, ["-"]
An optional width specifier, [width] (an integer). If the width of the number is less than the width specifier, it will be padded with spaces.
An optional precision specifier, [.precision] (an integer). If the width of the number is less than the precision, it will be left padded with 0's.
The conversion type character: d - decimal, s - string
Percent signs in the format string must be doubled (unless you back quote the /dup=`...` argument), and the count argument must appear before the string (unless you use the index specifier). For example, "%%d %%s" shows the count followed by the string; with the index specifier, "string %%1:s count %%0:d". |
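For example, a back-quoted sketch (filename illustrative; 9999 is just a generous comparison length) that lists each duplicated line with its count:
tpipe /input=in.txt /dup=`1,1,1,9999,0,"%d %s"`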
Add an EOL (end of line) conversion filter. The arguments are:
input:
0 - Unix (LF)
1 - Mac (CR)
2 - Windows (CR/LF)
3 - Auto. If you are unsure of the source, select Auto; it can detect and convert text files containing a variety of line endings.
4 - Fixed length (use the length parameter to specify the record length). If you are converting a mainframe file that contains fixed-length records, select Fixed length and enter the record length. The maximum record length is 2,147,483,647 characters. Note: if you are converting 132-column mainframe reports, set the fixed length to 133, because each line has a prefix character.
output: 0 - Unix 1 - Mac 2 - Windows 3 - None
length - The line length to use if input=4
LFString (optional) - The new line feed string on output when option 4 is chosen for input
Remove (optional) - Whether to remove bad EOLs (default 1) |
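For example (filenames illustrative; the /eol switch name follows TCC's TPIPE help), to auto-detect the input line endings and rewrite them as Windows CR/LF:
tpipe /input=unix.txt /output=windows.txt /eol=3,2,0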
Add a file-type filter. The arguments are:
type:
0 - Add left margin
1 - Add header
2 - Add footer
3 - Add right margin
4 - Remove lines that match exactly
5 - Retain lines that match exactly
6 - Remove lines matching the Perl pattern
7 - Retain lines matching the Perl pattern
8 - Add text side by side
9 - Add repeating text side by side
10 - Not used
11 - Not used
12 - XSLT transform
13 - Restrict to lines from list
14 - Restrict to lines NOT in list
15 - Restrict to lines matching the Perl pattern
16 - Restrict to lines NOT matching the Perl pattern
17 - Restrict to filenames matching the Perl pattern
18 - Restrict to filenames NOT matching the Perl pattern
MatchCase - If 1, do a case sensitive match (where appropriate)
filename - the filename to use |
Adds a Grep type line based filter. The arguments are:
Type:
0 - Restrict lines matching (for subfilters)
1 - Restrict lines NOT matching (for subfilters)
2 - Extract matches
3 - Extract matching lines (grep)
4 - Extract non-matching lines (inverse grep)
5 - Remove matching lines
6 - Remove non-matching lines
IncludeLineNumbers - 1 to include the line number where the pattern was found
IncludeFilename - 1 to include the filename where the pattern was found
MatchCase - 1 to do a case-sensitive comparison when matching the pattern
CountMatches - 1 to output a count of the number of matches
PatternType 0 Perl pattern 1 Egrep pattern 2 Brief pattern 3 MS Word pattern
UTF8 - 1 to allow matching Unicode UTF8 characters
IgnoreEmpty - 1 to ignore empty matches
Pattern - the (regular expression) pattern to match |
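For example (filename illustrative; the /grep switch name follows TCC's TPIPE help), to extract lines containing "ERROR", case-insensitively, prefixed with their line numbers:
tpipe /input=app.log /grep=3,1,0,0,0,0,0,0,"ERROR"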
Add a head type filter (includes or excludes text at the beginning of the file). The arguments are:
Exclude: 0 - Include the text 1 - Exclude the text
LinesOrBytes: 0 - Measure in lines 1 - Measure in bytes
Count - the number of lines or bytes to include or exclude |
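For example (filenames illustrative), to keep only the first 10 lines of a file:
tpipe /input=in.txt /output=first10.txt /head=0,0,10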
Add an insert type filter. The arguments are:
type:
0 - Insert column. Inserts a new column of text. The position where the text is inserted is determined by a column count. The leftmost column is column 1; inserting in this column displaces all other text to the right. If the insert column given is 0, the text is inserted at the end of the line. If the insert column is negative, the text is inserted at the given position relative to the end of the line. If the insert column given is before the start of the line or beyond the end of the line, the text is prepended or appended to the line respectively. Note - this filter is designed for ANSI or Unicode UTF-8 data; it will not handle UTF-16 data. If you need to process UTF-16 files, convert them to UTF-8 first and then convert back to UTF-16 after doing the insertion.
1 - Insert bytes. Inserts bytes at the given offset (from 0 to the size of the file).
position - the position to insert the string
string - the string to insert |
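For example (filenames illustrative; the /insert switch name follows TCC's TPIPE help), to insert "> " at the start of every line (column 1):
tpipe /input=in.txt /output=out.txt /insert=0,1,"> "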
Adds a Line Number filter. The arguments are:
StartNumber - the starting line number
Increment - the amount to add for each new line number
SkipBlankIncrement - don't increase the line number for blank lines
DontNumberBlank - don't put a line number on blank lines
NumberFormat - The format to use for the line number. The format syntax is:
[-][width][.precision]d
An optional left justification indicator, ["-"]
An optional width specifier, [width] (an integer). If the width of the number is less than the width specifier, it will be padded with spaces.
An optional precision specifier, [.precision] (an integer). If the width of the number is less than the precision, it will be left padded with 0's.
The conversion type character: d - decimal
DontReset - if 1, do not reset the line count at the end of the file. The default is 0.
ResetNewFile - if 1, reset the count at the start of a new file. The default is 0. |
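For example, a sketch (filenames illustrative; the /line switch name follows TCC's TPIPE help, and the NumberFormat string follows the syntax above) that numbers every line, zero-padded to four digits:
tpipe /input=in.txt /output=out.txt /line=1,1,0,0,"4.4d"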
Log the TPIPE actions.
Filename - Name of log file |
If n is 1, append to the log file. |
Adds a maths type filter.
operation - the operation to perform:
0 - add (+)
1 - subtract (-)
2 - multiply (*)
3 - div (integer division; the remainder is ignored)
4 - mod (the remainder after division)
5 - xor
6 - and
7 - or
8 - not
9 - shift left (0 inserted)
10 - shift right (0 inserted)
11 - rotate left
12 - rotate right
operand - the operand to use |
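For example, a sketch (filenames illustrative; the /maths switch name follows TCC's TPIPE help) that XORs the input with the value 255:
tpipe /input=in.bin /output=out.bin /maths=5,255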
Adds a merge type filter (merge into single output filename). The arguments are:
type:
0 - Merge into filename
1 - Retain lines found in filename
2 - Remove lines found in filename
3 - Link filter filename
filename - the filename to use |
Add a number-type filter. The arguments are:
type:
0 - Convert tabs to spaces
1 - Convert spaces to tabs
2 - Word wrap (value = column width)
3 - Pad to width of value
4 - Center in width of value
5 - Right justify in width of value
6 - Restrict CSV field to value
7 - Restrict tab-delimited field to value
8 - Truncate to width value
9 - Force to width value
10 - Repeat file value times
11 - Restrict to blocks of length value
12 - Expand packed decimal (with implied decimals)
13 - Expand zoned decimal (with implied decimals)
14 - Expand unsigned (even-length) packed decimal
15 - Expand unsigned (odd-length) packed decimal
Value - the numeric value to use |
Sets the Perl matching options for the immediately preceding search/replace filter.
BufferSize - The maximum buffer size to use for matches. Any match must fit into this buffer, so if you want to match larger pieces of text, increase the size of this buffer to suit. Default is 4096.
Greedy - If the pattern finds the longest match (greedy) or the shortest match. Default is false.
AllowComments - Allow comments in the Perl pattern. Default is false.
DotMatchesNewLines - Allow the '.' operator to match all characters, including new lines. Default is true. |
Adds a search and replace (find and replace) filter. Search / Replace lists discard blank search terms and terms where the replacement is identical to the search. Search / Replace lists can generate log entries (useful for debugging). Logs can optionally be output only for where replacements occurred.
The arguments are:
Type:
0 - Replace
1 - Pattern (old style)
2 - Sounds like
3 - Edit distance
4 - Perl pattern
5 - Brief pattern
6 - Word pattern
MatchCase - Matches case when set to 1, ignores case when set to 0
WholeWord - Matches whole words only when set to 1
CaseReplace - Replaces with matching case when set to 1
PromptOnReplace - Prompts before replacing when set to 1
Extract - If 1, all non-matching text is discarded
FirstOnly - If 1, only replace the first occurrence
SkipPromptIdentical - If 1, don't bother prompting if the replacement text is identical to the original.
Action - the action to perform when found:
0 - replace
1 - remove
2 - send to subfilter
3 - send non-matching to subfilter
4 - send subpattern 1 to subfilter, etc.
SearchStr - the string to search for
ReplaceStr - the string to replace it with |
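For example (filenames illustrative; the /replace switch name follows TCC's TPIPE help), to replace every run of digits with "#" using a Perl pattern:
tpipe /input=in.txt /output=out.txt /replace=4,0,0,0,0,0,0,0,0,"\d+","#"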
Add a search and replace list, using search and replace pairs from the specified file.
Type:
0 - Replace
1 - Pattern (old style)
2 - Sounds like
3 - Edit distance
4 - Perl pattern
5 - Brief pattern
6 - Word pattern
MatchCase - Matches case when set to 1, ignores case when set to 0
WholeWord - Matches whole words only when set to 1
CaseReplace - Replaces with matching case when set to 1
PromptOnReplace - Prompts before replacing when set to 1
FirstOnly - If 1, only replace the first occurrence
SkipPromptIdentical - If 1, don't bother prompting if the replacement text is identical to the original.
Simultaneous - If 1, all search strings are scanned for simultaneously instead of consecutively. (This is useful if the search strings and results strings overlap.)
LongestFirst - If 1, searches for long phrases (most specific) before short phrases (least specific) - this is generally used for translations.
Filename - The file to load search/replace pairs from. If the file extension is .XLS or .XLSX, the file is assumed to be Excel format, if the extension is .TAB the file is assumed to have tab-delimited values, and any other extension (including .CSV) is assumed to have Comma-Separated Values. The filename can contain environment variables enclosed in % signs e.g. %TEMP%\myfile.txt. TPIPE corrects any doubled backslashes. |
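For example, a sketch (filenames illustrative; the /replacelist switch name follows TCC's TPIPE help) that applies plain-text replacement pairs from a CSV file, longest phrases first:
tpipe /input=in.txt /output=out.txt /replacelist=0,0,0,0,0,0,0,0,1,"pairs.csv"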
Adds a Run External Program filter. The arguments are:
InputFilename - the filename that TPIPE should read from after the external program writes to it
OutputFilename - the filename that TPIPE should write to for the external program to read
CommandLine - the command line of the program to run. It should include double quotes around the entire command line. |
Adds an ActiveX script filter.
language: The language of the script
timeout: The command timeout in seconds
script: The code |
Type - The type of filter to add:
0 - Remove column. This filter removes columns of text, given a column specification that describes the position of the column relative to the start or end of the line, and the width of the column. There are several ways to specify the columns (Locate,Param1,Param2) to remove:
0 - Start column, end column. Removes all text including and between the specified columns. Useful for removing columns in fixed-width data files.
1 - Start column, width. Removes Width characters starting from (and including) column Start.
2 - End column, width. Removes Width characters backwards starting from (and including) column End.
3 - Start column to end of line. Removes all characters from the Start column to the very end of the line. Useful for making a file a uniform width.
4 - Width to end of line. Removes Width characters backwards starting from (and including) the last column.
Note - if you are removing more than one column range, it is easiest to remove ranges from right to left so that the positions of the columns don't change.
1 - Restrict lines (restriction filters require sub filters to have any effect)
2 - Restrict columns (restriction filters require sub filters to have any effect)
3 - Restrict to bytes (restriction filters require sub filters to have any effect)
4 - Restrict to delimited fields (CSV, Tab, Pipe, etc.)
6 - Remove lines. This filter removes a range of lines. There are several ways to specify the lines (Locate,Param1,Param2) to remove:
0 - Start line, end line. Removes all lines including and between the specified lines.
1 - Start line, width. Removes Width lines starting from (and including) line Start.
2 - End line, width. Removes Width lines backwards starting from (and including) line End.
3 - Start line to end of file. Removes all lines from the Start line to the very end of the file.
4 - Width to end of file. Removes Width lines backwards starting from (and including) the last line.
7 - Remove delimited fields (CSV, Tab, Pipe, etc.). This filter removes fields delimited by a given character. You can choose a predefined delimiter character (Delimiter) or select your own (CustomDelimiter). The trailing delimiter (if any) is also removed. When Comma (.csv) is chosen, TPIPE automatically handles single- and double-quoted strings with embedded line feeds.
If the first line of the file contains field names, set HasHeader to 1 so that TPIPE can count how many fields are expected. It can then determine if a field has embedded CR/LF characters and spans multiple lines. TPIPE can also determine this without a header if the fields are properly double-quoted - TPIPE will notice the missing double quote and continue reading the record from the following line.
There are several ways to specify the fields (Locate,Param1,Param2) to remove:
0 - Start field, end field. Removes all text including and between the specified fields.
1 - Start field, width. Removes Width fields starting from (and including) field Start.
2 - End field, width. Removes Width fields backwards starting from (and including) field End.
3 - Start field to end of line. Removes all fields from the Start field to the very end of the line.
4 - Width to end of line. Removes Width fields backwards starting from (and including) the last field.
Note - if you are removing more than one field range, it is easiest to remove ranges from right to left so that the positions of the fields don't change.
9 - Move columns. TPIPE will move columns to a new position on the line. The new position (MoveTo) is specified assuming that the moved columns have been removed from the line.
10 - Move delimited fields (CSV, Tab, Pipe, etc.). TPIPE will move delimited fields to a new position on the line. The new position (MoveTo) is specified assuming that the moved fields have been removed from the line. TPIPE ensures that all the delimiters on the line are correctly maintained, both at the end of the line and where the moved fields are inserted. Note - this filter is designed for ANSI or Unicode UTF-8 data; it will not handle UTF-16 data. Convert UTF-16 files to UTF-8 first, do the move, and then convert back to UTF-16.
12 - Copy columns. TPIPE will copy columns to a new position (MoveTo) on the line. Note - this filter is designed for ANSI or Unicode UTF-8 data; it will not handle UTF-16 data. Convert UTF-16 files to UTF-8 first, do the copy, and then convert back to UTF-16.
13 - Copy delimited fields (CSV, Tab, Pipe, etc.). TPIPE will copy delimited fields to a new position (MoveTo) on the line. TPIPE ensures that all the delimiters on the line are correctly maintained, both at the end of the line and where the copied fields are inserted. Note - this filter is designed for ANSI or Unicode UTF-8 data; it will not handle UTF-16 data. Convert UTF-16 files to UTF-8 first, do the copy, and then convert back to UTF-16.
17 - Remove byte range. This filter removes a range of bytes. There are several ways to specify the bytes (Locate,Param1,Param2) to remove:
0 - Start byte, end byte. Removes all text including and between the specified bytes.
1 - Start byte, width. Removes Width bytes starting from (and including) the start byte.
2 - End byte, width. Removes Width bytes backwards starting from (and including) byte End.
3 - Start byte to end of file. Removes all bytes from the Start byte to the very end of the file.
4 - Width to end of file. Removes Width bytes backwards starting from (and including) the last byte.
Note - if you are removing more than one byte range, it is easiest to remove ranges from right to left so that the positions of the bytes don't change.
Locate - How to determine which areas to affect:
0 - Restrict %d .. %d
1 - Restrict %1:d starting at %0:d
2 - Restrict %1:d starting at END - %0:d
3 - Restrict %d .. END - %d
4 - Restrict END - %d .. END - %d
Param1, Param2 - The integer values for the Locate method.
MoveTo - The integer value where to move or copy the columns or fields to (first columns or field is 1).
Delimiter - The index of the standard delimiter to use:
0 - Comma
1 - Tab
2 - Semicolon
3 - Pipe (|)
4 - Space
5 - Custom
CustomDelimiter - The custom delimiter to use (if Delimiter == 5). This should be a quoted string; if you are not using a custom delimiter then set this field to "".
HasHeader - 1 if the file's first row is a header row, 0 if not.
ProcessIndividually - Whether to apply sub filters to each CSV or Tab field individually (1), or to the fields as one string value (0). The default is 0.
ExcludeDelimiter - Whether or not to include the comma or Tab field delimiter when passing the field to the sub filter. Defaults to 0.
ExcludeQuotes - Whether or not to include the CSV quotes that may surround the field when passing the field to the subfilter. Defaults to 1. |
Type - the type of filter to add:
0 - Delete column
1 - Restrict lines
2 - Restrict columns
3 - Restrict to bytes
4 - Restrict to delimited fields (CSV, Tab, Pipe, etc.)
5 - unused
6 - Remove lines
7 - Remove delimited fields (CSV, Tab, Pipe, etc.)
9 - Move columns
10 - Move delimited fields (CSV, Tab, Pipe, etc.)
12 - Copy columns
13 - Copy delimited fields (CSV, Tab, Pipe, etc.)
17 - Remove byte range
18 - Extract fields
columnSpec - the double-quoted list of items to remove, e.g. "1..10, 16, 20"
moveTo - (integer) where to move or copy the columns or fields to
processIndividually - whether or not to apply sub filters to each CSV or Tab field individually, or to the fields as one string value
excludeDelimiter - whether or not to include the comma or Tab field delimiter when passing the field to the sub filter
excludeQuotes - whether or not to include the CSV quotes that may surround the field when passing the field to the sub filter
delimiter - (optional) the index of the standard delimiter to use (default 0):
0 - Comma
1 - Tab
2 - Semicolon
3 - Pipe (|)
4 - Space
5 - Custom
customDelimiter - (optional) the double-quoted custom delimiter to use; the default is blank
hasHeader - (optional) 1 if the file's first row is a header row; default 0 |
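For example, a sketch (filenames illustrative; the /restrict switch name follows TCC's TPIPE help, with arguments in the order listed above) that deletes columns 1 through 10 of each line:
tpipe /input=fixed.txt /output=trimmed.txt /restrict=0,"1..10",0,0,0,1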
Adds a simple filter type. n is the type of filter to add, and for those filters that support it, u indicates that the filter will be dealing with Unicode data.
1 - Convert ASCII to EBCDIC. EBCDIC is the character collating sequence commonly used on mainframes. Some characters cannot be converted because they exist in one character set but not the other.
2 - Convert EBCDIC to ASCII.
3 - Convert ANSI to OEM. Converts from ANSI to ASCII/OEM. ANSI is an 8-bit character set used by Windows, and it includes all accented Roman characters used by non-English languages such as French, German and Spanish. (Windows uses UTF-16LE for all of its internal APIs, and converts to ANSI if the user is using raster fonts or ANSI files.) ASCII/OEM is an extension of the original IBM character set in which various non-essential characters are replaced by language-specific accented characters. Different ASCII/OEM character sets are not compatible; they must be converted to ANSI and then back to the correct ASCII/OEM character set to be readable.
4 - Convert OEM to ANSI.
5 - Convert to UPPERCASE. Forces all text to UPPERCASE. To make the conversion, the function uses the current language selected by the user in the system Control Panel. If no language has been selected, TPIPE uses the Windows internal default mapping.
6 - Convert to lowercase. Forces all text to lowercase, using the same language rules as filter 5.
7 - Convert to Title Case. Capitalizes the first letter of every word and forces all other letters to lower case. This routine calculates a table of upper and lower case letters on TPIPE startup, based on the semantics of the language selected in Control Panel.
8 - Convert to Sentence Case. Capitalizes the first word in every sentence and leaves all other letters as is. Sentences start after periods, exclamation marks, colons, question marks, quotes, parentheses and angle brackets (.!:?'"<().
9 - Convert to tOGGLE cASE. Toggles the case of all text, i.e., all UPPERCASE characters are converted to lowercase and vice versa.
10 - Remove blank lines. Note: lines containing spaces or tabs are not removed; use the Remove blanks from Start of Line filter first to rectify this.
11 - Remove blanks from End of Line. Removes spaces and tabs from the end of every line.
12 - Remove blanks from Start of Line. Removes spaces and tabs from the start of every line.
13 - Remove binary characters. Removes binary characters such as those higher than ASCII code 127, and those less than ASCII code 32 except for carriage returns (ASCII code 13) and line feeds (ASCII code 10). This filter is very useful if you have a corrupted text file, or if you just want to see what text is inside a binary file. The binary information is removed, leaving you with just the text.
14 - Remove ANSI codes. ANSI (American National Standards Institute) codes are included in various streams of information to provide a remote computer with control over cursor positioning, text attributes, etc. They are also used in connections between minicomputers and mainframe computers and the terminals connected to them. The need for an ANSI filter can be recognized when something like the following example shows up in a file viewed in a text editor:
<[0;1;4mas<[m - MC88000 assembler
In this example, the "as" near the beginning is displayed in a different color than the rest of the line when the ANSI codes are properly processed.
The Escape (ASCII 27) codes above have been replaced by the < symbol to make the line printable. This filter can be used to filter out these codes and clean up the text so that it can be used in standard ways, such as copying and pasting into a word processor. On Unix machines, the man (manual) help utility only allows page-by-page browsing through a file in a forward direction. By piping the man output to a text file, transferring it to a DOS machine, and running it through this filter (and, if desired, the Convert EOL filter, Unix to DOS), a standard DOS editor can be used to browse through the file, quote from it, etc.
15 - Convert IBM drawing characters. IBM drawing characters in the upper ASCII range (128-255) are commonly used to draw lines and boxes, single and double line borders, shaded characters, etc. Many devices (such as printers and non-IBM computers) do not support the display of these characters. This filter converts them to standard ASCII characters (+, - and |) that all computers can display.
16 - Remove HTML and SGML. Use this filter to convert HTML documents to a readable format. It removes HTML and XML markup tags, i.e., everything including and between <> brackets.
17 - Remove backspaces. Removes all backspaces (ASCII code 8).
18 - Resolve backspaces. Removes both the backspaces and the characters prior to the backspaces that would have been deleted.
19 - Remove multiple whitespace. Replaces sequences of multiple spaces or tabs with a single space.
20 - UUEncode. Usually used for transmitting binary files inside an email. Files of this type are usually given an extension of .uue. Warning - UUEncoded text may be corrupted when passing over a mainframe mail gateway. To avoid corruption, use MIME Base 64 or XXEncode.
21 - Hex Encode. A very simple encoding of a file, usually used for small files because it takes a large amount of space. The benefit is that the file is very easy to encode/decode, and it cannot be corrupted passing through mail gateways.
22 - Hex Decode. Converts a file from its hex representation back to binary. The file to be decoded MUST NOT have any extra characters at the start or end if it is to be successfully processed.
23 - MIME Encode (Base 64). Used for binary data. Files of this type are usually given an extension of .b64.
24 - MIME Decode (Base 64). Used for binary data. Files of this type are usually given an extension of .b64. The file to be decoded MUST NOT have any extra characters at the start or end if it is to be successfully processed.
25 - MIME Encode (Quoted printable). Quoted printable is used for text that is mainly readable but may contain special characters, accents, etc.
26 - MIME Decode (Quoted printable). The inverse of the above encoding.
27 - UUDecode. Mail attachments can be UUEncoded; use this filter to convert them back to their correct form. Files of this type are usually given an extension of .uue.
28 - Extract email addresses. Searches for email addresses (of the form name@domain) and writes them out one per line (using a DOS line ending, CR/LF). Usually this filter is followed by a filter to remove duplicate lines, and then by a search and replace filter, searching for \013\010 and replacing with a comma or semicolon, depending on the email address separator used by your email software.
29 - Unscramble (ROT13). A simple email encoding usually used to disguise text that some people may find offensive. The encoding is totally reversible (applying it twice removes the encoding). Only alpha characters are affected (A..Z and a..z).
30 - Hex dump. Changes the text to lines consisting of 16 bytes each. Each line has an 8 hex digit file index, 16 bytes (in hex) and the ASCII representation:
00000000 65 67 69 6E 0D 0A 20 20 20 20 20 20 61 64 64 72 egin........addr
00000010 65 73 73 20 3A 3D 20 0D 0A 20 20 20 20 20 20 20 ess.:=..........
00000020 20 64 65 63 54 6F 48 65 78 53 74 72 28 20 28 66 .decToHexStr(.(f
This filter is very useful for identifying special characters to search and replace.
32 - XXEncode. Essentially identical to UUEncode, except that the character set used is different so that it can pass through EBCDIC gateways without corruption. The XXEncoding implemented by TPIPE uses the following characters:
+-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
33 - XXDecode. Essentially identical to UUDecode, except that the character set used is different so that it can pass through EBCDIC gateways without corruption.
34 - Reverse line order. The order of the input lines is reversed, i.e., the last line comes out first and the first line comes out last. A file is read entirely into RAM before being reversed, so be wary of reversing files that are larger than your machine's RAM size.
35 - Remove email headers. Removes the email headers that accompany emails exported to a text format. The email headers are lines such as To:, From:, Subject: and various other message headers added by all the servers through which your email passes before it gets to its destination.
36 - Decimal dump. Changes the text to lines consisting of 10 bytes each. Each line has a 10 decimal digit file index, 10 bytes (in decimal) and the ASCII representation:
0000000000 080 108 101 097 115 101 032 102 101 101 Please fee
0000000010 108 032 102 114 101 101 032 116 111 032 l free to
0000000020 099 111 109 109 101 110 116 013 010 111 comment..o
This filter is very useful for identifying special characters to search and replace.
37 - HTTP Encode. Encodes text for use in an HTTP header - a (usually) small piece of text that accompanies a web page request to a web server. This filter is very useful for debugging CGI scripts because it can create HTTP requests in the correct form. HTTP encoded text usually looks like the following:
a+%28usually%29+small+piece+of+text+that+accompanies+a+web+page+request+to+a+web+server.+This+filter+is+very+
38 - HTTP Decode. Decodes text from an HTTP header - a (usually) small piece of text that accompanies a web page request to a web server.
39 - Randomize lines. Puts lines into random order. This is useful when a random sample of data is required for statistical purposes - just follow this filter with a head or tail of file filter (/head or /tail). The lines output will differ from one run to the next; the order is determined by a pseudo-random number generator.
40 - Create word list. Takes all the incoming words and outputs them one per line. This can be used to generate word lists for indexes, encryption programs, etc. Hyphenated words are recognized as single words, provided that they aren't broken across lines. To get around this limitation, use a search and replace filter to replace hyphens followed by line feeds with just a hyphen.
Normally you would follow this filter with a remove duplicates filter, or alternatively a count duplicate lines filter (with IncludeOne set to 1). Examples:
catch22 - a word
24-7 - a word
twenty-four - a word
5th - a word
ice cream - two words
Commas or periods after words are treated as word separators.
41 - Reverse each line. Each line is output reversed from left to right. This can be useful to extract domain names from web site log files - use this filter to reverse each line, use an extract matches filter of [\w\d]+\.[\w\d]+ to extract each domain name, then reverse each line again. Note: this filter will NOT work on Unicode or UTF-8 data; it only works on single-byte data such as ASCII or ANSI.
42 - Convert to RanDOm case. Randomly changes the case of characters. This routine calculates a table of upper and lower case letters on TPIPE startup, based on the semantics of the language selected in the Windows Control Panel. Running this filter again will generate different results; for example:
1. ranDoMIze cASe
2. RanDOmIZE case
3. randOMIZE casE
43 - Extract URLs. Lists mailto:, http://, https://, ftp://, ftps://, nntp:, skype:, call:, and gopher:// URLs one per line.
44 - ANSI to Unicode. Converts single byte ANSI characters to double byte Unicode characters. This filter can be useful if you want to send a text file to someone using a language other than your own. It is often followed by an Add Header filter, to add a Unicode byte order mark (BOM), \xFF\xFE.
45 - Unicode to ANSI. Converts double byte Unicode characters to single byte ANSI characters. This filter can be useful if you want to send a text file to someone using a language other than your own. It is often combined with a remove start of file filter, to either remove the first two bytes of Unicode (before the conversion) or the first byte of ANSI (after the conversion), to remove the leading Unicode byte order mark (BOM).
46 - Display debug window. A debug filter is very handy for debugging filters. When text is passed through this filter, it places the output into a window so that you can see what the text looks like at that stage of the filtering process.
47 - Word concordance. Generates a word concordance: a list showing the context or surrounding words for a given set of words in a dictionary.
48 - Remove all. Removes all text. Unlike a pattern match filter that matches everything and then throws it away, this filter is far more efficient, especially for large files, as it signals completion back to the input filter, so only the first chunk of a multi-gigabyte file will ever get processed. It is useful in two main situations:
1. Inside a subfilter, it prevents any of the subfiltered text from re-entering the text stream. So you could restrict to lines matching a pattern, output the matching lines to a new file, and then remove them.
2. To remove all of the text of a file, then use an Add Header filter with the @fullInputFilename macro to obtain the name of the file.
Note: an Add Left Margin or Add Right Margin filter will not work after a Remove All filter, as they require an actual line to trigger them. Instead, use an Add Header or Add Footer filter.
49 - Restrict to each line in turn. Restricts sub filters to operate on each line in turn. This filter is used for its side effect of limiting the matched text to a single line at most.
50 - Convert CSV to Tab-delimited. Converts CSV data (quoted or unquoted) to tab-delimited form. It's preferable to use a file with column headers, because then TPIPE can easily determine if the fields have embedded CR/LFs in them; if the data is properly quoted, TPIPE will determine this automatically. TPIPE will eliminate unnecessary quotes.
51 - Convert CSV to XML. Converts CSV data (quoted or unquoted) to XML form. It's preferable to use a file with column headers, because then TPIPE can easily determine if the fields have embedded CR/LFs in them; if the data is properly quoted, TPIPE will determine this automatically. TPIPE correctly escapes < > " ' and & in the data to the corresponding XML entity. If your data contains invalid XML characters such as ASCII 26 (end-of-file, hex \x1A), follow this filter with a search/replace filter to remove \x1A and replace it with nothing.
52 - Convert Tab-delimited to CSV. Converts tab-delimited data to CSV form. It's preferable to use a file with column headers, because then TPIPE can easily determine if the fields have embedded CR/LFs in them. TPIPE cannot determine this without column headers.
53 - Convert Tab-delimited to XML. Converts tab-delimited data to XML form. It's preferable to use a file with column headers (/simple=55), because then TPIPE can easily determine if the fields have embedded CR/LFs in them; TPIPE cannot determine this without column headers. TPIPE correctly escapes < > " ' and & in the data to the corresponding XML entity. If your data contains invalid XML characters such as ASCII 26 (end-of-file, hex \x1A), follow this filter with a search/replace filter to remove \x1A and replace it with nothing.
54 - Convert CSV (with column headers) to XML. See 51 - Convert CSV to XML.
55 - Convert Tab-delimited (with column headers) to XML. See 53 - Convert Tab-delimited to XML.
56 - Convert CSV (with column headers) to Tab-delimited. See 50 - Convert CSV to Tab-delimited.
57 - Convert Tab-delimited (with column headers) to CSV. See 52 - Convert Tab-delimited to CSV.
58 - Restrict to file name. Applies its sub filters only to files whose filenames (i.e., drive + path + filename) match or do not match a pattern or list of patterns. This is very handy for applying a Convert Word Documents to Text filter only to files matching the pattern \.DOC$. With the appropriate pattern, this filter can also be used to control sub filters based on filename, folder and drive. Note that this filter uses case-insensitive Perl regular expressions, not Windows wildcards.
59 - Convert Word documents to (UTF8) text. Takes ALL incoming documents, opens them with Microsoft Word, and outputs them as text files. This can be used to convert a set of Word documents to text format; after this filter you can add search and replace filters or any other filters you choose. This filter requires Microsoft Word 98 or higher to be installed. If you wish to convert documents other than the default .DOC files, you may also need to install Word's conversion filters. If Word cannot be started automatically, TPIPE will prompt you to start it manually before continuing. Unless you know that all documents being processed are Word documents (e.g. by using a wildcard of *.doc in the Files to Process tab), you should restrict this filter to only files matching the pattern \.DOC$.
60 - Swap UTF-16 word order. Swaps pairs of bytes.
This is commonly used to transform big-endian or little-endian Unicode files so that other programs can use them.
61 - Swap UTF-32 word order. Swaps groups of 2-byte words.
This is commonly used to transform big-endian or little-endian Unicode files so that other programs can use them.
62 - Remove BOM (Byte Order Mark). Removes the Unicode Byte Order Mark from the start of Unicode files, if present.
63 - Make Big Endian. Converts a Little Endian Unicode file into a Big Endian Unicode file.
Note - the file MUST start with a Byte Order Mark (BOM) for it to be correctly identified.
64 - Make Little Endian. Converts a Big Endian Unicode file into a Little Endian Unicode file.
Note - the file MUST start with a Byte Order Mark (BOM) for it to be correctly identified.
65 - Compress to Packed Decimal. Compresses EBCDIC numeric data (optional leading sign, numbers and periods) to an EBCDIC packed decimal field (also known as Comp-3). There are several notes to keep in mind when using this filter:
1. You MUST use this filter inside a Restrict to Byte Range filter. The field WIDTH is then set by the containing filter.
2. Compressing a field will decrease your output record length, so ensure you allow for this. A good strategy to avoid problems is to first compress the rightmost field, then work your way back to the leftmost field. This prevents the field column positions from changing and makes the file easier to work with.
This filter will add hex 'B' to negative fields, hex 'C' to positive fields and hex 'F' to unsigned fields. If these codes don't match what your target needs, use a column or CSV restriction to apply a search/replace.
66 - Compress to Zoned Decimal. Expands an EBCDIC zoned decimal field to a raw EBCDIC number with a sign. Typically this filter is then followed by a Convert EBCDIC to ASCII filter - after all other fields have been expanded as well. There are several notes to keep in mind when using this filter:
1. You MUST use this filter inside a Restrict to Byte Range filter. The field WIDTH is then set by the containing filter.
2. Expanding a field will increase your output record length, so ensure you allow for this. A good strategy to avoid problems is to first expand the rightmost field, then work your way back to the leftmost field. This prevents the field column positions from changing and makes the file easier to work with.
67 - Expand Binary Number to EBCDIC. Expands a series of digits stored in binary (BIG ENDIAN) form. The maximum width is 8 bytes. There are several notes to keep in mind when using this filter:
1. You MUST use this filter inside a Restrict to Byte Range filter. The field WIDTH is then set by the containing filter.
2. Expanding a field will increase your output record length, so ensure you allow for this. A good strategy to avoid problems is to first expand the rightmost field, then work your way back to the leftmost field. This prevents the field column positions from changing and makes the file easier to work with.
3. If the data is stored in LITTLE ENDIAN order, use a Reverse filter inside the restriction prior to this filter.
68 - Expand Binary Number to ASCII. Expands a series of digits stored in binary (BIG ENDIAN) form. The maximum width is 8 bytes. The notes for filter 67 apply here as well.
69 - NFC - Canonical Decomposition, followed by Canonical Composition. Applies a Unicode NFC transformation to incoming Unicode text (UTF16-LE). Output is also Unicode UTF16-LE.
70 - NFD - Canonical Decomposition. Applies a Unicode NFD transformation to incoming Unicode text (UTF16-LE). Output is also Unicode UTF16-LE.
71 - NFKD - Compatibility Decomposition. Applies a Unicode NFKD transformation to incoming Unicode text (UTF16-LE). Output is also Unicode UTF16-LE.
72 - NFKC - Compatibility Decomposition, followed by Canonical Composition. Applies a Unicode NFKC transformation to incoming Unicode text (UTF16-LE). Output is also Unicode UTF16-LE.
73 - Decompose
74 - Compose. Applies a Unicode Compose transformation to incoming Unicode text (UTF16-LE). The output is also Unicode UTF16-LE.
75 - Convert numeric HTML entities to text. Converts decimal/hex numeric HTML/XML entities to plain text; for example, &#174; and &#xAE; both become ®. Typically the input file is ANSI (single byte) format. This filter outputs UTF-8 characters for high-value entities, so the best approach is to first convert the file from ANSI to UTF-8 (/unicode), then apply this filter.
76 - Convert PDF documents to (UTF8) text. Takes ALL incoming documents and converts them from PDF to text. Most of the formatting will be lost.
77 - Restrict to ANSI files
78 - Restrict to Unicode UTF16 files
79 - Restrict to Unicode UTF32 files
80 - Convert Excel spreadsheets to (UTF8) text. Takes ALL incoming documents, opens them with Microsoft Excel, and outputs them as CSV (comma-delimited) files (hidden worksheets are ignored). After this filter you can add search and replace filters or any other filters you choose, such as converting the data to tab-delimited or XML. This filter requires Microsoft Excel 98 or higher to be installed. If you wish to convert documents other than the default .XLS files, you may also need to install Excel's conversion filters. Unless you know that all documents being processed are Excel documents (e.g. by using a wildcard of *.xls in the Files to Process tab), you should restrict (/simple=58) this filter to only files matching the pattern \.XLS$.
81 - Shred file
82 - Unicode to escaped ASCII
83 - Restrict to Unicode files
84 - T-filter. The T-filter allows you to process the same output in multiple ways. You can create a subfilter and add filters to create the desired output. When this side of the T has finished processing, the data is discarded and the original text continues processing as though the T-filter did not exist.
85 - Convert decimal/hex numeric HTML/XML entities and entity names to text (e.g., &#174; or &reg; become ®). This filter outputs UTF-8 characters for high-value entities.
86 - Convert JSON to Tab
87 - Convert Tab to JSON
88 - Convert Word documents to RTF
/Simple has some redaction filters which are designed to work inside restriction filters.
89 - Remove diacritics
91 - Redact x-over text
92 - Redact x-over digits
93 - Redact x-over all but last 4 digits
94 - Redact x-over non-blanks
95 - Replace with blanks
96 - Redact with pseudo NHS number
97 - Redact with pseudo SSN
98 - Redact with pseudo bank number |
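For example (filenames illustrative), to trim leading blanks and then remove the resulting blank lines, chaining two /simple filters in order:
tpipe /input=in.txt /output=out.txt /simple=12 /simple=10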
Sort text files. The arguments are:
Type - the sort type:
0 - ANSI sort
1 - ANSI sort (case sensitive)
2 - ASCII sort
3 - ASCII sort (case sensitive)
4 - Numeric sort
5 - Sort by length of line
6 - Sort by date and time
7 - Sort by date
8 - Sort by time
9 - UTF8 sort (case insensitive)
10 - UTF8 sort (case sensitive)
Reverse - If 1, sort in descending order; if 0, sort in ascending order
RemoveDuplicates - If 1, remove duplicate lines; if 0, keep duplicate lines
StartColumn - The column in the line to begin the comparisons
Length - The length of the comparison |
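For example, a sketch (filenames illustrative; the /sort switch name follows TCC's TPIPE help) that sorts case-insensitively, removes duplicates, and compares up to 100 characters starting at column 1:
tpipe /input=names.txt /output=sorted.txt /sort=0,0,1,1,100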
Adds a split type filter. The arguments are:
type: 0 Split at a given size 1 Split at a given character 2 Split at a given number of lines
splitSize - the size file to split at
splitChar - the character to split at
splitCharPos 0 Split before the character (it goes into the next file) 1 Split after the character (it remains in the first file) 2 Split on top of the character (remove it)
SplitCharCount - the number of times to see SplitChar before splitting
SplitLines - (optional) split after a given number of lines, default 60
SplitFilename - (optional) the name to give to each output split file. /split will append a "%3.3d" format specifier to the name; i.e. SplitFilename of "foo.txt" will generate output files named "foo.txt.000", "foo.txt.001", etc. If you don't specify a SplitFilename, /split will use the input filename as the base.
FirstFileNumber - (optional) the number of the first file; default is 0
PreventOverload - (optional) true to prevent more than 10,000 files in one folder, default false
The split file filter will remove the last file if it is empty. |
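For example, a sketch (filename illustrative; arguments in the order listed above) that splits a large file every 10,000 lines into part.txt.000, part.txt.001, and so on:
tpipe /input=big.log /split=2,0,"",0,0,10000,"part.txt",0,0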
Add a string-type filter. The arguments are:
type:
0 - Add left margin
1 - Add header
2 - Add footer
3 - Add right margin
4 - Remove lines that match exactly
5 - Retain lines that match exactly
6 - Remove lines matching the Perl pattern
7 - Retain lines matching the Perl pattern
8 - Add text side by side
9 - Add repeating text side by side
10 - Not used
11 - Not used
12 - XSLT transform
13 - Restrict to lines from list
14 - Restrict to lines NOT in list
15 - Restrict to lines matching the Perl pattern
16 - Restrict to lines NOT matching the Perl pattern
17 - Restrict to filenames matching the Perl pattern
18 - Restrict to filenames NOT matching the Perl pattern
matchCase - case sensitive or not (where appropriate)
string - the string to use |
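For example (filenames illustrative; the /string switch name follows TCC's TPIPE help), to retain only lines matching a Perl pattern:
tpipe /input=app.log /output=errors.txt /string=7,0,"ERROR"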
Add a tail type filter (includes or excludes text at the end of the file). The arguments are:
Exclude: 0 - Include the text 1 - Exclude the text
LinesOrBytes: 0 - Measure in lines 1 - Measure in bytes
Count - the number of lines or bytes to include or exclude |
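For example (filenames illustrative), to keep only the last 20 lines of a file:
tpipe /input=app.log /output=tail20.txt /tail=0,0,20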
Convert the file to or from Unicode. input is the encoding for the input file; output is the encoding for the output file. The possible values are:
UTF-16LE
UTF-16BE
UTF-32LE
UTF-32BE
UTF-8
ANSI
ASCII
CPnnn, where nnn is a Windows code page (for example, CP437 or CP1251).
TPIPE handles files internally as UTF-8, so if you want to process a Windows UTF-16LE file, you'll need to convert it to UTF-8 first, then apply the desired filters, and convert it back to UTF-16LE. For example, to wrap a Unicode file at column 80:
tpipe /input=inputname /output=outputname /unicode=UTF-16LE,UTF-8 /number=2,80 /unicode=UTF-8,UTF-16LE |
Adds an HTML / XML filter. The arguments are:
Type - the operation to perform: 0 restrict to an element 1 restrict to an attribute 2 restrict to between tags
IncludeText - whether to include the find string in the restriction result (default false)
IncludeQuotes - whether to include surrounding quotes in the attribute result or not (default false)
MatchCase - match case exactly or not (default false)
BufferSize - the maximum expected size of the match (default 32768)
Tag - the element or start tag to find
Attribute - the attribute to find
EndTag - the endTag to find |
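For example, a sketch (filenames illustrative; the /xml switch name follows TCC's TPIPE help, with arguments in the order listed above) that restricts its sub filters to the title element and uppercases it:
tpipe /input=page.html /output=out.html /xml=0,0,0,0,32768,"title","","" /startsubfilters /simple=5 /endsubfilters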