How to? Filter a text stream with a regular expression

Feb 23, 2012
240
3
I'm looking for a way to do grep-type filtering on a text stream in TCC. For instance, I'd like to keep only the lines of a given file that contain a string of 5 or more digits.
With PowerShell I can use the "-match" operator:
type filename.txt | where {$_ -match "\d{5}"}
Is there an equivalent within TCC? I've already seen that TCC does have excellent regex support. For instance, to perform a "dir" of filenames with strings of 5 or more digits, I can write:
dir "::\d{5}"
However, I'd like to be able to harness TCC's regex processing with any piped text stream on the command line. Is this possible?
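For reference, the grep-type filtering described above can be sketched outside TCC in a few lines of Python. This is only an illustration of the behaviour being asked for, not a TCC feature; the names `filter_lines` and `FIVE_DIGITS` are made up for this example.

```python
import re
from typing import Iterable, Iterator

# Same pattern as in the PowerShell example: five consecutive digits
# anywhere in the line (so runs of six or more digits match too).
FIVE_DIGITS = re.compile(r"\d{5}")


def filter_lines(lines: Iterable[str], pattern: re.Pattern = FIVE_DIGITS) -> Iterator[str]:
    """Yield only the lines in which the pattern matches somewhere."""
    for line in lines:
        if pattern.search(line):
            yield line


if __name__ == "__main__":
    # Hypothetical sample input standing in for a piped text stream.
    sample = ["report_2012.txt\n", "invoice_00123.pdf\n", "scan1234567.png\n"]
    for kept in filter_lines(sample):
        print(kept, end="")
```

Passing `sys.stdin` instead of the sample list would let the same function act as a filter at the end of a pipe.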
 
Feb 23, 2012
240
3
Hi Steve,

Thanks for the pointer! I just tried out a few ffind combos, and I was pleased to find that it supported UTF-8 with the /8 option, and that it also supports use as a pipe command. That is, one can write, for instance:

dir | ffind /e"\d{5}" /v

(OK, I realize that in this case I could have just added the regex to the dir command directly, but I note it here for demonstration purposes).

However, I've noticed two issues regarding ffind and multilanguage text:

1] When I pipe text into ffind, any non-English text becomes corrupted in the final output, whether or not I set the "/8" switch. I imagine that this is because piped command-line output is not piped as UTF-8. So, I'm wondering - just as Rex recently added the option for all redirected streams to be processed as UTF-8, is there a parallel option for piped streams to be processed as UTF-8, too?

2] Even without the piping, I find that ffind does not properly process regex strings that contain non-English characters. This is strange, because it does properly process simple search strings with non-English chars. Here's my case:

- I have a file, test.txt, with UTF-8 text, containing English and Hebrew characters.
- If I run:
ffind /t"א" /v /8 test.txt (that's an "aleph" character in the /t argument)
Then the result is correct - all lines are displayed containing the character "aleph".
- However, if I run:
ffind /e"א" /v /8 test.txt
Then the results are now blank!
Why would /t process it correctly, but /e not do so?
 

Charles Dye

Super Moderator
Staff member
May 20, 2008
4,461
88
Albuquerque, NM
prospero.unm.edu
1] When I pipe text into ffind, any non-English text becomes corrupted in the final output, whether or not I set the "/8" switch. I imagine that this is because piped command-line output is not piped as UTF-8. So, I'm wondering - just as Rex recently added the option for all redirected streams to be processed as UTF-8, is there a parallel option for piped streams to be processed as UTF-8, too?

Is your input text Unicode, or is it some OEM format?
 
Feb 23, 2012
240
3
1] In my second example, my sample file is encoded in UTF-8.
2] In my first example, regarding piping, I'm simply piping the output of the dir command to ffind. Admittedly, the code page of my shell is set to 1255 (Hebrew-Windows), rather than 65001 (UTF-8), because, as I've noted in a different thread, when I switch TCC to code page 65001 I don't see any Hebrew characters whatsoever. Nevertheless, I was hoping that piped text could be converted "on the fly" to UTF-8, the same way redirected output (with >) is successfully converted to UTF-8.
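The kind of "on the fly" conversion I have in mind can be sketched in Python as follows. This is only an illustration of re-encoding bytes from code page 1255 to UTF-8, not a description of how TCC handles pipes; the function name `reencode` is made up for this example.

```python
def reencode(data: bytes, source: str = "cp1255", target: str = "utf-8") -> bytes:
    """Decode bytes from the source code page and re-encode them in the target encoding."""
    return data.decode(source).encode(target)


if __name__ == "__main__":
    # The letter aleph is a single byte in code page 1255 and two bytes in UTF-8.
    aleph_cp1255 = "א".encode("cp1255")
    print(aleph_cp1255.hex(), "->", reencode(aleph_cp1255).hex())
```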
 
Why would /t process it correctly, but /e not do so?
At a guess (you'll have to wait for Rex to provide the definitive answer), it's because they are processed by completely different pieces of code: /t is handled directly by TCC, while /e is passed to the third-party regex code. As to the rest of your post, I'm afraid I have no ideas. I'm lucky enough to be able to run vanilla Take Command in vanilla Windows with no need for additional language or code page support (or at least, the UK pages are rarely different enough from the American ones to cause any real issues these days), so I've never had to dig into those areas, sorry.
 
Feb 23, 2012
240
3
Well, Charles, as I noted, ffind does a great job with UTF-8 text with the /T parameters. And I'm using it with the /8 switch, which puts it into UTF-8 mode. So there is something specific about the way the search string and file are sent to the regex processor that seems to be the problem.
 

rconn

Administrator
Staff member
May 14, 2008
12,344
149
Well, Charles, as I noted, ffind does a great job with UTF-8 text with the /T parameters. And I'm using it with the /8 switch, which puts it into UTF-8 mode. So there is something specific about the way the search string and file are sent to the regex processor that seems to be the problem.

The RE library (Oniguruma) has to be configured to handle anything other than ASCII and Unicode input. Adding UTF-8 shouldn't be too difficult, but it's going to be substantially more work to configure it for other encodings (like RTL languages). That definitely won't be in v13.
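As a rough illustration of why the encoding matters to a matcher, here is a small Python sketch. It is not how Oniguruma or TCC work internally; it only assumes a byte-oriented search, and the helper name `bytes_match` is made up for this example. When the pattern and the text are encoded the same way the search succeeds, and when one is UTF-16 and the other UTF-8 the byte sequences never line up.

```python
import re


def bytes_match(pattern: str, text: str, pattern_enc: str, text_enc: str) -> bool:
    """Search for a literal pattern in text after encoding each one separately."""
    needle = re.escape(pattern.encode(pattern_enc))
    return re.search(needle, text.encode(text_enc)) is not None


if __name__ == "__main__":
    line = "file א.txt"
    # Matching encodings: the aleph is found.
    print(bytes_match("א", line, "utf-8", "utf-8"))
    print(bytes_match("א", line, "utf-16-le", "utf-16-le"))
    # Mismatched encodings: the same character is not found.
    print(bytes_match("א", line, "utf-16-le", "utf-8"))
```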
 
Feb 23, 2012
240
3
Hi Rex,
1] Ah, I see, you are correct, it's the UTF-8 that was scaring the regular expression library. When I went back to "UnicodeOutput=Yes", then I found that piping text through to ffind's regular expression parser worked perfectly. That is, I can now write:
dir | ffind /e"א" /v
With UTF-16, this succeeds. With UTF-8, it finds nothing.

2] Interestingly, this issue affected ffind's non-regex string processing too. That is, typing the same thing but with /t, like this:
dir | ffind /t"א" /v
resulted in the same issue. With UTF-8, it finds nothing (even if I add the /8 switch). On the other hand, with UTF-16 output, the output is all good.
 
Feb 23, 2012
240
3
So, just to clarify, because there are a lot of variables. I tried:
(a) piping Hebrew text into ffind
(b) running ffind on a UTF-8 text file.
And I tried each of these with both (1) regex and (2) non-regex strings.
I found that:
(1a and 1b) with regex, UTF-8 was not processed correctly, neither with piping nor on a text file
(2a) When piping UTF-8, a non-regex string did not work, either
(2b) However, running ffind on a UTF-8 file with a non-regex string did work.

With UTF-16, all four permutations work.
 