"grep" using only internals

In another thread I mentioned

Code:
d:\data\tcclibrary> alias mygrep
ffind /v /k /m /e%1

for finding strings matching a regex in a file or a pipe. It's pretty fast, but it can't invert the search (return lines that don't match), and case-insensitivity can only be specified in the regex itself (with (?i)).

Here's another (source far below), TGREP, which uses TPIPE. It's written as a library function but could be used as a BTM after commenting out or removing the first and last lines. It allows inverting the search and specifying case-insensitivity outside the regex, but it's slow.

Code:
d:\data\tcclibrary> tgrep /?

Syntax: TGREP [/i (insensitive)] [/v (invert)] regex [filename]

Here are two speed comparisons: the first pair reads from a pipe, the second pair reads a file.

Code:
d:\data\tcclibrary> timer /q & (echo foo | mygrep "(?i)FOO") & timer
foo
Timer 1 off: 11:22:48  Elapsed: 0:00:00.089

d:\data\tcclibrary> timer /q & (echo foo | tgrep /i "FOO") & timer
foo
Timer 1 off: 11:22:53  Elapsed: 0:00:00.419

d:\data\tcclibrary> timer /q & (mygrep "Fire|Prox" %_ininame) & timer
ProxyPort=80
FirewallType=None
FirewallPort=0
Timer 1 off: 11:33:40  Elapsed: 0:00:00.007

d:\data\tcclibrary> timer /q & (tgrep "Fire|Prox" %_ininame) & timer
ProxyPort=80
FirewallType=None
FirewallPort=0
Timer 1 off: 11:33:41  Elapsed: 0:00:00.337
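
To put rough numbers on "slow", here is a small sketch (in Python rather than TCC, purely for the arithmetic) that divides the elapsed times quoted above. The names are made up for illustration; the figures are the four TIMER results from the session. Single runs this short are noisy, so treat the ratios as ballpark only.

```python
# Elapsed times in seconds, copied from the TIMER output above.
timings = {
    "pipe": {"mygrep": 0.089, "tgrep": 0.419},
    "file": {"mygrep": 0.007, "tgrep": 0.337},
}

def slowdown(case):
    """How many times longer TGREP took than MYGREP for the given case."""
    t = timings[case]
    return t["tgrep"] / t["mygrep"]

def extra_seconds(case):
    """Absolute extra time TGREP needed for the given case."""
    t = timings[case]
    return t["tgrep"] - t["mygrep"]

for case in timings:
    print(f"{case}: {slowdown(case):.1f}x slower, +{extra_seconds(case):.3f}s")
```

The ratios differ a lot (about 4.7x in the pipe, about 48x on the file), but the absolute gap is 0.330 s in both cases. That hints at a fixed cost per TGREP call rather than slower matching, though two samples can't prove it.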

And here's TGREP_SRC.BTM. Suggestions for improvement will be welcomed.

Code:
tgrep {
setlocal

if "%1" == "" .or. "%1" == "/?" ^
    (echo ^r^nSyntax: TGREP [/i (insensitive)] [/v (invert)] regex [filename] & quit)

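rem case: 1 = case-sensitive (default), 0 = case-insensitive (/i)
rem type: 3 = keep matching lines (default), 4 = keep non-matching lines (/v)
set case=1
set type=3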

if "%1" == "/i" (set case=0 & shift)
if "%1" == "/v" (set type=4 & shift)
if "%1" == "/i" (set case=0 & shift)

if "%1" != ""   (set regex=%1) else (echoerr tgrep: Missing regex & quit)

iff %_pipe == 0 then
    if "%2" == ""   (echoerr tgrep: No file specified & quit)
    if not exist %2 (echoerr tgrep: The specified file does not exist (%2) & quit)
    set filespec=/input=%2
endiff

tpipe %filespec /grep=%type,0,0,%case,0,0,0,0,%regex
}
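
For anyone who wants to check what TGREP should return without running TPIPE, here is a minimal sketch of the same option handling and filtering in Python. It is not the library function itself: Python's re module stands in for TPIPE's regex engine (so the regex dialect may differ), the sample lines are made up, and unlike the BTM it accepts /i and /v any number of times.

```python
import re

def tgrep(args, lines):
    """Return the lines TGREP would keep, given its command-line arguments."""
    args = list(args)
    ignore_case = False   # mirrors case=1 (sensitive) unless /i is given
    invert = False        # mirrors type=3 (matching) unless /v is given
    # Like the BTM, accept /i and /v in either order before the regex.
    while args and args[0] in ("/i", "/v"):
        if args[0] == "/i":
            ignore_case = True
        else:
            invert = True
        args.pop(0)
    if not args:
        raise ValueError("tgrep: Missing regex")
    pattern = re.compile(args[0], re.IGNORECASE if ignore_case else 0)
    # Keep a line when "it matched" differs from "invert was requested".
    return [line for line in lines if bool(pattern.search(line)) != invert]

# Illustrative input only; the last line is invented for the example.
sample = ["ProxyPort=80", "FirewallType=None", "Other=1"]
print(tgrep(["Fire|Prox"], sample))              # the two Proxy/Firewall lines
print(tgrep(["/v", "/i", "fire|prox"], sample))  # only the invented line
```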
 
Out of curiosity, what happens if you use wsl grep?
If the LxssManager service isn't running, it's started. If vmwp.exe, vmmem, wslhost.exe (and its conhost.exe), and a couple of dllhost.exe processes aren't running, they're started, and it will take a long time (over 3 seconds here).

If unused, those six processes will hang around for about a minute. If they're still running, what happens is odd. If wsl is re-used within ~15 seconds it takes about mygrep's time. After ~15 seconds (but before those six processes die) it will take about TGREP's time.

I wouldn't mind hearing the results of further experiments.
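
In case it helps with further experiments, here is a rough harness (Python, not TCC) that times the same command after different idle periods. It is only a sketch: the function names are made up, and the command in the demo is a stand-in that simply reads stdin, so it runs anywhere. To repeat the WSL test, the command list would be something like ["wsl", "grep", "-i", "FOO"] with pauses on either side of the ~15 second mark.

```python
import subprocess
import sys
import time

def time_command(cmd, stdin_text=""):
    """Run cmd once, feeding it stdin_text, and return elapsed seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, input=stdin_text, text=True,
                   capture_output=True, check=False)
    return time.perf_counter() - start

def repeat_with_pause(cmd, pauses, stdin_text=""):
    """Sleep for each pause (seconds), then time cmd; returns (pause, elapsed)."""
    results = []
    for pause in pauses:
        time.sleep(pause)
        results.append((pause, time_command(cmd, stdin_text)))
    return results

if __name__ == "__main__":
    # Stand-in command: a Python child process that just consumes stdin.
    cmd = [sys.executable, "-c", "import sys; sys.stdin.read()"]
    for pause, elapsed in repeat_with_pause(cmd, [0, 1], "foo\n"):
        print(f"after {pause}s idle: {elapsed:.3f}s")
```

The elapsed time includes process startup, which is the part that seems to vary with WSL.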
 
To each their own, but I've been using the Cygwin tools for several years. I very seldom use wsl; when I need a full unix (bash, etc.) I have Ubuntu running in a VM. For anything short of a full unix session, I find Cygwin fits the bill very nicely.

And there is always busybox as well if you don't want to fool with Cygwin.
 
In the Cygwin package summary for grep it says

[Attached image: 1685116563365.png]

Is that really true ... it needs bash? Can't you call/pipe Cygwin's grep from another shell?
 
Works okay for me:
Code:
C:\...\bin>ver

TCC  30.00.18 x64   Windows 10 [Version 10.0.19044.2965]

C:\...\bin>which grep
grep is an external : C:\cygwin64\bin\grep.exe

C:\...\bin>echo Hello | grep h

C:\...\bin>echo Hello | grep H
Hello

Joe
 
Works fine with busybox.exe:
Code:
C:\...\bin>ver

TCC  30.00.18 x64   Windows 10 [Version 10.0.19044.2965]

C:\...\bin>which busybox.exe
busybox.exe is an external : C:\ProgramData\chocolatey\bin\busybox.exe

C:\...\bin>echo Hello | busybox.exe grep h

C:\...\bin>echo Hello | busybox.exe grep H
Hello

Joe
 
Hmmm! They look different, though. It looks like with busybox, you're explicitly running an emulator (as with WSL?). With Cygwin it looks like you're just running a Windows EXE. Can you say what version it is (grep --version) and whether it's 64-bit or not (@EXETYPE)? Cygwin says it's GNU grep. AFAIK, GNU never made a 64-bit Windows grep; I'd be a little surprised if Cygwin did. My grep is an oldie, and a 32-bit app.

Code:
v:\> grep --version | grep -E "2.5|Copy"
GNU grep 2.5.4
Copyright (C) 2009 Free Software Foundation, Inc.

v:\> echo %@exetype[d:\gnu\grep.exe]
7
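
For anyone without @EXETYPE handy, the 32-bit/64-bit part of the question can also be answered from the file's PE header. Below is a minimal sketch in Python; the function names are made up, and it only looks at the Machine field, so unlike @EXETYPE it does not tell console from GUI. The demo builds a fake header in memory rather than assuming any particular grep.exe exists.

```python
import struct

# COFF Machine values for the two cases discussed in this thread.
MACHINE_NAMES = {0x014C: "32-bit (x86)", 0x8664: "64-bit (x86-64)"}

def pe_bitness(data):
    """Classify a Windows executable from the first bytes of the file."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return "not an MZ/PE executable"
    # Offset 0x3C of the DOS header holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0" or len(data) < pe_offset + 6:
        return "MZ executable without a readable PE header"
    # The Machine field is the two bytes right after the signature.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return MACHINE_NAMES.get(machine, f"other machine type 0x{machine:04x}")

def exe_bitness(path):
    """Same check for a file on disk (path is whatever EXE you want to test)."""
    with open(path, "rb") as f:
        return pe_bitness(f.read(4096))

def fake_header(machine):
    """Build a minimal MZ + PE header in memory, for demonstration only."""
    dos = bytearray(0x40)
    dos[:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 0x40)  # PE signature follows directly
    return bytes(dos) + b"PE\0\0" + struct.pack("<H", machine)

print(pe_bitness(fake_header(0x014C)))
print(pe_bitness(fake_header(0x8664)))
```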
 
JPSoft's @exetype returns "10" for Cygwin grep.exe, meaning "Windows x64 console".

And the version is:
grep (GNU grep) 3.10
Packaged by Cygwin (3.10-1)
Copyright (C) 2023 Free Software Foundation, Inc.

Busybox is NOT an emulator. From the web page:

BusyBox combines tiny versions of many common UNIX utilities into a single small executable. It provides replacements for most of the utilities you usually find in GNU fileutils, shellutils, etc. The utilities in BusyBox generally have fewer options than their full-featured GNU cousins; however, the options that are included provide the expected functionality and behave very much like their GNU counterparts.

All Cygwin executables require the presence of the Cygwin DLL. This is the translation layer. Busybox is self-contained and completely portable. And busybox is available in 32-bit and 64-bit versions, if that matters to you.
 
Thanks for that, @ohenryx. I get it. How big is busybox.exe? Whose grep is in there (their own?), and does it do alternation ("pattern1|pattern2")? Have you compared its speed to anything else?

I installed Cygwin but after taking what I want, I'll uninstall it. This is what I have so far (copied from cygwin\bin).

Code:
d:\c32> d
2023-05-21  12:15          46,611  cmp.exe
2022-07-14  15:26          45,075  cut.exe
2022-05-02  07:31          75,283  cyggcc_s-seh-1.dll
2022-05-23  07:23       1,088,019  cygiconv-2.dll
2022-11-18  08:49          44,563  cygintl-8.dll
2022-12-18  12:27         634,387  cygpcre2-8-0.dll
2023-02-14  09:25       2,953,269  cygwin1.dll
2023-05-21  12:15         211,987  diff.exe
2023-03-25  16:21         213,523  grep.exe
2022-04-08  23:35         104,467  gzip.exe
2022-07-14  15:26          45,587  head.exe
2022-11-13  07:20         176,659  sed.exe
2022-07-14  15:27         109,075  sort.exe
2022-07-14  15:27          61,459  tail.exe
2022-07-14  15:27          38,931  tee.exe
2022-07-14  15:27          51,219  tr.exe
2022-07-14  15:27          45,587  uniq.exe
2022-07-14  15:27          48,659  wc.exe

Those tools are all GNU and 64-bit. My 13-year-old 32-bit GNU tools are measurably faster.
 
Your list of utilities from Cygwin is pretty much what I use from Cygwin, but I keep a lot more "just in case". It only takes up a little space on the hard drive, and I have a LOT of hard drive space available. Just last year I had a sudden need for "tclsh" to run some code that I came across, and there it was, already installed under Cygwin.

32 bit executables are often faster than 64 bit, but I don't recall ever encountering a situation where the difference was enough that I would care.

And here are a few more executables from cygwin, some of which I use daily, others less often but do use:

awk.exe
conv.exe
less.exe
par.exe
tr.exe
xargs.exe
 
Code:
C:\...\bin>ver

TCC  30.00.18 x64   Windows 10 [Version 10.0.19044.2965]

C:\...\bin>file.exe grep.exe
grep.exe: PE32+ executable (console) x86-64, for MS Windows, 11 sections

C:\...\bin>echo %@exetype[grep.exe]
10

C:\...\bin>dir grep.exe

 Volume in drive C is unlabeled    Serial number is acb2:6a48
 Directory of  C:\cygwin64\bin\grep.exe

2023-03-25  16:21         213,523  grep.exe

Code:
busybox.exe grep --help

Usage: grep [-HhnlLoqvsrRiwFE] [-m N] [-A|B|C N] { PATTERN | -e PATTERN... | -f FILE... } [FILE]...

Search for PATTERN in FILEs (or stdin)

        -H      Add 'filename:' prefix
        -h      Do not add 'filename:' prefix
        -n      Add 'line_no:' prefix
        -l      Show only names of files that match
        -L      Show only names of files that don't match
        -c      Show only count of matching lines
        -o      Show only the matching part of line
        -q      Quiet. Return 0 if PATTERN is found, 1 otherwise
        -v      Select non-matching lines
        -s      Suppress open and read errors
        -r      Recurse
        -R      Recurse and dereference symlinks
        -i      Ignore case
        -w      Match whole words only
        -x      Match whole lines only
        -F      PATTERN is a literal (not regexp)
        -E      PATTERN is an extended regexp
        -m N    Match up to N times per file
        -A N    Print N lines of trailing context
        -B N    Print N lines of leading context
        -C N    Same as '-A N -B N'
        -e PTRN Pattern to match
        -f FILE Read pattern from file

C:\...\bin>echo %@filesize[busybox.exe]
37376

Joe
 
Your version of busybox is much smaller than mine. I have version 1.37.0; the 64-bit version is 658k. For the life of me I can't recall where I downloaded it from.

[Attached image: Screenshot 2023-05-26 162338.png]


ON EDIT: Found it.
 
I installed BusyBox via Chocolatey.

I also have the 64-bit version:
Code:
C:\...\tools>dir busybox*.exe

 Volume in drive C is unlabeled    Serial number is acb2:6a48
 Directory of  C:\ProgramData\chocolatey\lib\busybox\tools\busybox*.exe

2023-02-26  21:17         617,486  busybox.exe
2023-02-26  21:17         658,432  busybox64.exe
           1,275,918 bytes in 2 files and 0 dirs    1,277,952 bytes allocated
     307,517,014,016 bytes free

C:\...\tools>file busybox64.exe
busybox64.exe: PE32+ executable (console) x86-64 (stripped to external PDB), for MS Windows

[Attached image: 1685140828997.png]


Joe
 