GLOBAL command is very slow

Feb 1, 2010
38
0
#1
I have a project with a very large directory tree. Let's say I want to get a list of directories where there are *.obj files.
Code:
echo %time & global /i/q (ffind /u *.obj > NUL & if 0 == %_? (echo %@full[.])) & echo %time
11:03:12,37
<< skipped >>
11:26:07,05
That is, it takes TCC 23 minutes.

An equivalent command in PowerShell takes just 30 seconds:
Code:
Measure-Command { Get-ChildItem -Recurse -Directory | ForEach-Object { If (Join-Path -ChildPath *.obj -Path $_.FullName | Test-Path) { echo $_.FullName } } }


Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 29
Milliseconds      : 677
In Linux, in the same tree on the same PC (in a virtual machine), a bash equivalent takes about one and a half minutes:
Code:
$ time find -type d -print0 | xargs -0 -n 1 -I{} bash -c 'if [[ -n "$(find '"'"'{}'"'"' -maxdepth 1 -name '"'"'*.obj'"'"' -print -quit)" ]]; then echo {}; fi'
./tools/win32/msvc90/lib

real    1m36.002s
user    0m16.651s
sys     1m25.930s
It takes a bit longer than PowerShell, but then Linux runs three processes for each directory and the hard drive is virtual. Even so, it is still 15 times faster than TCC.
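For readers less fluent in find/xargs, the same per-directory test can be sketched more simply without xargs (bash; the /tmp/objdemo tree and its contents are hypothetical, created only for illustration):

```shell
# Build a throwaway tree: only /tmp/objdemo/a contains a *.obj file.
mkdir -p /tmp/objdemo/a /tmp/objdemo/b
touch /tmp/objdemo/a/x.obj

# For each directory, print it if find sees at least one *.obj
# directly inside it (-maxdepth 1); -quit stops at the first match.
find /tmp/objdemo -type d | while read -r d; do
  if [ -n "$(find "$d" -maxdepth 1 -name '*.obj' -print -quit)" ]; then
    echo "$d"
  fi
done
```

This prints only /tmp/objdemo/a, the one directory holding an .obj file.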

I experimented with the latest version of TCC; all files are located on a local hard drive.
 
#8
FYI: I have several batch programs which use GLOBAL to simplify traversing directory trees. More elaborate (and sometimes just more recent) syntax could generally be used instead of GLOBAL. Most of these assist with cleanup, reorganization, or other similar tasks, where I don't mind the slower but more precisely controlled operation.
 
#9
GLOBAL is intended to be used with external commands that don't have a built-in way of traversing subdirectories.
Thank you. Yes, that was my purpose: I wanted to run p4 fstat on every directory that has .obj files. But since it wasn't the reason for the slowdown, I didn't mention it in the example. My final one-liner is
Code:
global /i/q (if exist *.obj (if "%@execstr[p4 fstat %_cwd\... 2>&1]" =~ ".* - no such file\(s\)\." (echo %_cwd & del /t/s/x/y/z %_cwd)))
It works pretty fast. It seems ffind was the bottleneck, not GLOBAL as I initially assumed.
 

rconn

Administrator
#10
FYI: I have several batch programs which use GLOBAL to simplify traversing directory trees. More elaborate (and sometimes just more recent) syntax could generally be used instead of GLOBAL.
Most of the file handling commands and all of the looping commands have a /S option. IMO that's not "more elaborate" than using GLOBAL. (And /S is going to be a lot faster.)
 
#11
PS ver 2 ... it didn't like "-Directory".
Yes, -Directory is a PS 3 feature. Try this instead:
Code:
Measure-Command { Get-ChildItem -Recurse | ?{ $_.PSIsContainer } | ForEach-Object { If (Join-Path -ChildPath *.obj -Path $_.FullName | Test-Path) { echo $_.FullName } } }
Though in my experiments using a filter instead of -Directory makes it 50% slower.
 
#12
Early on I replaced your question with counting the directories on my system drive which contain EXE files (below, remove "| wc -l" if you want the directories themselves). This led to all sorts of access problems. All of the runs below were elevated. SORT is the MS one; GREP, SED, UNIQ, and WC are common (?) UNIX utilities. The quickest (by far, and the only result in which I have any confidence) was this:
Code:
c:\> timer & (dir /s /b /a *.exe | grep -i \.exe$ | sed -e "s/\(.*\\\)[^\\]*\.[eE][xX][eE]$/\1/g" | sort | uniq | wc -l) & timer
Timer 1 on: 16:21:27
  862
Timer 1 off: 16:21:35  Elapsed: 0:00:07.94
It's even faster if I let CMD do the DIR command:
Code:
c:\> timer & (cmd /c dir /s /b /a *.exe | grep -i \.exe$ | sed -e "s/\(.*\\\)[^\\]*\.[eE][xX][eE]$/\1/g" | sort | uniq | wc -l) & timer
Timer 1 on: 16:20:52
  862
Timer 1 off: 16:20:58  Elapsed: 0:00:05.31
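As a side note, the sed stage in the pipelines above just strips the file name and keeps the directory prefix; sort | uniq then counts each directory once. A minimal sketch of that stage alone, using hypothetical forward-slash paths so it runs in a plain POSIX shell:

```shell
# Strip the file name, keep the directory prefix, and list each
# directory once -- the same idea as the sed stage above.
printf '%s\n' 'C:/tools/a.exe' 'C:/tools/b.exe' 'C:/bin/c.exe' \
  | sed -e 's|\(.*/\)[^/]*\.[eE][xX][eE]$|\1|' \
  | sort | uniq
```

This prints C:/bin/ and C:/tools/ (two unique directories from three files).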
Patulus, your recently suggested PowerShell command worked, took 46 seconds, and found 803 directories (with 2 access errors).

In my experiment (on the system drive), GLOBAL, DO /S, and FOR /R all have quirks which prevent them from coming up with a result that was even close to correct.
 
#13
Vince:
My interest is in enumerating all files anywhere on a volume (but not directories). While I would prefer to have a single pass, reporting file sizes, inodes, link counts, and timestamps, accuracy is more important than speed. I would preferably use a single invocation of PDIR with all embellishment, but is it reliable?
 
#15
Vince:
My interest is in enumerating all files anywhere on a volume (but not directories). While I would prefer to have a single pass, reporting file sizes, inodes, link counts, and timestamps, accuracy is more important than speed. I would preferably use a single invocation of PDIR with all embellishment, but is it reliable?
DIR seems to be the best on a drive where there may be permissions problems. I don't know if PDIR uses exactly the same recursion logic as DIR. If I were you, I'd experiment, comparing just the counts from PDIR and DIR. Once satisfied that everything was being processed, worry about all the additional info you want. And if you're not plagued by permission problems and the S attribute, compare results to those of GLOBAL, DO /S, and FOR /R for speed and thoroughness.
 
#16
DIR seems to be the best on a drive where there may be permissions problems. I don't know if PDIR uses exactly the same recursion logic as DIR. If I were you, I'd experiment, comparing just the counts from PDIR and DIR. Once satisfied that everything was being processed, worry about all the additional info you want. And if you're not plagued by permission problems and the S attribute, compare results to those of GLOBAL, DO /S, and FOR /R for speed and thoroughness.
Steve, in my tests, PDIR is as thorough and as fast as DIR. Compare the test below to the one using DIR a couple of posts back. You ought to be able to get the other info you want and rely on its thoroughness.
Code:
c:\> timer & (pdir /s /b /a *.exe | grep -i \.exe$ | sed -e "s/\(.*\\\)[^\\]*\.[eE][xX][eE]$/\1/g" | sort | g:\gnu\uniq | g:\gnu\wc -l) & timer
Timer 1 on: 21:55:22
  862
Timer 1 off: 21:55:30  Elapsed: 0:00:07.91
 
#19
Early on I replaced your question with counting the directories on my system drive which contain EXE files (below, remove "| wc -l"" if you want the directories themselves).
Thank you! It's great to know that a plain DIR is so fast. Yeah, it seems Linux command-line utilities are ubiquitous. But my goal is to run an external command on directories that contain *.obj files. In Linux I would use xargs. What would you do in TCMD?
 
#20
Thank you! It's great to know that a plain DIR is so fast. Yeah, it seems Linux command-line utilities are ubiquitous. But my goal is to run an external command on directories that contain *.obj files. In Linux I would use xargs. What would you do in TCMD?
If you can limit the search to a drive or subdirectory tree where permissions and the "S" attribute won't be a problem, I'd say to use something like

Code:
global if exist *.obj external_command ...
It may not be the fastest, but it won't be bad (and it's very straightforward). You can use %_cwd to build arguments to external_command.

I suppose you could build a list of directories (by one of the methods discussed), put it in a file ( > dirlist.txt ) and then process the file line by line.
Code:
do line in @dirlist.txt ( external_command ... )
I suppose you could use DO to process the list-building output line by line ...
Code:
DO line in /P list_building_command ( external_command ... )
I'm not familiar with xargs. I'm guessing it would be used to construct a single (possibly long) command line for external_command. Once you have the list of directories you can do with it whatever you want, including build a single command line ... up to 32K/64K characters (before/after expansion).
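For what it's worth, that guess is essentially right. A minimal bash sketch of xargs's basic behavior (the items "one" through "four" are hypothetical): it packs stdin tokens onto the command line of the given program, and -n caps how many go into each invocation:

```shell
# xargs appends stdin tokens as arguments to the command (echo here);
# -n 2 allows at most two per call, so echo runs twice.
printf '%s\n' one two three four | xargs -n 2 echo
```

This prints "one two" on one line and "three four" on the next, showing the two separate echo invocations.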
 
#21
If you can limit the search to a drive or subdirectory tree where permissions and the "S" attribute won't be a problem, I'd say to use something like

Code:
global if exist *.obj external_command ...
That's what I use. See message #9 in this thread. I meant what would you do with your approach instead of global.

I'm not familiar with xargs.
See a Linux example in the first message in this thread. It shows how xargs works.
 
#23
... but without the word xargs anywhere!
I'm confused. Here's the example from the first message:
Code:
$ time find -type d -print0 | xargs -0 -n 1 -I{} bash -c 'if [[ -n "$(find '"'"'{}'"'"' -maxdepth 1 -name '"'"'*.obj'"'"' -print -quit)" ]]; then echo {}; fi'
./tools/win32/msvc90/lib

real    1m36.002s
user    0m16.651s
sys     1m25.930s
You still don't see xargs there?
 
#24
That's what I use. See message #9 in this thread. I meant what would you do with your approach instead of global.

See a Linux example in the first message in this thread. It shows how xargs works.
I'm not sure what you're asking. You can build a list of directories to process fairly quickly with DIR. You can then process the list as you see fit.

Apparently, xargs starts a bash process for every directory found. TCC's DO /P can do that for you.
 
#25
I'm not sure what you're asking. You can build a list of directories to process fairly quickly with DIR. You can then process the list as you see fit.
I'm not asking anything anymore; you gave your answer. I just remarked that your first suggestion implied you hadn't read message #9. I wanted to point out that what you were suggesting is what I'm already using, and that it wasn't my question. However, you did answer my question with your subsequent ideas. Is it clear now?

Apparently, xargs starts a bash process for every directory found.
And? It's still very, very fast. So I don't understand why you're mentioning this. Can you elaborate?
 
#26
I'm confused. Here's the example from the first message:
Code:
$ time find -type d -print0 | xargs -0 -n 1 -I{} bash -c 'if [[ -n "$(find '"'"'{}'"'"' -maxdepth 1 -name '"'"'*.obj'"'"' -print -quit)" ]]; then echo {}; fi'
./tools/win32/msvc90/lib

real    1m36.002s
user    0m16.651s
sys     1m25.930s
You still don't see xargs there?
You said that it occurs in post #9. It does not. In fact it does occur in #1... Regardless, it does not explain how xargs works. However, generally there is a major trade-off between TCC-style operation and POSIX-style operation. TCC is optimized for sequences of complex, individual commands; POSIX, by contrast, is optimized for many elementary, concurrent processes chained together in pipes.
 
#27
You said that it occurs in post #9. It does not.
I'm completely lost. What does occur in post #9?

Regardless, it does not explain how xargs works.
It doesn't explain it; it shows an example, which I thought was easy to understand. xargs reads stdin and passes what it reads as arguments to some command. For example
Code:
$ echo *.bat | xargs ls
prints names of .bat files.
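Another common mode, for anyone following along (the directory names dir1 and dir2 are hypothetical): with -I{}, xargs runs the command once per input line, substituting the line for {}. That is the mode used with find in the first message.

```shell
# -I{} makes xargs invoke the command once per line of input,
# replacing {} with that line.
printf '%s\n' dir1 dir2 | xargs -I{} echo "found {}"
```

This prints "found dir1" and then "found dir2", one invocation per line.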
 
#28
Sorry, I did get confused between the two parts of message #21, which referenced both #9 and #1. However, for those of us who left POSIX far behind for the greener pastures of TCC, only the most recent post (#27) explained xargs. In general, one should not expect TCC users to be familiar with other products, esp. from a different OS.
 
#29
In general, one should not expect TCC users to be familiar with other products, esp. from a different OS.
I absolutely agree. But why are you telling me this? I merely mentioned that I would use xargs in Linux and asked another user how he or she would solve a particular problem in TCC. The description of the problem was: "my goal is to run an external command on directories that contain *.obj". Where in that description did you find an expectation that someone "is to be familiar with other products"? Rephrasing your request, I would say: "in general, one should carefully read other people's messages before telling them what to do". Or one should ask whether one has understood correctly. :shrug: