Done faster CDD

Roedy · Aug 9, 2008

I have been using the CDD command since the 4DOS days. The implementation is very simple: a flat file listing all the fully qualified directory names. A simple scan in memory could rapidly find the candidates.

However, on my order-of-magnitude-faster disk, which now holds many orders of magnitude more files, CDD is slow to the point of being annoying.

CDD needs an overhaul. Here is what I suggest.

1. keep the flat file around for legacy compatibility. It is nice to have such a list kept up to date for manual scanning.

2. build a TreeMap (see http://mindprod.com/jgloss/treemap.html) keyed on each directory segment name (a name used anywhere as a leg of a fully qualified directory name), which indexes the fully qualified directory name entries that use it. Use that index to rapidly find all the candidate directories that might match some wildcard. So, for example, you could do
cdd *\image\*
You would treat cdd image as if it were cdd *\image*

3. Just think what sorts of auxiliary data structures might help you do the various searches you do now.

4. to conserve RAM, intern (see http://mindprod.com/jgloss/interned.html) each of the legs so there is only one copy of the string in RAM.
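
To make points 2 and 4 concrete, here is a minimal sketch in Python rather than Java. It is only an illustration, not how CDD actually works. The names LegIndex and split_legs and the sample paths are made up. A sorted key list searched with bisect stands in for the TreeMap ordering, and sys.intern stands in for Java's String.intern(). It assumes paths of the form drive, backslash, legs, with no trailing backslash.

```python
import bisect
import sys
from collections import defaultdict


def split_legs(path):
    # Split a path such as C:\photos\image into legs: ("C:", "photos", "image").
    # sys.intern returns one shared string object per distinct leg.
    return tuple(sys.intern(leg) for leg in path.split("\\") if leg)


class LegIndex:
    """Maps each lowercased leg to the directories that use it (hypothetical)."""

    def __init__(self, paths):
        by_leg = defaultdict(list)
        for path in paths:
            legs = split_legs(path)
            # Index every leg after the drive, once per path.
            for key in {leg.lower() for leg in legs[1:]}:
                by_leg[key].append(legs)
        self._by_leg = dict(by_leg)
        # Sorted keys put all legs sharing a prefix in one contiguous run.
        self._keys = sorted(self._by_leg)

    def exact(self, leg):
        # Candidates for a pattern like *\image\*
        return ["\\".join(legs) for legs in self._by_leg.get(leg.lower(), [])]

    def prefix(self, prefix):
        # Candidates for a pattern like *\image* (any leg starting with prefix)
        prefix = prefix.lower()
        start = bisect.bisect_left(self._keys, prefix)
        found = {}  # dict keys keep first-seen order and drop duplicates
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            for legs in self._by_leg[key]:
                found["\\".join(legs)] = None
        return list(found)
```

With this, LegIndex(paths).exact("image") returns only directories that have a leg named image, while prefix("image") also picks up legs such as images. Neither lookup scans the whole list of directories.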

I would gladly give up some of the flexibility of the current search scheme for extra speed. You might leave the code for the current scheme intact, for people who think otherwise and would like to configure CDD to work as it does now.

Roedy · Aug 9, 2008

Roedy said:
CDD needs an overhaul. Here is what I suggest.

I tend to think in Java terms. Presumably CDD is written in Assembler or C, a gradual evolution from the early DOS days.

What might be a better fit than a TreeMap is a traditional SQL database. There are free ones, tiny ones, embeddable ones. See http://mindprod.com/jgloss/sqlvendors.html

You let SQL do the work of satisfying your query and caching just parts of the entire database. You have a table with lookup by leg to get the fully qualified name record.
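
A rough sketch of such a table, using SQLite through Python's standard sqlite3 module as one example of an embeddable database. The schema, the table names dirs and legs, and the functions build_db and find_by_leg are assumptions for illustration only, not anything CDD provides.

```python
import sqlite3


def build_db(paths):
    """Create an in-memory database with one row per (leg, directory) pair."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE dirs (id INTEGER PRIMARY KEY, path TEXT UNIQUE NOT NULL)")
    db.execute(
        "CREATE TABLE legs (leg TEXT NOT NULL, "
        "dir_id INTEGER NOT NULL REFERENCES dirs(id))"
    )
    # The index on leg is what makes lookup by leg fast.
    db.execute("CREATE INDEX legs_by_name ON legs (leg)")
    for path in paths:
        dir_id = db.execute("INSERT INTO dirs (path) VALUES (?)", (path,)).lastrowid
        # Skip the drive part; store each distinct leg lowercased.
        legs = {leg.lower() for leg in path.split("\\")[1:] if leg}
        db.executemany(
            "INSERT INTO legs (leg, dir_id) VALUES (?, ?)",
            [(leg, dir_id) for leg in legs],
        )
    db.commit()
    return db


def find_by_leg(db, leg):
    """Return the fully qualified names of all directories using this leg."""
    rows = db.execute(
        "SELECT dirs.path FROM legs JOIN dirs ON dirs.id = legs.dir_id "
        "WHERE legs.leg = ? ORDER BY dirs.path",
        (leg.lower(),),
    )
    return [row[0] for row in rows]
```

A real version would keep the database in a file rather than in memory, so it survives between sessions.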

Another advantage of this approach is that an SQL database is naturally multithreaded. You would not have problems with multiple tasks competing for access to the CDD database. I find it highly annoying to be locked out for long periods of time while CDD rescans for missing entries. That process would not interfere with use of the database were it stored SQL style. Rescan could even run continuously in the background, working slowly so as not to put much load on the CPU or disk. You would only need to invoke rescan if you were in a hurry to have the database up to date. All rescan would do is prod the background task to stop sleeping between updates for one cycle.
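
The "prod the background task" idea might look something like this sketch. The class BackgroundRescanner and the rescan_once callback are hypothetical. It uses Python's threading.Event, whose wait(timeout) returns early when the event is set. What one rescan pass actually does is left to the caller.

```python
import threading


class BackgroundRescanner:
    """Runs rescan_once repeatedly, sleeping between passes (hypothetical)."""

    def __init__(self, rescan_once, pause_seconds=60.0):
        self._rescan_once = rescan_once
        self._pause = pause_seconds
        self._hurry = threading.Event()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def hurry(self):
        # Prod the worker: cut the current sleep short for one cycle.
        self._hurry.set()

    def stop(self):
        self._stop.set()
        self._hurry.set()  # wake the worker so it notices the stop request
        self._thread.join()

    def _run(self):
        while not self._stop.is_set():
            self._rescan_once()
            # Sleep between passes; returns early if hurry() was called.
            self._hurry.wait(self._pause)
            self._hurry.clear()
```

An explicit rescan command would then amount to a call to hurry(), with the normal pause resuming after that one pass.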

Roedy · Aug 10, 2008

Roedy said:
You let SQL do the work of satisfying your query and caching just parts of the entire database. You have a table with lookup by leg to get the fully qualified name record.

If you went to all that work, it would be just another small effort to create a database of filenames too. Then you could implement a Linux-like locate command to rapidly find a file given just its name or part of its name.
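
As a sketch of what that could look like, again with Python's sqlite3 module and a made-up schema (the table files and the functions build_file_db and locate are assumptions):

```python
import sqlite3


def build_file_db(full_names):
    """Store each file as (name, dir) so files can be looked up by name."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE files (name TEXT COLLATE NOCASE NOT NULL, dir TEXT NOT NULL)"
    )
    db.execute("CREATE INDEX files_by_name ON files (name)")
    rows = []
    for full in full_names:
        # Everything after the last backslash is the file name.
        directory, _, name = full.rpartition("\\")
        rows.append((name, directory))
    db.executemany("INSERT INTO files (name, dir) VALUES (?, ?)", rows)
    db.commit()
    return db


def locate(db, part):
    """Return full names of files whose name contains part, ignoring ASCII case."""
    rows = db.execute(
        "SELECT dir, name FROM files "
        "WHERE instr(lower(name), lower(?)) > 0 ORDER BY dir, name",
        (part,),
    )
    return [directory + "\\" + name for directory, name in rows]
```

Note that a substring match like this still scans every row. Only a lookup by exact name can use the index directly.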

David Marcus · Sep 6, 2008

Roedy said:
However, on my order-of-magnitude-faster disk, which now holds many orders of magnitude more files, CDD is slow to the point of being annoying.

How large is your jpstree.idx file? If you are using Vista, try cdd /nj /s.
