How to? Filter a list by the number within a filename

Avi Shmidman · May 2, 2012

Here's the scenario: as a digital camera user, I have directories full of files that are named according to the pattern:
IMG_0001, IMG_0002, IMG_0003, etc.
I'd now like to move chunks of these files into specific directories. For instance, I'd like to move the 146 files ranging from IMG_0504 through IMG_0649 into a specific folder. Is there any easy way to specify this range on the command line, within a move or copy command?

Steve Fabian · May 2, 2012

Sorry, no. You could use wildcards for subsets:
copy img_050[4-9];img_05[1-9][0-9];img_06[0-4][0-9]

I believe this would do what you want in this specific instance...
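One quick way to check that Steve's three patterns together select exactly IMG_0504 through IMG_0649, and nothing else, is to enumerate every four-digit name and test it. The sketch below does that in Python rather than TCC. It assumes that Python's `fnmatch` character classes such as `[4-9]` behave like TCC's wildcard ranges for these simple cases, and, like Steve's command, it leaves the extension out.

```python
import fnmatch

# Steve's three character-class patterns, split on ";" (TCC's separator for
# multiple filespecs). Extensions are omitted, as in the original command.
PATTERNS = "img_050[4-9];img_05[1-9][0-9];img_06[0-4][0-9]".split(";")

def matches_any(name: str) -> bool:
    """True if name matches at least one pattern (compared in lower case)."""
    return any(fnmatch.fnmatchcase(name.lower(), p) for p in PATTERNS)

# Try every four-digit camera name and keep the numbers the patterns select.
selected = [n for n in range(10000) if matches_any(f"IMG_{n:04d}")]

print(len(selected), selected[0], selected[-1])  # 146 504 649
```

The count of 146 matches the number of files Avi wanted to move.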

Avi Shmidman · May 2, 2012

Steve Fabian said:
Sorry, no. You could use wildcards for subsets:
copy img_050[4-9];img_05[1-9][0-9];img_06[0-4][0-9]

Thanks, Steve. That does indeed solve the problem, although somewhat convoluted in its syntax.
It seems to me that the issue I've raised is becoming more and more common for people working with digital media, including still cameras, video cameras, voice recorders, smartphones, etc., which all record things serially. I'd like to add a feature request for this type of action, but first I'm trying to figure out what would really need to be done, given that each camera has its own set of prefixes and extensions. Ideally, I think, it would be a two-step process: first, a regex would be supplied to tell TCC how to extract the numeric part of the filename; in this case it would be "::img_(\d+)", where the first capture group (backreference) holds the number. Then the next part of the command would provide a numeric range that applies to that extracted number, e.g. 504-649. So the full command would look something like:
copy %@ExtractNumber["::img_(\d+)"] *.jpg /numeric_range504-649
Do you think this sort of setup would make sense?
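The two-step idea (extract the number with a regex, then compare it with a range) is easy to prototype outside TCC. This Python sketch uses a hypothetical helper, `select_by_number`, in place of the proposed `%@ExtractNumber` / `/numeric_range` syntax, which does not exist. The regex's first capture group supplies the number, and converting it with `int()` makes leading zeroes irrelevant.

```python
import re

def select_by_number(names, pattern, low, high):
    """Return the names whose first capture group, read as an integer,
    lies in low..high inclusive. Names the regex does not match are skipped."""
    rx = re.compile(pattern, re.IGNORECASE)
    picked = []
    for name in names:
        m = rx.match(name)                 # anchored at the start of the name
        if m and low <= int(m.group(1)) <= high:
            picked.append(name)
    return picked

files = ["IMG_0503.JPG", "IMG_0504.JPG", "IMG_0600.JPG", "IMG_0649.JPG",
         "IMG_0650.JPG", "DSC_0550.JPG"]
print(select_by_number(files, r"img_(\d+)", 504, 649))
# ['IMG_0504.JPG', 'IMG_0600.JPG', 'IMG_0649.JPG']
```

In practice the list of names would come from something like `os.listdir()` on the picture folder.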

samintz · May 2, 2012

Why not just use a counted loop?

do i=504 to 649 (copy img_0%i.jpg c:\foo\)

-Scott
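Scott's loop works because it builds each filename directly instead of matching a pattern. One detail worth noting: the literal `0` in `img_0%i` only produces the right name because every number from 504 to 649 has exactly three digits. The Python sketch below (helper names are hypothetical; `shutil.copy2` does the copying) pads to a fixed width instead, so the same code also handles ranges such as 15-125, and numbers missing from the sequence are simply skipped. Filename case matters on case-sensitive file systems, unlike on Windows.

```python
import shutil
from pathlib import Path

def counted_names(first, last, prefix="img_", width=4, ext=".jpg"):
    """Names a counted loop visits, zero-padded to `width` digits."""
    return [f"{prefix}{i:0{width}d}{ext}" for i in range(first, last + 1)]

def copy_range(src_dir, dest_dir, first, last):
    """Copy every generated name that exists in src_dir to dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = 0
    for name in counted_names(first, last):
        src = Path(src_dir) / name
        if src.exists():                 # a gap in the numbering is not an error
            shutil.copy2(src, dest / name)
            copied += 1
    return copied

names = counted_names(504, 649)
print(len(names), names[0], names[-1])   # 146 img_0504.jpg img_0649.jpg
```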

Avi Shmidman · May 2, 2012

samintz said:
Why not just use a counted loop?

do i=504 to 649 (copy img_0%i.jpg c:\foo\)

-Scott

Thanks, Scott. I just tried it and it works beautifully. I haven't yet gotten used to using "do" loops on the command line, but, as you've shown, it really does make for a very elegant solution.

Charles Dye · May 2, 2012

Avi Shmidman said:
So the full command would then look something like:
copy %@ExtractNumber["::img_(\d+)"] *.jpg /numeric_range504-649
Do you think this sort of setup would make sense?

Or perhaps TCC's built-in wildcard support could be extended to match a series of digits whose value falls within a given range. I imagine it might look something like this:

Code:

copy img_[\504-649].jpg c:\foo\

where the [\ signals that we're matching a numeric range.
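The `[\low-high]` token is only a proposal, not existing TCC syntax; the Python sketch below just illustrates how such a matcher could work. Each token becomes a `(\d+)` group in a regular expression, and after a full match the captured digits are converted with `int()`, so leading zeroes do not affect the comparison. To keep it short, it supports only literal text plus range tokens, not `*` or `?`; the function names are invented.

```python
import re

# Hypothetical token: "[\low-high]" matches one run of digits whose numeric
# value lies in low..high inclusive.
RANGE_TOKEN = re.compile(r"\[\\(\d+)-(\d+)\]")

def compile_pattern(pattern):
    """Turn the pattern into a regex plus one (low, high) pair per token."""
    parts, bounds, pos = [], [], 0
    for m in RANGE_TOKEN.finditer(pattern):
        parts.append(re.escape(pattern[pos:m.start()]))  # literal text
        parts.append(r"(\d+)")                            # the digit run
        bounds.append((int(m.group(1)), int(m.group(2))))
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("".join(parts), re.IGNORECASE), bounds

def numeric_match(pattern, name):
    """True if name matches the pattern and every digit run is in range."""
    rx, bounds = compile_pattern(pattern)
    m = rx.fullmatch(name)
    return m is not None and all(
        lo <= int(digits) <= hi for digits, (lo, hi) in zip(m.groups(), bounds))

print(numeric_match(r"img_[\504-649].jpg", "IMG_0504.JPG"))   # True
print(numeric_match(r"img_[\504-649].jpg", "IMG_0650.JPG"))   # False
```

With this design, literal zeroes placed before the token, as in `img_0[\504-649].jpg`, become mandatory, because they must then match as ordinary text.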

Avi Shmidman · May 2, 2012

Charles Dye said:
Code:

copy img_[\504-649].jpg c:\foo\

where the [\ signals that we're matching a numeric range.

Yes, that would be a slick way of implementing the feature. Of course, there would need to be a way to specify whether to pad with zeroes or not, and how many (for cases such as 15-125). So perhaps:
[\n504-649] would do a numeric range
[\4n504-649] would do the range and pad with zeroes to create a four-digit number.

Charles Dye · May 2, 2012

Avi Shmidman said:
Yes, that would be a slick way of implementing the feature. Of course, there would need to be a way to specify whether to pad with zeroes or not, and how many (for cases such as 15-125). So perhaps:
[\n504-649] would do a numeric range
[\4n504-649] would do the range and pad with zeroes to create a four-digit number.

Ah, but if we're matching a numeric range, leading zeroes don't change the value; I think they can be safely ignored. (If you wanted to specify that the leading zeroes must exist, you'd put them before the wildcard.)

Steve Fabian · May 2, 2012

Actually, leading zeroes in both the "FOR /L" and the "DO n =" forms are useful; for instance, I use directories named V0000 ... V9999 for version storage. A minor syntax enhancement of the commands could be used: every value would be zero-padded to the width of the start value. It would probably require a new suboption, both to provide backward compatibility and to account for loops with decrementing control variables where the start value is at least an order of magnitude larger than the end value (i.e., requires more digits without leading zeros).
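Steve's suggested behaviour can be modelled as a generator that takes the start value as written (so "0998" carries its width of 4) and formats each integer back to that width. This is a Python sketch of the proposed option, not an existing FOR /L or DO feature; `padded_range` is an invented name.

```python
def padded_range(start: str, end: str, step: int = 1):
    """Yield loop values zero-padded to the width of `start` as written.

    The counter stays an ordinary integer; padding is applied only when each
    value is produced.
    """
    if step == 0:
        raise ValueError("step must not be zero")
    width = len(start)
    i, stop = int(start), int(end)
    while (i <= stop) if step > 0 else (i >= stop):
        yield f"{i:0{width}d}"
        i += step

print(list(padded_range("0998", "1002")))
# ['0998', '0999', '1000', '1001', '1002']
print(list(padded_range("10", "8", -1)))
# ['10', '09', '08']
```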

A different issue about images. I personally rename all pictures to be the date and time taken as part of copying them from external media. This allows me to find pictures easily. Since this requires individual handling anyway, the triggering issue of this thread is not applicable. BTW, my procedure automatically avoids copying pictures previously copied; when EXIF data is available, it is used, if not, the modification time listed for each file is used.

mathewsdw · May 3, 2012

I'm too lazy at the moment to investigate it in any detail in terms of your specific need(s), but the expression "%@Format[05,n]" returns the value of "n" (which actually doesn't even need to be numeric) in a field at least 5 characters wide, padded on the left with zeroes if "n" is shorter than 5 characters; an "n" of five or more characters is returned unchanged. The expression "%@Format[05.5,n]" would additionally truncate "n" on the right at 5 characters.
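Going by Dan's description, the padding rule can be mirrored in Python with plain string operations, which also shows why `n` need not be numeric. This is a rough analogue written for illustration (the name `format_like` is invented), not TCC's implementation; in particular, it makes no claim about how `@Format` treats a leading minus sign.

```python
def format_like(n, width, max_width=None):
    """Rough analogue of "%@Format[0<width>[.<max_width>],n]" as described.
    n is treated as a string, so it does not have to be numeric."""
    s = str(n)
    if max_width is not None:
        s = s[:max_width]        # truncate on the right, keeping the left part
    return s.rjust(width, "0")   # pad on the left with zeroes

print(format_like(42, 5))            # 00042
print(format_like("123456", 5))      # 123456
print(format_like("123456", 5, 5))   # 12345
```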

Avi Shmidman · May 3, 2012

Steve Fabian said:
A different issue about images. I personally rename all pictures to be the date and time taken as part of copying them from external media. This allows me to find pictures easily. Since this requires individual handling anyway, the triggering issue of this thread is not applicable. BTW, my procedure automatically avoids copying pictures previously copied; when EXIF data is available, it is used, if not, the modification time listed for each file is used.

This sounds intriguing. Do you have any relevant scripts that you would be willing to share regarding your process?

Steve Fabian · May 3, 2012

Dan:
@FORMAT and

Steve Fabian · May 3, 2012

Sorry, my keyboard seemed to have its own mind. It posted an incomplete message.

Dan: Yes, @format and its variants @formatn and @formatnc are some of the ways to do it; for 4-digit numbers you could start your loop at 10000 and extract the four digits on the right. My suggestion was an option in counted FOR and DO loops for it to be done automatically, so the user doesn't need to do all these manipulations.

Avi:
I'll post it later - first I need to remove dependencies on my aliases and UDFs.

mfarah · May 3, 2012

Avi Shmidman said:
Here's the scenario: as a digital camera user, I have directories full of files that are named according to the pattern:
IMG_0001, IMG_0002, IMG_0003, etc.
I'd now like to move chunks of these files into specific directories. For instance, I'd like to move the 146 files ranging from IMG_0504 through IMG_0649 into a specific folder. Is there any easy way to specify this range on the command line, within a move or copy command?

I usually have to do the same. In my case, though, I'm better served by specifying a date range ("all the pictures from Monday to this directory, all the ones from Tuesday morning to this one, ...") or by using the SELECT command and tagging them by hand.
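Selecting by date rather than by number amounts to comparing each file's timestamp with a range of days. A minimal Python sketch, assuming the file modification time is a good enough stand-in for when the picture was taken (EXIF data, where present, is more reliable); the helper names are made up for illustration.

```python
from datetime import date, datetime
from pathlib import Path

def select_by_date(entries, first: date, last: date):
    """Names whose timestamp falls on a day from first to last, inclusive."""
    return [name for name, stamp in entries if first <= stamp.date() <= last]

def folder_entries(folder, pattern="*.jpg"):
    """(name, modification time) pairs for matching files in folder.
    Path.glob is case-sensitive on most non-Windows systems."""
    return [(p.name, datetime.fromtimestamp(p.stat().st_mtime))
            for p in sorted(Path(folder).glob(pattern))]

# Example with fixed timestamps instead of a real folder:
entries = [("IMG_0001.JPG", datetime(2012, 4, 30, 9, 0)),
           ("IMG_0002.JPG", datetime(2012, 5, 1, 8, 30)),
           ("IMG_0003.JPG", datetime(2012, 5, 2, 23, 59))]
print(select_by_date(entries, date(2012, 5, 1), date(2012, 5, 2)))
# ['IMG_0002.JPG', 'IMG_0003.JPG']
```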

mathewsdw · May 3, 2012

Steve, I'm mildly confused by your answer, in that the counted "For" loop doesn't handle leading zeros by itself unless you start with the leading "1" (as in "10000", as you suggest) and then use the "@Right[4,..." function. I don't find your way to involve any less "manipulation" than mine (where, exactly, are "all these manipulations"?), and I find my way to be somewhat "cleaner" for whatever reason(s), maybe because the value of the loop variable really is always the same as the numeric value of the "result".

- Dan

Steve Fabian · May 3, 2012

Some days I take hundreds of pictures, and then none for weeks. Yes, it is easy to have individual directories for specific days, weeks, months, etc. My routine (yet to be published) creates directories named yyyymm\ and moves items there; individual items are named ddhhmmss.xxx (extensions are preserved), using ISO date and time (24h). Since I have no device capable of generating more than one item per second, this naming convention makes chronological sorting easy. I currently do not bother to label individual pictures when taken (though my camera could); the DESCRIBE command would do it perfectly.
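Steve's naming scheme (a yyyymm folder holding files named ddhhmmss plus the original extension) can be expressed as a small path-building function. The Python sketch below takes the capture time as an input, since reading it from EXIF data or falling back to the modification time is outside its scope; `archive_path` and the root folder are hypothetical.

```python
from datetime import datetime
from pathlib import PureWindowsPath

def archive_path(taken: datetime, original_name: str, root=r"D:\pictures"):
    """Folder yyyymm, file ddhhmmss, original extension kept."""
    ext = PureWindowsPath(original_name).suffix          # e.g. ".JPG"
    return PureWindowsPath(root, taken.strftime("%Y%m"),
                           taken.strftime("%d%H%M%S") + ext)

print(archive_path(datetime(2012, 5, 3, 14, 7, 9), "IMG_0504.JPG"))
# D:\pictures\201205\03140709.JPG
```

Because the fields run from most to least significant, sorting the names as plain text also sorts them chronologically; the scheme relies on no two items sharing the same second.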

Steve Fabian · May 3, 2012

Dan: I was suggesting that the counted forms of FOR and DO be ENHANCED to provide the optional leading zeros, not that they currently do. As to which is simpler - the use of @format, the "10000" method, or yet another: if N is the number without leading zeros, do this: set n=%@right[4,0000%n] - there is probably negligible difference in execution time, but one would need to measure it - if performance were significant.
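The three padding approaches in this thread can be compared directly. This Python sketch mimics them with string operations (the function names are invented). They agree for every value from 0 to 9999 but differ once a number needs more than four digits: the `@Format`-style version never truncates, while both `@Right`-style versions silently drop the leading digits.

```python
def pad_format(n):   # like %@format[04,%n]: pad on the left, never truncate
    return str(n).rjust(4, "0")

def pad_offset(n):   # loop from 10000 upward and keep the right four digits
    return str(10000 + n)[-4:]

def pad_prefix(n):   # like %@right[4,0000%n]: prepend zeroes, keep four digits
    return ("0000" + str(n))[-4:]

same_below_10000 = all(pad_format(n) == pad_offset(n) == pad_prefix(n)
                       for n in range(10000))
print(same_below_10000)                                         # True
print(pad_format(12345), pad_offset(12345), pad_prefix(12345))  # 12345 2345 2345
```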

Frank · May 3, 2012

I have a BTM for copying pictures to my hard disk, too.
It is some years old and has been modified erratically. I'd like to share it with you. If someone needs a translation, let me know; today I have no time for it.
Be careful! Use it at your own risk! It is just a code sample that works on MY PC (and I use it often without problems).

Code:

@echo  off
setlocal
on  error pause
on  break cancel
rem Target root folder for the pictures
set ziel=D:\!digipix
iff not isdir %ziel then
    echo    Target "%ziel" does not exist
    cancel
else
    if not isdir "%ziel\%_year" md /s "%ziel\%_year\%_year%1231"
endiff
set stop=n
set wild=*.jpg;*.avi;*.nef;*.wav
rem do  dir in /h /a:d c:\!usb-devices\*.*
rem Scan every ready, removable drive from F: to Z:
do  drive in /L f g h i j k l m n o p q r s t u v w x y z
    iff %@ready[%drive:] eq 1 .and. %@removable[%drive:] eq 1 then
        do  dir in /h /a:d %drive:\*.*
            set dcimdir=%@quote[%dir]
            rem eset    dcimdir
            iff isdir  %dcimdir    then
                gosub  move %dcimdir
            else
                echo    No files found under %dcimdir.
            endiff
        enddo
    endiff
enddo
MSGBOX /T10 OK "%_batchname" "All files processed"
endlocal
quit
:move [quelle]
rem Process every matching file below the source folder [quelle]
if  not isdir %quelle (
    echo %quelle not found
    MSGBOX /T2 OK "Error" %quelle not found!
    return
    )

echo    %@files[/s %quelle\%wild] files in %quelle:
rem do  pic in /d"%quelle\" /s %wild
for /r %quelle %x in (%wild)  gosub move2 %@quote[%x]
return
:move2 [pic]
    rem New name: ziel\yyyy\yyyymmdd\yyyymmdd-hhmmss-(original name).ext
    set my_n=%@name[%pic]
    set my_d=%@filedate[%pic,,4]
    set my_t=%@filetime[%pic,,s]
    set my_s=%@filesize[%pic,b]
    set my_x=%@ext[%pic]
    set my_j=%@year[%my_d]
    set my_d=%@replace[-,,%my_d]
    set my_t=%@replace[:,,%my_t]
    set my_z=%ziel\%my_j\%my_d
    if  not isdir %my_z mkdir /s %my_z
    set my_name=%my_z\%my_d-%my_t-(%my_n)
    iff exist  %my_name.%my_x then
        set stop=y
        echos  %my_name.%my_x already exists,
        iff    %my_s eq %@filesize[%my_name.%my_x,b] then
            rem Same size: treat it as a duplicate and delete the source
            echo the size is the same
            del /pq %pic
        else
            rem Different size: keep both by appending -N to the new name
            echo but the size is different
            set my_nfiles=%@files[%my_name*.*]
            set my_nfiles=%@inc[%my_nfiles]
            if  exist %my_name%-%my_nfiles%.%my_x (
                echo warning: error!
                echo %my_name%-%my_nfiles%.%@ext[%pic] should not exist
                pause
                quit
                )
            move    /q %pic %my_name%-%my_nfiles%.%my_x
            rem attrib  /q +r %my_name%-%my_nfiles%.%my_x
        endiff
    else
        echo    %my_name.%my_x is being processed
        move    /q %pic %my_name.%my_x
        rem attrib  /q +r %my_name.%my_x
    endiff
return
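For readers who don't use TCC, the renaming rules in Frank's `:move2` subroutine can be summarized in a short Python sketch. It only computes names and decisions, without touching any files; the function names are invented, and the timestamp is passed in rather than read from the file. As in the batch file, a same-size file already at the target is treated as a duplicate, which is a heuristic rather than a content comparison.

```python
from datetime import datetime
from pathlib import PureWindowsPath

def target_name(root, stamp: datetime, src_name: str) -> PureWindowsPath:
    """root/yyyy/yyyymmdd/yyyymmdd-hhmmss-(original name).ext"""
    src = PureWindowsPath(src_name)
    day = stamp.strftime("%Y%m%d")
    base = f"{day}-{stamp.strftime('%H%M%S')}-({src.stem})"
    return PureWindowsPath(root, stamp.strftime("%Y"), day, base + src.suffix)

def resolve_collision(target: PureWindowsPath, src_size: int, existing: dict):
    """Decide what to do when moving a file of src_size bytes to target.

    `existing` maps names already present in the target folder to their sizes.
    Returns ("move", path) or ("delete-source", None).
    """
    if target.name not in existing:
        return "move", target
    if existing[target.name] == src_size:
        return "delete-source", None        # same size: assume already copied
    # Different file, same name: append -N, where N is one more than the
    # number of files whose names start with the same base name.
    n = 1 + sum(1 for name in existing if name.startswith(target.stem))
    return "move", target.with_name(f"{target.stem}-{n}{target.suffix}")

example = target_name(r"D:\!digipix", datetime(2012, 5, 3, 14, 7, 9), "IMG_0504.JPG")
print(example)   # D:\!digipix\2012\20120503\20120503-140709-(IMG_0504).JPG
```

The batch file also stops with an error if the numbered name is already taken; the sketch leaves that check out.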


mathewsdw · May 3, 2012

Steve, I don't really disagree with the "leading zeroes" being a possible "enhancement" re the "For" loop; although I would not call it exactly a real high priority item because it is easily worked around. (And I will note that there is a small "philosophical" issue here - I think that I would consider a number that has leading zeroes to be a character string that has a numeric value rather than just a number.) As far as the "10000" goes, I have a rather deep (possibly unjustified, I'll admit) bias against the value of a variable not being what the value of the variable "really" is (although I have used exactly that technique in the past for other reasons/languages), and, again, as I indicated, I don't see any real advantage of "@Right" over "@Format", which gets the job done in a completely straightforward, don't even have to think about it (at least if you're familiar with the "@Format" function, which I am because I use it a lot) fashion and "not having to think about it" is a great positive, at least in my book! :))

I will also note that 99% of the time I probably just use the result of "@Format" directly rather than setting a variable to its value, although nothing I've said previously changes in any way if you do set a variable to its value. (I will note, and this is not my opinion, that "unneeded" variables are considered a "negative" for program maintenance, and if the value of a variable is only used once, it's almost by definition an "unneeded" variable. The (single!) possible exception is when the otherwise unneeded variable is there to significantly simplify a complex expression, a very rare event, at least for me.)

Steve Fabian · May 3, 2012

Dan:
On your philosophical issue: yes, padding always applies to strings, and in this thread, to strings with a numeric value. As for the benefits, you apparently do NOT use files or directories that are numbered, or you would see the benefit of automatically creating zero-padded fixed-width numbers. Those of us who do, especially with files or directories that are imported and are thus already named, need to contend with such issues. The /O: option of FOR and DO, available since V11, could partially alleviate the need, but not always. And BTW, for several widths I commonly use, I have UDFs like @f5 and @fz5, which add leading spaces or zeros, respectively, to make the item 5 characters wide. I even have some for monetary values that make sure that $1,234.10 is not displayed as $1,234.1, that they are the right width for the table in which they appear, etc. If you need to cycle through IMG_0012 through IMG_0215, an "extra variable" is necessary, because the overall list is neither numeric nor alphabetic - the names are not consecutive entries in any character encoding scheme I ever came across. Please do not denigrate other users' needs because in your closed environment those needs never occur.
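Steve's fixed-width helpers are his own UDFs and their code isn't given; the Python sketch below only imitates the behaviour he describes (leading spaces or zeroes to a width of 5, and money always shown with two decimals in a fixed-width column). The names `f5`, `fz5` and `money`, and the default column width of 12, are assumptions.

```python
def f5(item):
    """Right-align item in 5 characters using leading spaces."""
    return str(item).rjust(5)

def fz5(item):
    """Right-align item in 5 characters using leading zeroes."""
    return str(item).rjust(5, "0")

def money(amount, width=12):
    """Dollar amount with thousands separators and exactly two decimals."""
    return f"${amount:,.2f}".rjust(width)

print(repr(f5(42)), repr(fz5(42)))   # '   42' '00042'
print(repr(money(1234.1)))           # '   $1,234.10'
```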

mathewsdw · May 3, 2012

Steve, I have only one thing to say to that: I was not (at least not intentionally) "denigrating" anything that you said. I was just pointing out that creating fixed-width numbers with leading "zeroes" (and absolutely nothing more than that) is very easy to achieve with the "@Format" function. That is almost the definition of the very purpose of "@Format", and it is only slightly less "automatic", so much so that I don't really consider it a problem that needs to be "fixed". I also prefer it to using a starting value with more digits than will be extracted and then applying the "@Right" function, because it is more direct; that is the complete basis of literally everything I've said on the subject, and nothing you say above changes my opinion on that in any way. (In fact, it actually reinforces it; again, it's almost the very purpose of the "@Format" function.) I don't take the suggestion that there might be an "easier" way as a "criticism"; sometimes I totally agree with suggestions and start doing things that way, and sometimes not.

- Dan

mathewsdw · May 4, 2012

After thinking about it a bit, I decided to clear up any possible “mysteries” as to “where I’m coming from” and “what my biases are”. I'm rather sure I've stated some of this on this bulletin board before (bad memory, as always), although I'm also quite sure I haven’t posted a large part of it, and I will make a sincere attempt not to repeat it in the future. This is kind of long, but storage is cheap and you don’t have to read any more of it than you are interested in; you might want to skip down to the last section for a summary, because the previous sections are nothing but a detailed explanation of exactly why the things in the summary are there.

So first off, my “history”. When I was in college I discovered that I was very good at anything to do with computers; not just programming, literally anything. (When I discovered this I was studying Electrical Engineering, specializing in “digital circuit design”.) (Take note of what immediately follows the next sentences before you conclude that I am just bragging.) In my entire career I never met anybody who was as good as I was, much less better. There were only two times in my entire career when anybody found a bug in my code after I had released it into production; and in one case the program specifications had only been given to me verbally (actually quite common, believe it or not) and I had misunderstood them, so the program worked perfectly in terms of doing what I thought it was supposed to do; unfortunately, that wasn't what my manager wanted it to do. I'm not really sure that should be considered a bug. I was also, by my employers' estimate, about an order of magnitude faster than the typical programmer. Now the qualifications: First, I tend to believe that many of the people who regularly post on this bulletin board, as well as Rex himself, are as good as or better than I was.
Secondly, and possibly most importantly, that was the very first thing I had ever discovered in my entire life that I was actually good at, as opposed to “just competent”, "barely competent", or even worse; and I never discovered anything else in my life, either before, while I was an active programmer-type person, or after I was forced to retire, that I was actually "good" at (other than teaching computer-related courses; but I consider that just an "aspect" of my skill in dealing with computers as a whole. One of the things that probably made me very good at teaching was my Electrical Engineering background in digital hardware design, and therefore my very "deep" understanding of computers; I was able to impart that understanding to students even though almost none of them, of course, had that background). That could be because, while my memory is so bad now as to be almost crippling, it was never what you would really call “good”. And for these reasons this was the very first thing in my life that I was actually proud of.

So I was in college when I “discovered” this, and my complete programming language history from then to now is as follows: Fortran on an IBM 1130 (a so-called minicomputer; it was a 16-bit machine with 48K of memory that was probably less powerful than the later Commodore 64), as well as IBM 1130 Assembler; Fortran on the CDC (Control Data Corporation) 6600 (a so-called “supercomputer” that consisted of several (I no longer remember the exact number) parallel floating-point-only processors with 60-bit words, along with 12 16-bit (as I remember) integer-only “peripheral” processors to handle I/O that were actually the same piece of physical hardware, time-sliced 12 ways). I also did a very limited amount of assembler on it (“assembler” was actively discouraged on that machine and you had to “jump through hoops”, so to speak, to actually use it, although it really wasn’t all that sensible on a floating-point-only machine in the first place, and the “peripheral processors”, not too surprisingly, were totally unavailable to “average” (as opposed to “systems”) programmers using the machine). I will note here that there were only 4 languages in general use at that time: Fortran, assembler (for the particular machine you were dealing with, of course), COBOL (which was absolutely not used at an “engineering” school), and Basic, which was starting to come into vogue. Then in my junior year, as I remember, the school acquired a PDP (Programmed Data Processor) 11/45, a 16-bit machine that had the hardware capability to “bank switch”, allowing more than 64K of memory (although only 64K was available at any given point in time to a single program).
This machine was programmed using a version of Basic; and effectively, from the user's perspective (and probably even from the perspective of a “systems” programmer without doing some out-of-the-ordinary stuff), the only language that the computer ran was Basic; there wasn’t even the capability to run any other language. Even the (rather primitive) operating system was an "extension" of the Basic language. (I wrote the device-driver software for some graphics terminals for the machine, in Basic; again, this was in the early to mid-70’s. And I found out somewhat accidentally, after I had finished the software, that the college had gotten the terminals for free in exchange for writing the device-driver software for said terminals, which I, of course, had written. I must admit I had wondered why one particular member of the faculty had taken such a “deep” interest in what I was doing, “suggesting” requirements and testing procedures, when I was doing this just for the “fun” and “challenge” of it.) The college also had a PDP 11/10 that had no software of any kind pre-installed; programs were loaded from paper tape, and the software needed to do that had to be entered manually by flipping switches on the front panel of the machine. The next machine I had experience with (and this was on my first “real” job as a programmer) was an IBM 370 mainframe. I exclusively wrote Assembler-language programs for this machine at the direction of my employer. At a later point (and another company) I learned PL/1; and I taught PL/1 (as well, I might add, as IBM 370 Assembler, and even COBOL) full time for a significant period of time at a Fortune 500 company. I then learned C somewhere along the way (I really no longer remember where or why), and then “transitioned” fully to C++.
Now I suppose I will brag a bit here. I was working part time as an instructor at a “business” school teaching C++, and somewhere along the line my employer learned that Microsoft was offering a C++ certification exam. He wanted me to take said exam (he had been “advertising” me as the “instructor who was only rated 9’s and 10’s by the students”, and he wanted to add Microsoft C++ certification to that). He was willing to pay for all of it (plus the transportation and hotel and so on) and I had no objections, so I signed up for the test and took it in Atlanta, Georgia. Well, it was a two-hour test that I finished in about 45 minutes and left, and the next day I was called in Chicago by the Microsoft employee who had administered the test, from wherever he was in Washington State. He told me that he, and several other people who had been taking the exam at the same time (he knew because they had made comments about it as they left), had assumed that I had “given up” on the test because I had left after only 45 minutes. The reality was that I tied for the 2nd-highest score ever to that point in time; and he added that “if you had spent another ten or fifteen minutes going over your answers before you left” I would have had “the first perfect score ever to that point in time”. I will add here, with some humor, that I don’t really know how correct he was in that presumption, because there has always, for me, for whatever reason(s), been a somewhat inverse relationship between how difficult a question was and how likely I was to get it right.

I will also note here that it is my belief that the reason that Microsoft had introduced this exam was because they felt that C++ was a very difficult language (which it was) and that there were too many "charlatans" out there who claimed to know the language well and really didn't and claimed to represent Microsoft in some way which Microsoft absolutely did not want; so they developed and administered this test to "weed these charlatans out". However, it is also my belief that Microsoft eventually came to feel that C++ was just too difficult, the result of which being that they developed and marketed a much-simplified "implementation" of C++ that they called "C#" ("C-Sharp").

Next, the “consequences” of that “history”: Until TCC (and this is now true for the batch language for cmd.exe since the introduction of the “/A” (“arithmetic”) parameter on the “Set” statement) there was a clear-cut, hardware-defined distinction between “numeric” and “character” (string) values, and this difference was pretty much absolute. “Numeric” values came in one of quite a few possible “formats”: pure-integer values (signed or unsigned) that were either 8, 16, 32, or 64 bits long; in the case of the IBM mainframe, BCD (“binary coded decimal”) numbers ranging from one to fifteen digits long; and floating-point formats (generally binary for most machines, but actually base-16 (hexadecimal) for the IBM mainframes, believe it or not) ranging in size from 4 to 16 bytes. The binary integers could be considered to have a fixed binary (as opposed to decimal) point, but this existed purely in the imagination of the programmer. For the IBM mainframe, BCD (again, binary-coded decimal) could be considered to have a fixed number of decimal (of course) places, and the assembler program provided very limited support for this (the P-prime (P’) attribute, as I remember it was called), but very few assembler programmers were even aware of the existence of this data attribute, much less used it. And floating-point numbers could be coded as if they had a fixed number of binary, hex, or decimal places at the discretion of the assembly-language programmer; but this existed purely in the “imagination” of that programmer; there was no support for it whatsoever in either the hardware or the software (the assembler). I will note here that one of the jobs I had fairly early in my career was to entirely rewrite, from scratch, the “calculation engine” for a mainframe spreadsheet (à la Excel) program in floating point.
The program had originally been written to use BCD arithmetic to that point, but the software vendor (who I worked for, of course) decided to recode it using (IBM’s) floating-point format because they felt that BCD numbers took up too much storage (strictly integers or +/- 0 (yes, negative zero was a "real" value but it was mostly exactly equivalent to plus zero) contained in 8 bytes for 15 digits or 999,999,999,999,999; but there was absolutely no support for digits after the decimal point and "scaling" by multiples of 10 was left entirely up to the programmer) and was too slow. So they hired me to specifically entirely re-code the “calculation engine”, as well as all of the numeric input conversion and output formatting routines, in floating point. And they initially wanted me to carry all floating point numbers internally as their "real" values multiplied by 100, matching what they had largely done with the binary-coded decimal, so that round-off errors (such as getting .9999999 when the answer really should have been 1.00) for financial values did not occur. (The president of this company, it was a very small company that had only about a half-dozen employees including the president and vice-president of the company, hired me because he had previously been a contractor at a company where I worked and he was familiar with my work.) However, at some point (I really don’t remember the details) they gave up on that idea and I used “regular”, unscaled, floating point. 
And there are some aspects of that that I’m quite proud of today. Number one: the internal “accuracy” of floating-point numbers was somewhere between 14½ and 17 decimal digits (since the floating-point format was base 16, it wasn’t a constant value for base 10), while the code I wrote only operated to a smaller number of digits (I believe the number was 12, but it was 25 years ago); this allowed me to code "sophisticated" (if you’ll pardon me) rounding to convert the floating point to decimal and format it for output, and results like “.99999999” almost never occurred. And another thing that I am proud of is this: for reasons I no longer remember, they didn’t want me to use the code from the run-time libraries of either Fortran or PL/I; they wanted me to write that code from scratch, which I did. We are talking here about log, log10, e to the x, 10 to the x (in fact, anything to anything), as well as the trigonometric routines (sin, cos, tan, asin, acos, atan, the hyperbolic trigonometric functions) and all the other numeric routines that the spreadsheet was capable of doing (it was too long ago for me to really remember the complete list). And, being somewhat paranoid, I suppose, they were very concerned about the speed and accuracy of my code vs. that of the high-level language run-time libraries and did extensive testing of my code; my code was always at least as accurate, if not more accurate (e.g. in all the tests you can think of, like "sin(x)**2+cos(x)**2 = 1" and "e**(loge(x)) = x" and the like), and it was always at least as fast as, if not faster than, the run-time library code. (I really have no theories as to why my code at least seemed to be better than the run-time library code was.)
And I got all of the algorithms for my routines (this was pretty much before the very existence of the Internet) from what was called the “Mathematical Handbook”, a very thick red book that contained the formulas for all of these things, as well as page after page of tables for logarithms (both natural and base 10) and page after page of tables listing the values of the trigonometric functions for probably thousands of values (I haven’t seen said book for probably 20 years and I no longer remember the precision to which these calculations were done). (I'll note that the reason this existed then and no longer exists now was that this was at the "dawn" of the introduction of "scientific" calculators, and the few that existed were very expensive.) And finally I'll add that I developed, on my own, a square root routine that did no multiplication or division by anything other than the number 4 (which was very fast), plus straight comparisons and straight additions and/or subtractions, all very fast instructions for the floating-point hardware. And my code was as accurate as it was theoretically capable of being. (It's another, rather long, story as to where I got the idea for the algorithm; I'll just say that I didn't "invent" the "concepts" behind the algorithm, just its implementation in IBM's base-16 floating-point hardware/instruction set.)

So the bottom line is this: TCC (and now batch files for cmd.exe via the “Set” command with the “/A” parameter that is also now shared by TCC) don’t really have “numeric” data types; strictly speaking, all data is "character" data. However, based on my previous history I still prefer to maintain the “illusion” of numeric values, and this “illusion” has no downside whatsoever that I can think of “off the top of my head”. But in terms of this “illusion”, 12.96 is a number, whereas “00006931” is a character string that happens to have the numeric value of 6931.

And the “@Format” functions are there to convert (usually numeric) values to character strings, whereas the statement “Set Variable+=0” will effectively convert a character string that contains a numeric value to a number. (That “Set” statement produces an error if the value of “Variable” cannot be interpreted as a valid number. Neither the “@Format” function nor the “@FormatN” function “cares” whether the “number” (the 2nd argument of both functions) is actually numeric. For “@Format” it is a complete irrelevancy: it only matters at all if a leading “0” is specified in the “format” argument, and if it is, the leading or trailing zeroes are added to the result where applicable no matter what the “string” argument is, numeric or otherwise. For “@FormatN”, the number is “defined” by whatever precedes the first non-numeric character of the 2nd argument (the “value”); if the first character of the 2nd argument is a “Q”, for instance, the result is simply zero, formatted however the first argument specifies.)

So it is my "bias" that there is a clear distinction between character strings that contain a valid numeric value and actual "numbers".

- Dan

Avi Shmidman · May 4, 2012

Thank you, Dan, for this illuminating window into your background and your position on this issue.
Nevertheless, I'm going to voice my support for Steve Fabian's suggestion regarding the possibility of using zero-padded numbers as constructs in a "do" loop.
Your distinction between strings and numbers in programming languages is clear. However, the point of TCC, at least the way I see it, is not purely to be a formal programming language. Truly, everything that one does at the command line in TCC can also be accomplished by writing a program in C++ or C#, using functions from the Windows API and from the CRT or CLR. However, the fact is that in many, many cases the effort to take out the compiler and produce a suitable program is unjustified.
For instance, if I need to move pictures img_0045 through img_0609 to a certain directory, writing a C++ program to do so would not be the best solution; it can be done more efficiently in Windows Explorer, by selecting the files and dragging them. TCC then takes this to another level: if selecting the files in Explorer takes O(n) time (proportional to the number of files in the selection), TCC can often do this in O(1), and with a precision that also avoids the error-prone nature of selecting and dragging.
However, using TCC to do this depends on TCC's ability to perform such precise actions with easy-to-remember, non-convoluted, easy-to-type commands. Once it becomes necessary to start using extra functions such as %@Format (which requires the Shift key to type, for one thing), the quick and easy-to-type nature of the "do" loop fades somewhat.
In sum, it seems to me that, unlike formal programming languages, TCC would be best driven by the common use cases of its user base, even if the direct implementation of those use cases will sometimes cross a theoretical line such as the character/number dichotomy.
If you'd like, Dan, you can think of this as a C++ class, neither character nor number, but rather "class ZeroPaddedNumber". The constructor of the class takes two numbers: the initial value and the number of zeros to pad with. An instance of the class can be incremented or decremented with ++ and --, which affect the member variable "unsigned long dwIndex". But when used in an expression, the class will always return a string containing a zero-padded printout of the number.

mathewsdw · May 4, 2012

And thank you for taking the time to actually read my missive and respond to it, Avi. Your comments re TCC vs. other "high-level" languages are so relevant that they are the reason I use TCC's batch language whenever possible (and my terribly bad memory is still another real incentive); as you say, there's often far too much effort (i.e., "overhead") involved in getting what you want done in a language like C++, whereas with TCC that's rarely the case. (In general, of the long list of languages that I know well, TCC's batch file language is by far the most generally powerful. The only thing I can think of that it lacks compared to other high-level languages is formal data structures (the binary functions are close but no cigar, in my opinion). But I really doubt that anybody would write a program in TCC where such data structures are necessary, as that would be somewhat silly; for those kinds of things, "traditional" high-level languages like COBOL, PL/I, and C++ would be far more suitable. It is also my opinion that programs that need data structures probably don't need any of TCC's other very powerful features; they are really just not applicable.)

But I think my main point might still be getting a little bit "lost" here: while I still don't consider having zero-padded numbers generated by the "For" statement to be a bad idea, I (unlike you, it would appear) don't consider the "cost" of significantly enhancing the "For" statement to be worthwhile vs. just using the "@Format" function(s), which exist specifically for that purpose. I also dislike the whole "+10000" thing on general principles alone (and I really don't think it's easier or less typing than just using the "@Format" function(s)); in my opinion, the value of a number should be what the value of the number actually is.
But we can certainly disagree on these issue(s); I don't consider you to be stupid, for instance, if you don't happen to share my opinion(s)! :)

- Dan

Steve Fabian · May 4, 2012

Dan: Just one point. In the instances the OP and I are interested in, we are dealing with strings containing numeric substrings, not numbers. Numbers are just a tool to generate (or, more properly, enumerate) the strings, hence your philosophical issues are misplaced. Even if we were dealing with pure numbers, a computer does not deal with numbers - only with its representations. Would a two's complement instead of sign-and-magnitude representation offend you? How about BCD?

David Marcus · May 4, 2012

mathewsdw said:
So the bottom line is this: TCC [...] don’t really have “numeric” data types; strictly speaking, all data is "character" data.

The values are stored internally as decimal strings, but TCC will sometimes treat the values as numbers and sometimes as strings. How they are treated is usually more important than how they are stored.

David Marcus · May 4, 2012

Steve Fabian said:
Even if we were dealing with pure numbers, a computer does not deal with numbers - only with its representations.

I suppose that depends on what you mean by "deal with". I once tried to explain to someone in a programming forum that floating point numbers are a subset of the real numbers (with modified + and * operations that are the result of applying the usual real number versions, then rounding to the nearest floating point number). The other person kept insisting that floating point numbers were strings of digits, thus confusing how they are represented internally in a computer with what their mathematical structure is.

mathewsdw · May 4, 2012

Steve, I'm a little bit confused here because, as it turns out, I absolutely did understand the original request and, as far as I know, all later postings in this thread, including yours. (My memory is bad, but I'm sure enough about this not to spend a lot of time, given my semi-blindness, going back over the previous postings to verify it.) Maybe the issue is that my "suggested solution" was given without enough context to make it fully understandable and therefore usable. So, to correct that, here is a batch file that actually does a (very close!) superset of what was originally asked:

Code:

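@Echo Off
SetLocal
Rem Count from 78 to 87 in steps of 1; @Format[04,n] zero-pads each number to 4 digits.
Rem NOT EXIST is used only because these demo files don't exist; use EXIST for real files.
For /L I in (78,1,87) Do (
  Set Name=Q:\A_Directory\IMG%@Format[04,%I]Whatever.ext
  Iff NOT EXIST "%Name" Then
      Echo Whatever you would do if "%Name" actually existed...
  EndIff
)
EndLocal
Quit 0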

And here is the result of actually running said batch file:

Code:

Whatever you would do if "Q:\A_Directory\IMG0078Whatever.ext" actually existed...
Whatever you would do if "Q:\A_Directory\IMG0079Whatever.ext" actually existed...
Whatever you would do if "Q:\A_Directory\IMG0080Whatever.ext" actually existed...
Whatever you would do if "Q:\A_Directory\IMG0081Whatever.ext" actually existed...
Whatever you would do if "Q:\A_Directory\IMG0082Whatever.ext" actually existed...
Whatever you would do if "Q:\A_Directory\IMG0083Whatever.ext" actually existed...
Whatever you would do if "Q:\A_Directory\IMG0084Whatever.ext" actually existed...
Whatever you would do if "Q:\A_Directory\IMG0085Whatever.ext" actually existed...
Whatever you would do if "Q:\A_Directory\IMG0086Whatever.ext" actually existed...
Whatever you would do if "Q:\A_Directory\IMG0087Whatever.ext" actually existed...

Note that in my code I used "NOT EXIST" rather than "EXIST" because these files do not, of course, exist on my machine; and I used a much longer and more complex file name than what was originally asked for, to fully illustrate the possibilities.

So, hopefully (?) this is a complete and total (and final!) explanation of what I was suggesting and proof that it does, in fact, fully do the job. (And I will add, one more time, that this is the very purpose of the "@Format" function.)

- Dan

mathewsdw · May 4, 2012

And thank you, Dave (you made your posting while I was entering mine), you hit the nail (I suppose I was trying to hit) on the head!!! - Dan
