@XREPLACE

#1
I want to strip all characters except decimal digits from a string.
Combining "uhelp @xreplace" with TCC help topic
"regularexpressionsyntax.htm" this should do it:
echo %@xreplace[\D,,string]
However, this command:
echo %a %+ echo %@xreplace[\D,,%a]
reports:

2009-07-18,21:19:20.000
212009-07-18192009-07-18202009-07-18000

Note: my regular expression syntax is set to "perl".

BTW, the real purpose is to report a compact but intelligible file
timestamp. An additional date format in Charles Dye's iso8601.dll would do
the trick much more neatly. @DATECONV ought to have explicit format
specification for both input and output format, esp. to disambiguate input
dates like 12/05/10 - it could be German style (May 12th) or US style (Dec.
5th).
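To illustrate what I mean by explicit input and output formats (sketched in Python, since @DATECONV doesn't take them today; dateconv here is a hypothetical helper, not a real TCC or plugin function):

```python
from datetime import datetime

def dateconv(text, in_fmt, out_fmt):
    """Convert a date string, disambiguating via an explicit input format."""
    return datetime.strptime(text, in_fmt).strftime(out_fmt)

# "12/05/10" parsed two ways gives two different ISO dates:
german = dateconv("12/05/10", "%d/%m/%y", "%Y-%m-%d")  # day first
us     = dateconv("12/05/10", "%m/%d/%y", "%Y-%m-%d")  # month first
print(german)  # 2010-05-12
print(us)      # 2010-12-05
```

Without the explicit input format, no converter can tell which of the two was meant.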
--
Steve
 
#2
@FILTER[chars,string] : Removes any characters in "string" that aren't in
"chars". For example, to remove all non-numeric characters from a variable:

%@filter[0123456789,%var]

Topic "f_filter.htm" last edited 2008-10-21.
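The same idea sketched in Python for readers outside TCC (filter_chars is just an illustration of what @FILTER does, not its implementation):

```python
# Rough equivalent of TCC's @FILTER[chars,string]: keep only the
# characters of `string` that appear in `chars`.
def filter_chars(chars, string):
    return ''.join(c for c in string if c in chars)

print(filter_chars("0123456789", "2009-07-18,21:19:20.000"))
# 20090718211920000
```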

On Thu, Jul 29, 2010 at 10:31 AM, Steve Fábián <> wrote:


> I want to strip all characters except decimal digits from a string.
> Combining "uhelp @xreplace" with TCC help topic
> "regularexpressionsyntax.htm" this should do it:
> echo %@xreplace[\D,,string]
> However, this command:
> echo %a %+ echo %@xreplace[\D,,%a]
> reports:
>
> 2009-07-18,21:19:20.000
> 212009-07-18192009-07-18202009-07-18000
>
> Note: my regular expression syntax is set to "perl".
>
> BTW, the real purpose is to report a compact but intelligible file
> timestamp. An additional date format in Charles Dye's iso8601.dll would do
> the trick much more neatly. @DATECONV ought to have explicit format
> specification for both input and output format, esp. to disambiguate input
> dates like 12/05/10 - it could be German style (May 12th) or US style (Dec.
> 5th).
> --
> Steve
>
>
>
>
>
>
>
>


--
Jim Cook
2010 Sundays: 4/4, 6/6, 8/8, 10/10, 12/12 and 5/9, 9/5, 7/11, 11/7.
Next year they're Monday.
 
#3
On Thu, 29 Jul 2010 13:31:43 -0400, you wrote:

|I want to strip all characters except decimal digits from a string.
|Combining "uhelp @xreplace" with TCC help topic
|"regularexpressionsyntax.htm" this should do it:
| echo %@xreplace[\D,,string]
|However, this command:
| echo %a %+ echo %@xreplace[\D,,%a]
|reports:
|
|2009-07-18,21:19:20.000
|212009-07-18192009-07-18202009-07-18000

I'm using TCC's NthArgument() function to pick out the args. When you
say

Code:
v:\> echo %@xreplace[\D,,2009-07-18,21:19:20.000]
212009-07-18192009-07-18202009-07-18000
it's giving

arg0 = \D
arg1 = 2009-07-18
arg2 = 21:19:20.000

skipping consecutive separators (commas).

Use instead:

Code:
v:\> echo %@xreplace[\D,"",2009-07-18,21:19:20.000]
20090718211920000
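For comparison, here is the same \D replacement in Python, where the function arguments aren't comma-split and so need no quoting trick (the regex itself was never the problem):

```python
import re

# \D matches any non-digit; replacing with the empty string
# strips everything but decimal digits.
s = "2009-07-18,21:19:20.000"
print(re.sub(r"\D", "", s))  # 20090718211920000
```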
 
#4
| I'm using TCC's NthArgument() function to pick out the args. When
| you say
|
|
| Code:
| ---------
| v:\> echo %@xreplace[\D,,2009-07-18,21:19:20.000]
| 212009-07-18192009-07-18202009-07-18000
| ---------
| it's giving
|
| arg0 = \D
| arg1 = 2009-07-18
| arg2 = 21:19:20.000
|
| skipping consecutive separators (commas).

Ah! That explains the result. However, I thought that in built-in functions
you could skip a parameter with consecutive commas.

| Use instead:
|
| Code:
| ---------
| v:\> echo %@xreplace[\D,"",2009-07-18,21:19:20.000]
| 20090718211920000
| ---------

Thanks, I think Jim's suggestion for the specific case may be faster, esp.
in a loop processing many files.
--
Steve
 
#5
On Thu, 29 Jul 2010 14:53:18 -0400, you wrote:

|| I'm using TCC's NthArgument() function to pick out the args. When
|| you say
||
||
|| Code:
|| ---------
|| v:\> echo %@xreplace[\D,,2009-07-18,21:19:20.000]
|| 212009-07-18192009-07-18202009-07-18000
|| ---------
|| it's giving
||
|| arg0 = \D
|| arg1 = 2009-07-18
|| arg2 = 21:19:20.000
||
|| skipping consecutive separators (commas).
|
|Ah! That explains the result. However, I thought that in built-in functions
|you could skip a parameter with consecutive commas.

I don't know whether that's a hard-and-fast rule for the built-in ones (it
is often, perhaps always, the case), but for mine, it depends on how I
parse the args.
 

Charles Dye

Super Moderator
Staff member
May 20, 2008
3,617
46
Albuquerque, NM
prospero.unm.edu
#6
BTW, the real purpose is to report a compact but intelligible file timestamp. An additional date format in Charles Dye's iso8601.dll would do the trick much more neatly.
It'd be easy enough to support 8-digit YYYYMMDD type dates. I really, really don't want to deal with more than eight digits, though; no huge integers combining both date and time info....
 
#7
| ---Quote (Originally by Steve Fábián)---
|| BTW, the real purpose is to report a compact but intelligible file
|| timestamp. An additional date format in Charles Dye's iso8601.dll
|| would do the trick much more neatly.
| ---End Quote---
| It'd be easy enough to support 8-digit YYYYMMDD type dates. I
| really, really don't want to deal with more than eight digits,
| though; no huge integers combining both date and time info....

They are not really integers; they are strings composed of decimal
digits. The model I have in mind is the same as the built-in _datetime and
_utcdatetime: a convenient format for reporting file ages, more readable
yet shorter than @fileage[]. I have no practical use for resolutions below
1 s, though I can conceive of their benefit in special circumstances.
Using the @FILTER approach is probably as fast a method of compression
as a new option in your @DATECONV would be, but only if the default date and
time formats are ISO-like, i.e., field order and width match, with 24-hour
times. My default (Windows time and short date) format is like that; I just
use a period as the field separator in both date and time. For half-day TOD
and for any date format not in hierarchical order there is more complexity.
Between formats 1 and 2, ambiguity arises for the first 12 days of each
month, which is why I suggested that @DATECONV would need both an input and
an output format specifier to be generic.
   BTW, do we have any TOD converters between half-day (AM/PM) and
full-day (24h) times?
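The conversion I have in mind is the trivial round trip sketched here in Python (not an existing TCC function):

```python
from datetime import datetime

def to_24h(tod):
    """Half-day 'hh:mm:ss AM/PM' -> full-day 'HH:MM:SS'."""
    return datetime.strptime(tod, "%I:%M:%S %p").strftime("%H:%M:%S")

def to_12h(tod):
    """Full-day 'HH:MM:SS' -> half-day 'hh:mm:ss AM/PM'."""
    return datetime.strptime(tod, "%H:%M:%S").strftime("%I:%M:%S %p")

print(to_24h("09:19:20 PM"))  # 21:19:20
print(to_12h("21:19:20"))     # 09:19:20 PM
```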
--
Steve
 

Charles Dye

#8
The model I have in mind is the same as the built-in _datetime and _utcdatetime. They are a convenient format for reporting file ages, more readable yet shorter than @fileage[]. I have no practical use for resolutions below 1s, though I can conceive their benefit in special circumstances.
Okay, I've added eight-digit YYYYMMDD as a legal date format, and a kludge for fourteen-digit YYYYMMDDhhmmss output in %@filestamp.
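A sketch in Python of the fourteen-digit stamp and its round trip, just to pin down the format (this is not %@filestamp itself):

```python
from datetime import datetime

# Fourteen-digit YYYYMMDDhhmmss stamp for a sample timestamp,
# and parsing it back to a readable date/time.
stamp = datetime(2009, 7, 18, 21, 19, 20).strftime("%Y%m%d%H%M%S")
print(stamp)                                    # 20090718211920
back = datetime.strptime(stamp, "%Y%m%d%H%M%S")
print(back.isoformat(sep=","))                  # 2009-07-18,21:19:20
```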

Between formats 1 and 2 ambiguity arises for the first 12 days
of each month, which is why I suggested that @DATECONV requires both an input and output format specifier to be generic.
I've had this for some time now.

BTW, do we have any TOD converters between half-day times (AM/PM) and full day (24h) times?
I don't, though it should be trivial to do.
 
#11
| ---Quote (Originally by Steve Fábián)---
|| Do you have an FTP site whence I can always download the latest of
|| your plugin?
| ---End Quote---
| It's on UNM's web server. I don't think you can access it via
| anonymous FTP, only HTTP.
|
| http://www.unm.edu/~cdye/plugins/

That's really too bad. In fact it is shortsighted of the system
managers, resulting in the use of more server resources for less user
benefit.
My major issue with HTTP-only access is plain from what the two
acronyms stand for:
HTTP = HyperText Transfer Protocol
FTP = File Transfer Protocol
HTTP is designed for text, FTP for files. With FTP I can limit my downloads
to what is newer than what I already have; HTTP requires downloading
everything and then, after having wasted communication resources, throwing
away what is old and was already downloaded previously. Furthermore, with HTTP one
can download only files known to exist, one at a time, using a browser,
since TCC does not support copying files from an HTTP server.
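For what it's worth, HTTP servers do send a Last-Modified header that a client could use for only-if-newer transfers, even if a browser doesn't exploit it. A minimal Python sketch of that comparison, with made-up header and timestamp values:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def needs_download(last_modified_header, local_mtime):
    """True if the server's Last-Modified is newer than our local copy."""
    return parsedate_to_datetime(last_modified_header) > local_mtime

hdr = "Sat, 31 Jul 2010 10:04:00 GMT"
print(needs_download(hdr, datetime(2010, 7, 1, tzinfo=timezone.utc)))  # True
print(needs_download(hdr, datetime(2010, 8, 1, tzinfo=timezone.utc)))  # False
```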
--
Steve
 
May 29, 2008
533
3
Groton, CT
#12
| ---Quote (Originally by Steve Fábián)---
|| Do you have an FTP site whence I can always download the latest of
|| your plugin?
| ---End Quote---
| It's on UNM's web server. I don't think you can access it via
| anonymous FTP, only HTTP.
|
| http://www.unm.edu/~cdye/plugins/

That's really too bad. In fact it is shortsighted of the system
managers, resulting in the use of more server resources for less user
benefit.

...

Furthermore, with HTTP one
can download only files known to exist, one at a time, using a browser,
since TCC does not support copying files from an HTTP server.
--
Steve
TCC does support copying via HTTP:
Code:
C:\work> copy http://www.unm.edu/~cdye/dl/iso8601.zip \temp
http://www.unm.edu/~cdye/dl/iso8601.zip => C:\temp\iso8601.zip
     1 file copied

C:\work> dir C:\temp\iso8601.zip

 Volume in drive C is IRVING-C       Serial number is 28aa:b2d9
 Directory of  C:\temp\iso8601.zip

07-31-2010  10:04         175,064  iso8601.zip
           175,064 bytes in 1 file and 0 dirs    176,128 bytes allocated
    12,037,980,160 bytes free

C:\work> ver

TCC  11.00.51   Windows XP [Version 5.1.2600]
 
#13
| TCC does support copying via HTTP:
| Code:
| ---------
| C:\work> copy http://www.unm.edu/~cdye/dl/iso8601.zip \temp
| http://www.unm.edu/~cdye/dl/iso8601.zip => C:\temp\iso8601.zip
| 1 file copied

Thanks, Dave. Obviously, I was wrong about TCC, but not about the HTTP
vs. FTP comparison:

1/ Since TCC's DIR does not work for HTTP servers, an incorrect URL cannot
be corrected by search (Dave, how did you know the correct URL?). I can
search the public directories of an FTP site for the desired file.

2/ For the same reason I cannot download ALL available files, only those I
am aware of

3/ For the same reason I cannot check whether the server has a different
version of a file I know of than my copy, without downloading and
comparing.

4/ The copy of an HTTP file has the timestamp of the copying, which is like
dating Hamlet or the Declaration of Independence at the time you acquired
your own copy. A true renaissance!

5/ Once I have downloaded a different version of a file, I don't know
whether it is older or newer than mine, unless the content allows me to
determine that. Only archive files and files containing explicit version
information, e.g., retrievable by TCC's @VERINFO function, allow the
determination to be automated, and even then it is not trivial.
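The automatable part, at least, is easy: once the dotted version strings have been extracted, comparing them numerically is a one-liner. A Python sketch with made-up version numbers:

```python
# Compare dotted version strings numerically (field by field),
# as one might after extracting @VERINFO-style metadata.
def ver_tuple(v):
    return tuple(int(part) for part in v.split("."))

print(ver_tuple("11.00.51") > ver_tuple("9.02.154"))  # True
# Plain string comparison would get this wrong: "11..." < "9..."
```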

Note: Issue 5/, older or newer version, is a serious problem when
downloading files using HTTP from different "mirror" sites, which may not be
properly updated. Programs like the updater.exe included with JPsoft
products use a separate data file on the server, queried as part of the
update process, to overcome this problem. It would be far too much of a
burden on TCC plugin developers to attempt to do that.
--
Steve
 

Charles Dye

#14
That's really too bad. In fact it is shortsighted of the system
managers, resulting in the use of more server resources for less user
benefit.
I think they go out of their way to make it difficult to scrape their web servers because they really don't want people scraping the servers. I've toyed with the idea of building my own FTP/web server -- it would doubtless be a great learning experience -- but that ain't gonna happen any time soon.

If you really wanted to automate the process, you could just download the index.html in that directory and compare it with a cached copy. If they don't match, you could either (a) copy all the .ZIP files referenced in index.html (the brute-force-and-ignorance approach); or (b) parse the table to detect changed version numbers, and download only the relevant .ZIP files.
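The first step of option (b) is simple enough to sketch in Python: pull the .ZIP hrefs out of a cached index.html so they can be diffed against the previous fetch (the sample markup here is invented):

```python
from html.parser import HTMLParser

# Collect hrefs of <a> tags that point at .ZIP files.
class ZipLinks(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href", "")
            if href.lower().endswith(".zip"):
                self.links.append(href)

page = '<a href="iso8601.zip">iso8601</a> <a href="notes.html">notes</a>'
p = ZipLinks()
p.feed(page)
print(p.links)  # ['iso8601.zip']
```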
 
#15
| I think they go out of their way to make it difficult to scrape
| their web servers because they really don't want people scraping the
| servers.

I stand by my original statement, at least for the public access web
servers, which are intended by the University to be available to the WWW.

| I've toyed with the idea of building my own FTP/web server
| -- it would doubtless be a great learning experience -- but that
| ain't gonna happen any time soon.

I had software several years ago (I might still be able to find it)
which allowed me to set up a directory on my computer as an FTP root. In
fact I had set up two - one for uploads, one for downloads. Normal FTP could
access it from anywhere.

| If you really wanted to automate the process, you could just
| download the index.html in that directory and compare it with a
| cached copy. If they don't match, you could either (a) copy all the
| .ZIP files referenced in index.html (the brute-force-and-ignorance
| approach); or (b) parse the table to detect changed version numbers,
| and download only the relevant .ZIP files.

Parsing the unique format of each provider requires a separate parsing
package for each. However, remember the old 4DOS batch file Mike Bessy
published to build a very simple "index.html", enumerating the name, size,
timestamp and CRC of each available file? That would be a great scheme!
You could rename "index.html" to "details.html" and include it with the
list of ZIP files.
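The records such an index needs are trivial to generate; a Python sketch of one line (name, size, CRC-32), with made-up file contents:

```python
import zlib

# One machine-readable index record in the spirit of Mike Bessy's
# old 4DOS scheme: filename, size in bytes, CRC-32 in hex.
def index_line(name, data):
    return f"{name}\t{len(data)}\t{zlib.crc32(data):08x}"

print(index_line("iso8601.zip", b"example bytes"))
```

A downloader only has to compare these small records against its cached copy to decide which ZIPs to fetch.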
BTW, I redate all downloaded archives to the filetime of the latest file
they contain.
   My normal procedure for downloading from any one source (your server,
Vince's LUCKY, ftp://jpsoft.com, etc.) is to create a new subdirectory in
the appropriate main directory, named by the date yyyymmdd.000 (or .001 for
a second download the same day, etc.), and to copy only files newer than
those the last download directory contains. For FTP servers this is just one
TCC COPY command. Using a simple INDEX file as I mentioned above would be
just a tiny bit more complicated, but would not require a separate parser
for each source location.
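The yyyymmdd.000 naming scheme can be sketched as a hypothetical Python helper that picks the next sequence number (directory creation and the actual copying not shown):

```python
from datetime import date

def next_download_dir(today, existing):
    """Next yyyymmdd.nnn name: .000 for the first download of the day,
    .001 for the second, and so on."""
    stamp = today.strftime("%Y%m%d")
    n = sum(1 for d in existing if d.startswith(stamp + "."))
    return f"{stamp}.{n:03d}"

print(next_download_dir(date(2010, 7, 31), []))                # 20100731.000
print(next_download_dir(date(2010, 7, 31), ["20100731.000"]))  # 20100731.001
```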
--
Steve
 
#16
| TCC does support copying via HTTP:
| Code:
| ---------
| C:\work> copy http://www.unm.edu/~cdye/dl/iso8601.zip \temp
| http://www.unm.edu/~cdye/dl/iso8601.zip => C:\temp\iso8601.zip
| 1 file copied

Thanks, Dave. Obviously, I was wrong about TCC, but not about the HTTP
vs. FTP comparison:
Right.
1/ Since TCC's DIR does not work for HTTP servers, an incorrect URL cannot
be corrected by search (Dave, how did you know the correct URL?). I can
search the public directories of an FTP site for the desired file.
From having downloaded it before. You're right that there's no way with HTML to know what other files are in the remote source directory.

2/ For the same reason I cannot download ALL available files, only those I
am aware of

3/ For the same reason I cannot check whether or not the server has a
different version of a file I know of than my copy without downloading and
comparing

4/ The copy of an HTTP file has the timestamp of the copying, which is like
dating Hamlet or the Declaration of Independence at the time you acquired
your own copy. A true renaissance!

5/ Once I have downloaded a different version of a file, I don't know
whether it is older or newer than mine, unless the content allows me to
determine that. Only archive files and files containing explicit version
information, e.g., retrievable by TCC's @VERINFO function, allow the
determination to be automated, and even then it is not trivial.

Note: Issue 5/, older or newer version, is a serious problem when
downloading files using HTTP from different "mirror" sites, which may not be
properly updated.
I agree with all that. All I was doing was pointing out that TCC can transfer a file using HTTP. Unfortunately, it's one file at a time. There are tools out there (for Mozilla) that will make a list of all download links on a page and then download all those files. Sorry I don't have a direct reference right now -- look at Addons for Mozilla.
 
#17
On Sat, 31 Jul 2010 15:09:01 -0400, dcantor <> wrote:


>I agree with all that. All I was doing was pointing out that TCC
>can transfer a file using HTTP. Unfortunately, it's one file at
>a time. There are tools out there (for Mozilla) that will make
>a list of all download links on a page and then download all those
>files. Sorry I don't have a direct reference right now -- look at
>Addons for Mozilla.
For Mozilla-based browsers, DownThemAll! is likely your best bet. It's
not entirely automated, but it can be configured to download all links
on a page (or all matching a regex, so you can avoid "back" and page
re-sort links).

From the command line, a Win32 port of wget can recursively retrieve
just about anything you need, without re-downloading existing content if
so configured.