@XREPLACE

Discussion in 'Plugins' started by Steve Fabian, Jul 29, 2010.

  1. Steve Fabian

    Joined:
    May 20, 2008
    Messages:
    3,520
    Likes Received:
    4
    I want to strip all characters except decimal digits from a string.
    Combining "uhelp @xreplace" with TCC help topic
    "regularexpressionsyntax.htm" this should do it:
    echo %@xreplace[\D,,string]
    However, this command:
    echo %a %+ echo %@xreplace[\D,,%a]
    reports:

    2009-07-18,21:19:20.000
    212009-07-18192009-07-18202009-07-18000

    Note: my regular expression syntax is set to "perl".

    BTW, the real purpose is to report a compact but intelligible file
    timestamp. An additional date format in Charles Dye's iso8601.dll would do
    the trick much more neatly. @DATECONV ought to have explicit format
    specifications for both input and output, esp. to disambiguate input
    dates like 12/05/10 - it could be German style (May 12th) or US style
    (Dec. 5th).
    --
    Steve
     
  2. Jim Cook

    Joined:
    May 20, 2008
    Messages:
    604
    Likes Received:
    0
    @FILTER[chars,string] : Removes any characters in "string" that aren't in
    "chars". For example, to remove all non-numeric characters from a variable:

    %@filter[0123456789,%var]

    Topic "f_filter.htm" last edited 2008-10-21.

    On Thu, Jul 29, 2010 at 10:31 AM, Steve Fábián <> wrote:

    --
    Jim Cook
    2010 Sundays: 4/4, 6/6, 8/8, 10/10, 12/12 and 5/9, 9/5, 7/11, 11/7.
    Next year they're Monday.
     
  3. vefatica

    Joined:
    May 20, 2008
    Messages:
    8,063
    Likes Received:
    30
    On Thu, 29 Jul 2010 13:31:43 -0400, you wrote:

    |I want to strip all characters except decimal digits from a string.
    |Combining "uhelp @xreplace" with TCC help topic
    |"regularexpressionsyntax.htm" this should do it:
    | echo %@xreplace[\D,,string]
    |However, this command:
    | echo %a %+ echo %@xreplace[\D,,%a]
    |reports:
    |
    |2009-07-18,21:19:20.000
    |212009-07-18192009-07-18202009-07-18000

    I'm using TCC's NthArgument() function to pick out the args. When you
    say

    Code:
    v:\> echo %@xreplace[\D,,2009-07-18,21:19:20.000]
    212009-07-18192009-07-18202009-07-18000
    it's giving

    arg0 = \D
    arg1 = 2009-07-18
    arg2 = 21:19:20.000

    skipping consecutive separators (commas).

    Use instead:

    Code:
    v:\> echo %@xreplace[\D,"",2009-07-18,21:19:20.000]
    20090718211920000
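
    With a variable holding the timestamp, the same quoting trick applies
    (sketch):

    Code:
    v:\> set a=2009-07-18,21:19:20.000
    v:\> echo %@xreplace[\D,"",%a]
    20090718211920000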
     
  4. Steve Fabian

    Joined:
    May 20, 2008
    Messages:
    3,520
    Likes Received:
    4
    | I'm using TCC's NthArgument() function to pick out the args. When
    | you say
    |
    |
    | Code:
    | ---------
    | v:\> echo %@xreplace[\D,,2009-07-18,21:19:20.000]
    | 212009-07-18192009-07-18202009-07-18000
    | ---------
    | it's giving
    |
    | arg0 = \D
    | arg1 = 2009-07-18
    | arg2 = 21:19:20.000
    |
    | skipping consecutive separators (commas).

    Ah! That explains the result. However, I thought that in built-in functions
    you could skip a parameter with consecutive commas.

    | Use instead:
    |
    | Code:
    | ---------
    | v:\> echo %@xreplace[\D,"",2009-07-18,21:19:20.000]
    | 20090718211920000
    | ---------

    Thanks, I think Jim's suggestion for the specific case may be faster, esp.
    in a loop processing many files.
    --
    Steve
     
  5. vefatica

    Joined:
    May 20, 2008
    Messages:
    8,063
    Likes Received:
    30
    On Thu, 29 Jul 2010 14:53:18 -0400, you wrote:

    || I'm using TCC's NthArgument() function to pick out the args. When
    || you say
    ||
    ||
    || Code:
    || ---------
    || v:\> echo %@xreplace[\D,,2009-07-18,21:19:20.000]
    || 212009-07-18192009-07-18202009-07-18000
    || ---------
    || it's giving
    ||
    || arg0 = \D
    || arg1 = 2009-07-18
    || arg2 = 21:19:20.000
    ||
    || skipping consecutive separators (commas).
    |
    |Ah! That explains the result. However, I thought in built-in functions you
    |can skip a parameter by consecutive commas.

    I don't know if that's a hard-and-fast rule for the built-in ones (it is
    often, perhaps always, the case), but for mine, it depends on how I
    parse the args.
     
  6. Charles Dye

    Charles Dye Super Moderator
    Staff Member

    Joined:
    May 20, 2008
    Messages:
    3,354
    Likes Received:
    39
    It'd be easy enough to support 8-digit YYYYMMDD type dates. I really, really don't want to deal with more than eight digits, though; no huge integers combining both date and time info....
     
  7. Steve Fabian

    Joined:
    May 20, 2008
    Messages:
    3,520
    Likes Received:
    4
    | ---Quote (Originally by Steve Fábián)---
    || BTW, the real purpose is to report a compact but intelligible file
    || timestamp. An additional date format in Charles Dye's iso8601.dll
    || would do the trick much more neatly.
    | ---End Quote---
    | It'd be easy enough to support 8-digit YYYYMMDD type dates. I
    | really, really don't want to deal with more than eight digits,
    | though; no huge integers combining both date and time info....

    They are not really integers; they are strings composed of decimal
    digits. The model I have in mind is the same as the built-in _datetime and
    _utcdatetime. They are a convenient format for reporting file ages, more
    readable yet shorter than @fileage[]. I have no practical use for
    resolutions below 1s, though I can conceive their benefit in special
    circumstances.
    Using the @FILTER approach probably provides as fast a method of
    compression as a new option in your @DATECONV would, but only if the
    default date and time formats are ISO-like, i.e., field order and width
    match, with 24-hour times. My default (Windows time and short date) format
    is like that; I just use a period (.) as the field separator in both date
    and time. For half-day TOD and for any date format not in hierarchical
    order there is more complexity. Between formats 1 (mm/dd/yy) and 2
    (dd/mm/yy), ambiguity arises for the first 12 days of each month, which is
    why I suggested that @DATECONV requires both an input and an output format
    specifier to be generic.
    BTW, do we have any TOD converters between half-day (AM/PM) and
    full-day (24h) times?
    --
    Steve
     
  8. Charles Dye

    Charles Dye Super Moderator
    Staff Member

    Joined:
    May 20, 2008
    Messages:
    3,354
    Likes Received:
    39
    Okay, I've added eight-digit YYYYMMDD as a legal date format, and a kludge for fourteen-digit YYYYMMDDhhmmss output in %@filestamp.

    I've had this for some time now.

    I don't have an AM/PM-to-24-hour converter, though it should be trivial to do.
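
    Something along these lines would probably do it (untested sketch; %hh
    and %ampm are illustrative variables holding the 12-hour hour and the
    AM/PM marker):

    Code:
    rem 12 AM -> 00, 12 PM stays 12, any other PM hour gets 12 added.
    set h24=%@if[%ampm == PM .and. %hh != 12,%@eval[%hh+12],%@if[%ampm == AM .and. %hh == 12,00,%hh]]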
     
  9. Steve Fabian

    Joined:
    May 20, 2008
    Messages:
    3,520
    Likes Received:
    4
    Charles:
    Do you have an FTP site whence I can always download the latest of your
    plugin?
    --
    Steve
     
  10. Charles Dye

    Charles Dye Super Moderator
    Staff Member

    Joined:
    May 20, 2008
    Messages:
    3,354
    Likes Received:
    39
    It's on UNM's web server. I don't think you can access it via anonymous FTP, only HTTP.

    http://www.unm.edu/~cdye/plugins/
     
  11. Steve Fabian

    Joined:
    May 20, 2008
    Messages:
    3,520
    Likes Received:
    4
    | ---Quote (Originally by Steve Fábián)---
    || Do you have an FTP site whence I can always download the latest of
    || your plugin?
    | ---End Quote---
    | It's on UNM's web server. I don't think you can access it via
    | anonymous FTP, only HTTP.
    |
    | http://www.unm.edu/~cdye/plugins/

    That's really too bad. In fact it is shortsighted of the system
    managers, resulting in the use of more server resources for less user
    benefit.
    My major issue with HTTP-only access is plainly visible in what the
    two acronyms stand for:
    HTTP = HyperText Transfer Protocol
    FTP = File Transfer Protocol
    HTTP is designed for text, FTP for files. With FTP I can limit my downloads
    to files newer than what I already have (see the sketch below). HTTP
    requires downloading everything and then, after having wasted communication
    resources, throwing away whatever is old and was already downloaded
    previously. Furthermore, with HTTP one can download only files known to
    exist, one at a time, using a browser, since TCC does not support copying
    files from an HTTP server.
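
    For example, updating a local directory from an FTP server is a single
    command (sketch; the FTP path is illustrative, relying on COPY's /U
    switch to skip files that are not newer than matching targets):

    Code:
    copy /u ftp://ftp.example.com/pub/plugins/* d:\plugins\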
    --
    Steve
     
  12. dcantor

    Joined:
    May 29, 2008
    Messages:
    508
    Likes Received:
    3
    TCC does support copying via HTTP:
    Code:
    C:\work> copy http://www.unm.edu/~cdye/dl/iso8601.zip \temp
    http://www.unm.edu/~cdye/dl/iso8601.zip => C:\temp\iso8601.zip
         1 file copied
    
    C:\work> dir C:\temp\iso8601.zip
    
     Volume in drive C is IRVING-C       Serial number is 28aa:b2d9
     Directory of  C:\temp\iso8601.zip
    
    07-31-2010  10:04         175,064  iso8601.zip
               175,064 bytes in 1 file and 0 dirs    176,128 bytes allocated
        12,037,980,160 bytes free
    
    C:\work> ver
    
    TCC  11.00.51   Windows XP [Version 5.1.2600]
    
     
  13. Steve Fabian

    Joined:
    May 20, 2008
    Messages:
    3,520
    Likes Received:
    4
    | TCC does support copying via HTTP:
    | Code:
    | ---------
    | C:\work> copy http://www.unm.edu/~cdye/dl/iso8601.zip \temp
    | http://www.unm.edu/~cdye/dl/iso8601.zip => C:\temp\iso8601.zip
    | 1 file copied

    Thanks, Dave. Obviously, I was wrong about TCC, but not about the HTTP
    vs. FTP comparison:

    1/ Since TCC's DIR does not work for HTTP servers, an incorrect URL cannot
    be corrected by searching (Dave, how did you know the correct URL?) - I can
    search the public directories of an FTP site for the desired file

    2/ For the same reason I cannot download ALL available files, only those I
    am aware of

    3/ For the same reason I cannot check whether or not the server has a
    different version of a file I know of than my copy without downloading and
    comparing

    4/ The copy of an HTTP file has the timestamp of the copying, which is like
    dating Hamlet or the Declaration of Independence at the time you acquired
    your own copy. A true renaissance!

    5/ Once I have downloaded a different version of a file, I don't know
    whether it is an older or a newer version, unless the content allows me to
    determine that. Only archive files and files containing explicit version
    information, e.g., retrievable by TCC's @VERINFO function (see the sketch
    below), allow the determination to be automated, and even then it is not
    trivial.

    Note: Issue 5/, older or newer version, is a serious problem when
    downloading files using HTTP from different "mirror" sites, which may not be
    properly updated. Programs like the updater.exe included with JPsoft
    products use a separate data file on the server, queried as part of the
    update process, to overcome this problem. It would be far too much of a
    burden on TCC plugin developers to attempt to do that.
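
    For files that do carry version resources, the check itself is short
    (sketch; the path is illustrative, and it assumes that @VERINFO given
    just a filename returns the file's version string):

    Code:
    rem Illustrative path; assumes @VERINFO alone returns the version string.
    echo %@verinfo[d:\plugins\iso8601.dll]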
    --
    Steve
     
  14. Charles Dye

    Charles Dye Super Moderator
    Staff Member

    Joined:
    May 20, 2008
    Messages:
    3,354
    Likes Received:
    39
    I think they go out of their way to make it difficult to scrape their web servers because they really don't want people scraping the servers. I've toyed with the idea of building my own FTP/web server -- it would doubtless be a great learning experience -- but that ain't gonna happen any time soon.

    If you really wanted to automate the process, you could just download the index.html in that directory and compare it with a cached copy. If they don't match, you could either (a) copy all the .ZIP files referenced in index.html (the brute-force-and-ignorance approach); or (b) parse the table to detect changed version numbers, and download only the relevant .ZIP files.
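
    The comparison half is only a couple of commands (untested sketch; paths
    are illustrative, and it assumes the @COMPARE file-comparison function,
    returning 1 when two files are identical, is available):

    Code:
    rem Fetch the current index, then compare it against the cached copy.
    copy http://www.unm.edu/~cdye/plugins/index.html %temp%\index.new
    if %@compare[%temp%\index.new,%temp%\index.cached] == 0 echo Index changed - fetch the new ZIPs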
     
  15. Steve Fabian

    Joined:
    May 20, 2008
    Messages:
    3,520
    Likes Received:
    4
    | I think they go out of their way to make it difficult to scrape
    | their web servers because they really don't want people scraping the
    | servers.

    I stand by my original statement, at least for the public access web
    servers, which are intended by the University to be available to the WWW.

    | I've toyed with the idea of building my own FTP/web server
    | -- it would doubtless be a great learning experience -- but that
    | ain't gonna happen any time soon.

    I had software several years ago (I might still be able to find it)
    which allowed me to set up a directory on my computer as an FTP root. In
    fact I had set up two - one for uploads, one for downloads. Normal FTP could
    access it from anywhere.

    | If you really wanted to automate the process, you could just
    | download the index.html in that directory and compare it with a
    | cached copy. If they don't match, you could either (a) copy all the
    | .ZIP files referenced in index.html (the brute-force-and-ignorance
    | approach); or (b) parse the table to detect changed version numbers,
    | and download only the relevant .ZIP files.

    Parsing the unique format of each provider requires a separate parsing
    package for each. However, remember the old 4DOS batch file Mike Bessy
    published to build a very simple "index.html", enumerating filenames, sizes,
    timestamps and CRCs for each available file? That would be a great scheme!
    You could rename "index.html" to "details.html", and include it with the
    list of ZIP files, as sketched below.
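
    In TCC such an index is nearly a one-liner (untested sketch; it assumes
    a checksum function such as @CRC32 is available - substitute @MD5 or
    similar if not):

    Code:
    rem Sketch: list name, size, date, time and checksum for every ZIP.
    if exist details.txt del /q details.txt
    do f in *.zip ( echo %f %@filesize[%f] %@filedate[%f] %@filetime[%f] %@crc32[%f] >> details.txt )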
    BTW, I redate all downloaded archives to the filetime of the latest file
    each contains.
    My normal procedure to download from any one source (your server,
    Vince's LUCKY, ftp://jpsoft.com, etc.) is to create a new subdirectory in
    the appropriate main directory, named by the date as yyyymmdd.000 (or .001
    for a second download the same day, etc.), and to copy into it only files
    newer than what the last download directory contains. For FTP servers this
    is just one TCC COPY command (see the sketch below). Using a simple INDEX
    file as I mentioned above would be just a tiny bit more complicated, but
    would not require a separate parser for each source location.
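
    In outline, one session looks like this (untested sketch; it assumes
    _isodate expands to yyyy-mm-dd, and the date range picking files newer
    than the previous download is illustrative):

    Code:
    rem Name the new subdirectory by date; .000 marks the first download today.
    set dest=%@replace[-,,%_isodate].000
    md %dest%
    rem Copy only files dated after the previous download.
    copy /[d2010-07-01] ftp://jpsoft.com/* %dest%\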
    --
    Steve
     
  16. dcantor

    Joined:
    May 29, 2008
    Messages:
    508
    Likes Received:
    3
    Right.
    From having downloaded it before. You're right that there's no way with
    HTTP to know what other files are in the remote source directory.

    I agree with all that. All I was doing was pointing out that TCC can transfer a file using HTTP. Unfortunately, it's one file at a time. There are tools out there (for Mozilla) that will make a list of all download links on a page and then download all those files. Sorry I don't have a direct reference right now -- look at Addons for Mozilla.
     
  17. thedave

    Joined:
    Nov 13, 2008
    Messages:
    253
    Likes Received:
    2
    On Sat, 31 Jul 2010 15:09:01 -0400, dcantor <> was
    claimed to have written:


    For Mozilla-based browsers, DownThemAll! is likely your best bet. It's
    not entirely automated, but it can be configured to download all links
    on a page (or all links matching a regex, so you can avoid 'back' and
    page re-sort links).

    From the command line, a Win32 port of wget can recursively retrieve
    just about anything you need without re-downloading existing content, if
    so configured.
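
    For instance (a sketch using standard wget options):

    Code:
    rem -r recurse, -N fetch only files newer than local copies,
    rem -np do not ascend past the start directory, -A zip keep only ZIPs
    wget -r -N -np -A zip http://www.unm.edu/~cdye/plugins/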
     
