
Regex question

Discussion in 'Support' started by vefatica, Jun 14, 2010.

  1. vefatica

    Joined:
    May 20, 2008
    Messages:
    7,972
    Likes Received:
    30
    I discovered that the Oniguruma library that TCC uses allows for up to 32 captures, which might later be used in substitutions (as @XREPLACE does). GNU sed, for example, allows only the back-references \0 to \9.

    As it stands (I think), @XREPLACE allows \0 to \31, but this leaves the problem of how to interpret, say, \10 in a replacement string: should it insert capture number 10, or capture number 1 followed by a 0? Currently, @XREPLACE substitutes capture number 10.

    I am tempted to allow only \0 to \9 (as @XREPLACE's documentation already says) and avoid the ambiguity mentioned above and be more like sed.

    Any thoughts?
     
  2. Jim Cook

    Joined:
    May 20, 2008
    Messages:
    604
    Likes Received:
    0
    In perl, s//$10/ replaces parameter 10. If you want parameter 1, and a '0',
    use s//${1}0/. The ${} syntax is used to disambiguate when what follows the
    variable name would otherwise be misinterpreted.

    D:\>perl -e "$v.=$_ for (a..z); $v =~ /(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)/; print qq($1 $10 ${1}0)"

    a j a0
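
For comparison, here is a minimal sketch of the same disambiguation in Python's standard `re` module. Python is not used anywhere else in this thread; it is only an illustration that other engines face the same question. There, `\10` in a replacement template is read as group 10, and `\g<1>0` means group 1 followed by a literal 0.

```python
import re

text = "abcdefghijklmnopqrstuvwxyz"
m = re.match("(.)" * 11, text)  # eleven single-character captures

# \10 is read as group 10; \g<1>0 is group 1 followed by a literal "0".
result = m.expand(r"\1 \10 \g<1>0")
print(result)  # a j a0
```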


    --
    Jim Cook
    2010 Sundays: 4/4, 6/6, 8/8, 10/10, 12/12 and 5/9, 9/5, 7/11, 11/7.
    Next year they're Monday.
     
  3. vefatica

    On Mon, 14 Jun 2010 15:27:38 -0400, Jim Cook <> wrote:

    |In perl, s//$10/ replaces parameter 10. If you want parameter 1, and a '0',
    |use s//${1}0/. The ${} syntax is used to disambiguate when what follows the
    |variable name would otherwise be misinterpreted.
    |
    |D:\>perl -e "$v.=$_ for (a..z); $v =~ /(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)/;
    |print qq($1 $10 ${1}0)"
    |
    |a j a0

    Thanks. Is there any possibility for ambiguity?

    Does this method of processing a '\' sound right?

    1. if the next char is '\', it's a literal '\' (eat two characters)

    2. else if the next char is a digit, get the number following the '\' (may be
    more than one digit) and use the corresponding back-reference

    3. else if the next char is '{', read the number that follows (up to the '}'?)
    and use the corresponding backref (what if no number follows '\{' or no '}'
    appears after the number ... bad syntax?)

    4. else treat it as a literal '\'
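
The four rules above can be sketched as a small tokenizer. This is only an illustration in Python, not @XREPLACE's actual code: the function name `parse_replacement` and the token tuples are made up for the example, and treating a missing number or a missing '}' as an error is one possible answer to the "bad syntax?" question, not a settled decision.

```python
def _is_number(s):
    # True only for a non-empty run of ASCII digits.
    return s.isascii() and s.isdigit()


def parse_replacement(template):
    """Split a replacement string into ('lit', text) and ('ref', n) tokens."""
    tokens = []
    i, n = 0, len(template)
    while i < n:
        ch = template[i]
        if ch != "\\":
            tokens.append(("lit", ch))
            i += 1
            continue
        nxt = template[i + 1] if i + 1 < n else ""
        if nxt == "\\":
            # Rule 1: two backslashes are one literal backslash.
            tokens.append(("lit", "\\"))
            i += 2
        elif _is_number(nxt):
            # Rule 2: read as many digits as follow the backslash.
            j = i + 1
            while j < n and _is_number(template[j]):
                j += 1
            tokens.append(("ref", int(template[i + 1:j])))
            i = j
        elif nxt == "{":
            # Rule 3: \{number}; assumption: anything else is bad syntax.
            close = template.find("}", i + 2)
            digits = template[i + 2:close] if close != -1 else ""
            if not _is_number(digits):
                raise ValueError("bad back-reference at offset %d" % i)
            tokens.append(("ref", int(digits)))
            i = close + 1
        else:
            # Rule 4: any other backslash is kept as a literal backslash.
            tokens.append(("lit", "\\"))
            i += 1
    return tokens


print(parse_replacement(r"\10"))    # [('ref', 10)]
print(parse_replacement(r"\{1}0"))  # [('ref', 1), ('lit', '0')]
```

With the greedy rule 2, `\10` is always capture 10, so `\{1}0` is the only way to write capture 1 followed by a 0.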
    --
    - Vince
     
  4. Jim Cook

    Perl throws a syntax error on unpaired {}. It is perfectly happy to
    substitute a variable, e.g.: $plural = "${singular}s" when variables are
    allowed. Undefined things "${undefined}" would be like %undefined% and just
    empty (ok, undef, but that's nitpicking).

    \1 and \2 are backreferences. However, \01 and \001 are the character
    code 0x01 and not a backreference. The \oct syntax consumes at most three
    octal digits, stopping at the first non-octal character or after three
    digits. \{oct} is not supported. Certain other escaped characters look
    like C, e.g.: \n \r \t \a
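
To make the "at most three octal digits" rule concrete, here is a rough Python sketch. The helper name `read_octal_escape` is hypothetical, and it assumes the caller has already seen the backslash and that the character at `pos` is an octal digit.

```python
def read_octal_escape(s, pos):
    """Read an octal escape whose first digit is s[pos].

    Assumes s[pos] is an octal digit. Returns (character, next_position).
    """
    end = pos
    # Stop at the first non-octal character or after three digits.
    while end < len(s) and end - pos < 3 and s[end] in "01234567":
        end += 1
    return chr(int(s[pos:end], 8)), end


print(read_octal_escape("01x", 0))   # ('\x01', 2)
print(read_octal_escape("0010", 0))  # ('\x01', 3), the trailing "0" is left over
```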

    My temptation would be to ignore the \oct and \n things in XREPLACE, but I
    wanted to make you aware of them if you weren't already.

    Your rule 4 (else treat it as a literal '\') means that "\\" becomes "\" and
    "\1" becomes backreference 1, but "\q" becomes "\q" which seems
    counterintuitive. I'd make "\q" become "q". In other words, the \ is
    consumed in all cases and affects what comes just after it.

    --
    Jim Cook
    2010 Sundays: 4/4, 6/6, 8/8, 10/10, 12/12 and 5/9, 9/5, 7/11, 11/7.
    Next year they're Monday.
     
  5. vefatica

    On Mon, 14 Jun 2010 18:11:06 -0400, Jim Cook <> wrote:

    |Specifying \1, \2 are backreferences. However, \01 and \001 are the binary
    |code 0x01 and not a backreference. The \oct syntax consumes at most three
    |octal digits; stopping on non-digit or three count. \{oct} is not supported.
    |Certain other escaped characters look like C, e.g.: \n \r \t \a

    I think I'll allow \n (and insert a CRLF) and \t. What's \a?

    |My temptation would be to ignore the \oct and \n things in XREPLACE, but I
    |wanted to make you aware of them if you weren't already.
    |
    |Your rule 4 (else treat it as a literal '\') means that "\\" becomes "\" and
    |"\1" becomes backreference 1, but "\q" becomes "\q" which seems
    |counterintuitive. I'd make "\q" become "q". In other words, the \ is
    |consumed in all cases and affects what comes just after it.

    So if it's not \\, \n, \t, \number, or \{number}, I'll just drop the '\' and
    take the next char as a literal.

    Sound good?
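
A rough Python sketch of that revised rule set, for illustration only (it is not the plugin's code). The name `expand` and the `groups` list are made up for the example. Two behaviors are assumptions the thread does not settle: a back-reference to a capture that does not exist inserts nothing, and a lone backslash at the very end of the string is kept.

```python
DIGITS = "0123456789"


def expand(template, groups):
    """Expand a replacement string; groups[0] is the whole match,
    groups[n] is capture n."""
    def capture(k):
        # Assumption: a missing capture expands to the empty string.
        return groups[k] if k < len(groups) else ""

    out = []
    i, n = 0, len(template)
    while i < n:
        ch = template[i]
        i += 1
        if ch != "\\" or i == n:
            # Ordinary character, or a trailing backslash (kept as-is).
            out.append(ch)
            continue
        nxt = template[i]
        if nxt == "n":
            out.append("\r\n")  # \n inserts a CRLF
            i += 1
        elif nxt == "t":
            out.append("\t")
            i += 1
        elif nxt in DIGITS:
            j = i
            while j < n and template[j] in DIGITS:
                j += 1
            out.append(capture(int(template[i:j])))
            i = j
        elif nxt == "{":
            close = template.find("}", i + 1)
            digits = template[i + 1:close] if close != -1 else ""
            if not (digits.isascii() and digits.isdigit()):
                raise ValueError("bad back-reference at offset %d" % (i - 1))
            out.append(capture(int(digits)))
            i = close + 1
        else:
            # Everything else: drop the backslash, keep the next character.
            # This covers "\\" -> "\" as well as "\q" -> "q".
            out.append(nxt)
            i += 1
    return "".join(out)


whole = "abcdefghijk"
groups = [whole] + list(whole)  # groups[1] == "a", groups[10] == "j"
print(expand(r"\1 \10 \{1}0", groups))  # a j a0
```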
    --
    - Vince
     
  6. Jim Cook

    \a is alarm (0x07). I regularly code \b first, then remember that isn't bell
    but backspace :)

    I believe perl conforms to the C standard, which defines these:

    \a (alert) Produces an audible or visible alert without changing the active position.
    \b (backspace) Moves the active position to the previous position on the current line. If the active position is at the initial position of a line, the behavior of the display device is unspecified.
    \f (form feed) Moves the active position to the initial position at the start of the next logical page.
    \n (new line) Moves the active position to the initial position of the next line.
    \r (carriage return) Moves the active position to the initial position of the current line.
    \t (horizontal tab) Moves the active position to the next horizontal tabulation position on the current line. If the active position is at or past the last defined horizontal tabulation position, the behavior of the display device is unspecified.
    \v (vertical tab) Moves the active position to the initial position of the next vertical tabulation position. If the active position is at or past the last defined vertical tabulation position, the behavior of the display device is unspecified.


    --
    Jim Cook
    2010 Sundays: 4/4, 6/6, 8/8, 10/10, 12/12 and 5/9, 9/5, 7/11, 11/7.
    Next year they're Monday.
     
