Regex question

May 20, 2008
11,400
99
Syracuse, NY, USA
I discovered that the Oniguruma library that TCC uses allows up to 32 captures, which can later be used in substitutions (as @XREPLACE does). GNU sed, for example, allows only the back-references \0 to \9.

As it stands (I think), @XREPLACE allows \0 to \31, but this leaves the problem of how to interpret, say, \10 in a replacement string: should it insert capture number 10, or capture number 1 followed by a 0? Currently, @XREPLACE substitutes capture number 10.

I am tempted to allow only \0 to \9 (as @XREPLACE's documentation already says), which would avoid the ambiguity mentioned above and be more like sed.

Any thoughts?
 
May 20, 2008
603
0
Sammamish, WA
In Perl, s//$10/ substitutes capture 10. If you want capture 1 followed by a '0',
use s//${1}0/. The ${} syntax disambiguates when what follows the variable name
would otherwise be read as part of it.

D:\>perl -e "$v.=$_ for (a..z); $v =~ /(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)/; print qq($1 $10 ${1}0)"

a j a0
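
For comparison, here is a minimal sketch of the same disambiguation in Python's re module, which is not what TCC or @XREPLACE uses. In a Python replacement template, \10 refers to group 10, and \g<1>0 means group 1 followed by a literal 0. The variable names are only for illustration.

```python
import re

# Eleven single-character groups, mirroring the Perl one-liner.
pattern = re.compile("(.)" * 11)
match = pattern.match("abcdefghijklmnopqrstuvwxyz")

# \1 is group 1, \10 is group 10, \g<1>0 is group 1 followed by a literal "0".
result = match.expand(r"\1 \10 \g<1>0")
print(result)
```

The braces in Perl and the \g<...> form in Python serve the same purpose: they mark where the capture number ends.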


--
Jim Cook
2010 Sundays: 4/4, 6/6, 8/8, 10/10, 12/12 and 5/9, 9/5, 7/11, 11/7.
Next year they're Monday.
 
May 20, 2008
11,400
99
Syracuse, NY, USA
On Mon, 14 Jun 2010 15:27:38 -0400, Jim Cook <> wrote:

|In perl, s//$10/ replaces parameter 10. If you want parameter 1, and a '0',
|use s//${1}0/. The ${} syntax is used to disambiguate when what follows the
|variable name would otherwise be misinterpreted.
|
|D:\>perl -e "$v.=$_ for (a..z); $v =~ /(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)/;
|print qq($1 $10 ${1}0)"
|
|a j a0

Thanks. Is there any possibility for ambiguity?

Does this method of processing a '\' sound right?

1. if the next char is '\', it's a literal '\' (eat two characters)

2. else if the next char is a digit, get the number following the '\' (may be
more than one digit) and use the corresponding back-reference

3. else if the next char is '{', read the number that follows (up to the '}'?)
and use the corresponding backref (what if no number follows '\{', or no '}'
appears after the number ... bad syntax?)

4. else treat it as a literal '\'
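
The four rules above can be sketched as a small scanner. This is only an illustration in Python, not @XREPLACE's actual code: the function name expand, the groups list (index n holds capture n), and the choice to raise an error on a malformed '\{' are all assumptions.

```python
def expand(template, groups):
    """Expand backslash escapes in template; groups[n] is capture n (hypothetical API)."""
    out = []
    i = 0
    n = len(template)
    while i < n:
        ch = template[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        nxt = template[i + 1] if i + 1 < n else ""
        if nxt == "\\":
            # Rule 1: an escaped backslash; eat two characters.
            out.append("\\")
            i += 2
        elif "0" <= nxt <= "9":
            # Rule 2: read all following digits as one capture number.
            j = i + 1
            while j < n and "0" <= template[j] <= "9":
                j += 1
            out.append(groups[int(template[i + 1:j])])
            i = j
        elif nxt == "{":
            # Rule 3: read the number up to the closing brace.
            close = template.find("}", i + 2)
            number = template[i + 2:close] if close != -1 else ""
            if not (number.isascii() and number.isdigit()):
                # Assumption: treat a missing number or brace as bad syntax.
                raise ValueError("malformed \\{number} at offset %d" % i)
            out.append(groups[int(number)])
            i = close + 1
        else:
            # Rule 4: keep the backslash as a literal character.
            out.append("\\")
            i += 1
    return "".join(out)
```

With greedy digit reading in rule 2, \10 always means capture 10, and \{1}0 is the only way to get capture 1 followed by a 0, so no ambiguity remains. A capture number beyond the end of groups raises IndexError in this sketch.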
--
- Vince
 
May 20, 2008
603
0
Sammamish, WA
Perl throws a syntax error on unpaired {}. It is perfectly happy to
substitute a variable, e.g.: $plural = "${singular}s" when variables are
allowed. An undefined variable such as "${undefined}" would behave like
%undefined% and just be empty (ok, undef, but that's nitpicking).

\1 and \2 are backreferences. However, \01 and \001 are octal escapes for the
character with code 0x01, not backreferences. The \oct syntax consumes at most
three octal digits, stopping at the first non-octal character or after three
digits. \{oct} is not supported.
Certain other escaped characters look like C, e.g.: \n \r \t \a
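
To illustrate the "at most three octal digits" rule, here is a minimal Python sketch. The function name read_octal_escape and its (character, next index) return value are hypothetical; start is the index just after the backslash.

```python
OCTAL_DIGITS = "01234567"

def read_octal_escape(text, start):
    """Read 1 to 3 octal digits at text[start]; return (character, next index)."""
    end = start
    # Stop at the first non-octal character or after three digits.
    while end < len(text) and end - start < 3 and text[end] in OCTAL_DIGITS:
        end += 1
    if end == start:
        raise ValueError("no octal digit at offset %d" % start)
    return chr(int(text[start:end], 8)), end
```

For example, read_octal_escape("0012", 0) consumes only "001" and leaves the "2" to be treated as ordinary text.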

My temptation would be to ignore the \oct and \n things in XREPLACE, but I
wanted to make you aware of them if you weren't already.

Your rule 4 (else treat it as a literal '\') means that "\\" becomes "\" and
"\1" becomes backreference 1, but "\q" becomes "\q" which seems
counterintuitive. I'd make "\q" become "q". In other words, the \ is
consumed in all cases and affects what comes just after it.

--
Jim Cook
 
May 20, 2008
11,400
99
Syracuse, NY, USA
On Mon, 14 Jun 2010 18:11:06 -0400, Jim Cook <> wrote:

|Specifying \1, \2 are backreferences. However, \01 and \001 are the binary
|code 0x01 and not a backreference. The \oct syntax consumes at most three
|octal digits; stopping on non-digit or three count. \{oct} is not supported.
|Certain other escaped characters look like C, e.g.: \n \r \t \a

I think I'll allow \n (and insert a CRLF) and \t. What's \a?

|My temptation would be to ignore the \oct and \n things in XREPLACE, but I
|wanted to make you aware of them if you weren't already.
|
|Your rule 4 (else treat it as a literal '\') means that "\\" becomes "\" and
|"\1" becomes backreference 1, but "\q" becomes "\q" which seems
|counterintuitive. I'd make "\q" become "q". In other words, the \ is
|consumed in all cases and affects what comes just after it.

So if it's not \\, \n, \t, \number, \{number}, I'll just ignore the '\' and
take the next char.

Sound good?
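
The revised rule set can also be sketched with a single regular expression and a callback. This is a Python illustration under stated assumptions, not the plugin's code: the name expand_revised and the groups list (index n holds capture n) are hypothetical, \n inserts a CRLF as proposed, and a malformed '\{' simply falls through to the "drop the backslash" case.

```python
import re

# After a backslash: a run of digits, a braced number, or any single character
# (possibly nothing, if the backslash is the last character).
_ESCAPE = re.compile(r"\\(?:([0-9]+)|\{([0-9]+)\}|(.?))", re.DOTALL)

# Named escapes; anything else just loses its backslash.
_NAMED = {"\\": "\\", "n": "\r\n", "t": "\t"}

def expand_revised(template, groups):
    """Expand escapes in template; groups[n] is capture n (hypothetical API)."""
    def replace(m):
        number = m.group(1) or m.group(2)
        if number is not None:
            return groups[int(number)]
        ch = m.group(3)
        # Unknown escapes such as \q become the bare character "q".
        return _NAMED.get(ch, ch)
    return _ESCAPE.sub(replace, template)
```

In this sketch the backslash is consumed in every case, so "\q" becomes "q" and a lone trailing backslash disappears.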
--
- Vince
 
May 20, 2008
603
0
Sammamish, WA
\a is alarm (0x07). I regularly code \b first, then remember that isn't bell
but backspace :)

I believe perl conforms to the C standard, which defines these:

\a (alert) Produces an audible or visible alert without changing the active position.

\b (backspace) Moves the active position to the previous position on the current line. If the active position is at the initial position of a line, the behavior of the display device is unspecified.

\f (form feed) Moves the active position to the initial position at the start of the next logical page.

\n (new line) Moves the active position to the initial position of the next line.

\r (carriage return) Moves the active position to the initial position of the current line.

\t (horizontal tab) Moves the active position to the next horizontal tabulation position on the current line. If the active position is at or past the last defined horizontal tabulation position, the behavior of the display device is unspecified.

\v (vertical tab) Moves the active position to the initial position of the next vertical tabulation position. If the active position is at or past the last defined vertical tabulation position, the behavior of the display device is unspecified.
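
As a quick reference, the escapes listed above can be kept in a lookup table. This Python sketch assumes the usual ASCII code for each one (the C standard describes the behavior, not the numeric value); the names C_ESCAPES and c_escape_char are only illustrative.

```python
# Escape letter -> ASCII code (an assumption; C itself does not fix these values).
C_ESCAPES = {
    "a": 0x07,  # alert (bell)
    "b": 0x08,  # backspace
    "f": 0x0C,  # form feed
    "n": 0x0A,  # new line
    "r": 0x0D,  # carriage return
    "t": 0x09,  # horizontal tab
    "v": 0x0B,  # vertical tab
}

def c_escape_char(letter):
    """Return the character for a C-style escape letter; KeyError if unknown."""
    return chr(C_ESCAPES[letter])
```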


--
Jim Cook
 