Insert text at start/end of line

Charles Dye · Aug 7, 2020

I spent way too much time trying to do this with regular expressions....

Insert text at start of line:

Code:

tpipe /input=filename.txt /insert=0,1,"rem "

Insert text at end of line:

Code:

tpipe /input=filename.txt /insert=0,0," //"

Both at once:

Code:

tpipe /input=filename.txt /insert=0,1,"[[ " /insert=0,0," ]]"
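For readers without TPIPE, the same effect (a prefix and/or suffix on every line) can be sketched in Python. This is only an illustration of the intended result, not a description of how TPIPE's /insert works; the function name `wrap_lines` and the sample text are made up for the example.

```python
def wrap_lines(text, prefix="", suffix=""):
    """Add prefix/suffix to every line, leaving each line ending untouched."""
    out = []
    for line in text.splitlines(keepends=True):
        body = line.rstrip("\r\n")   # line content without its CR/LF
        eol = line[len(body):]       # original ending ("" on an unterminated last line)
        out.append(prefix + body + suffix + eol)
    return "".join(out)


sample = "echo one\r\necho two\r\n"   # hypothetical input
print(wrap_lines(sample, prefix="rem "), end="")
print(wrap_lines(sample, suffix=" //"), end="")
print(wrap_lines(sample, prefix="[[ ", suffix=" ]]"), end="")
```

The suffix goes in before the line ending rather than after it, which is the detail that tends to go wrong when CR/LF files are handled by LF-only tools.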

AnrDaemon · Aug 8, 2020

sed -Ee 's/./text\0/;'

vefatica · Aug 8, 2020

This too.

Code:

vefatica@jj:~$ echo foo | sed -e 's/.*/prefix \0 postfix/g'
prefix foo postfix
vefatica@jj:~$ echo foo | sed -e 's/.*/[[ \0 ]]/g'
[[ foo ]]

But it gets all fouled up when I try to use it from windows. Below, the second is expected; the first seems wacky.

Code:

v:\> echo foo | (wsl sed -e 's/.*/[[ \0 ]]/g')
 ]]foo

v:\> echo foo | (wsl sed -e 's/.*/[[ \\0 ]]/g')
[[ \0 ]]
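A possible explanation for the first result, not confirmed anywhere in this thread: if the line piped from the Windows side still ends in a carriage return, then ".*" matches that CR too, the suffix lands after it, and the console draws " ]]" back at column 0 on top of "[[ ". The Python sketch below only reproduces that assumed input; it says nothing definite about what wsl actually receives.

```python
import re

# Assumption: the line reaching sed from the Windows side still ends in CR.
line = "foo\r"

# \g<0> is Python's spelling of "the whole match" (sed's \0 or &).
# count=1 keeps Python from also substituting the empty match at the end.
wrapped = re.sub(r".*", r"[[ \g<0> ]]", line, count=1)

print(repr(wrapped))  # '[[ foo\r ]]'
```

If that is what is happening, stripping the CR before the substitution should give the expected output.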

vefatica · Aug 8, 2020

The TPIPE/regex one wasn't too hard (only about 5 minutes).

Code:

v:\> echo foo | tpipe /replace=4,0,0,0,0,0,0,0,0,"^(.*)$","pre $1 post"
pre foo post

v:\> echo foo | tpipe /replace=4,0,0,0,0,0,0,0,0,"^(.*)$","[[ $1 ]]"
[[ foo ]]

Charles Dye · Aug 8, 2020

vefatica said:
The TPIPE/regex one wasn't too hard (only about 5 minutes).

You win.

Okay, here's a puzzle for you. I have a number of HTML files I'd like to deHTMLize. TPIPE /SIMPLE=16 /SIMPLE=85 is a good start, but still leaves a whole lot of gribble: header, style sheet, scripts....

So what I'd like is a way to include only the text between <BODY> and </BODY>, and omit everything outside those two tags. /XML looks like it should be useful, but I can't get it to do anything. Can you see any way to do this with TPIPE?

vefatica · Aug 8, 2020

Charles Dye said:
So what I'd like is a way to include only the text between <BODY> and </BODY>, and omit everything outside those two tags. /XML looks like it should be useful, but I can't get it to do anything. Can you see any way to do this with TPIPE?

Geez! I used to have a plugin called FROMTO that would do just that. It's such a simple task that I'm surprised TPIPE doesn't have it. I wonder if there's something UNIXy that'll do it.

vefatica · Aug 8, 2020

Charles, did you see this (restrict to between tags)? I can't figure out how to use it.

TPIPE - Text filtering, search, and substitution
/xml=Type,IncludeText,IncludeQuotes,MatchCase,BufferSize,Tag,Attribute,EndTag

Adds an HTML / XML filter. The arguments are:

Type - the operation to perform:

0 restrict to an element

1 restrict to an attribute

2 restrict to between tags

IncludeText - whether to include the find string in the restriction result (default false)

IncludeQuotes - whether to include surrounding quotes in the attribute result or not (default false)

MatchCase - match case exactly or not (default false)

BufferSize - the maximum expected size of the match (default 32768)

Tag - the element or start tag to find

Attribute - the attribute to find

EndTag - the endTag to find

Charles Dye · Aug 8, 2020

vefatica said:
Charles, did you see this (restrict to between tags)? I can't figure out how to use it.

Yes, I spent a while messing with that one. I think that lets you set up subfilters that would affect everything between, e.g., <BODY> and </BODY>. But they don't affect anything outside the selection. So, not useful for my purposes.

RogerB · Aug 8, 2020

I'm not sure if this will do what you need, Charles, but a tpipe replace filter that extracts non-matching text might be the answer. I've only tried this with a trivial test file as below, so apologies if it fails horribly for your use case.

This is my test file:

Code:

d:\batch\test>type foobar.txt
This is before the body tags
<Body> This is <sometag> the body text <anothertag> interspersed with <onemoretag> tags
This is more body text
</Body>
This is after the body tags

And my understanding of what you want to do is end up with just the text between the <Body> and </Body> tags, and also strip out any other tags between them.

This tpipe command works on the above sample file:

Code:

d:\batch\test>tpipe /input=foobar.txt /eol=2,0,0,0,0 /replace=4,0,0,0,0,1,0,0,0,"(<Body>.*</Body>)","$1" /simple=16
This is  the body text  interspersed with  tags
This is more body text

It uses a regex backreference to replace the text between the body tags with itself (so it's unchanged), and the replace filter is set to discard non-matching text, so you're just left with the text you want, and then the /simple=16 gets rid of the tags themselves.

Hopefully that might give you something that will help...

Edited to add: I forgot to say I did a Setdos /x-6 before that tpipe command to stop the tags being mistaken for redirection.
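The same two-step idea (keep only the span from the opening to the closing body tag, then drop the remaining tags) can be sketched in Python for comparison. This is an analogy, not a translation of the TPIPE filters: `body_text` is a hypothetical helper, and the tag-stripping regex is a crude stand-in for /simple=16.

```python
import re


def body_text(html):
    """Return the text between <body> and </body> with all tags removed."""
    # DOTALL lets ".*" run across line endings; IGNORECASE accepts <Body>.
    m = re.search(r"<body>.*</body>", html, flags=re.DOTALL | re.IGNORECASE)
    if m is None:
        return ""                           # no body element found
    return re.sub(r"<[^>]*>", "", m.group(0))   # strip every remaining tag


sample = (
    "This is before the body tags\n"
    "<Body> This is <sometag> the body text <anothertag> interspersed with <onemoretag> tags\n"
    "This is more body text\n"
    "</Body>\n"
    "This is after the body tags\n"
)
print(body_text(sample), end="")
```

Everything outside the match is discarded simply because only the match is returned, which plays the role of the "discard non-matching text" setting described above.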

vefatica · Aug 8, 2020

This is the first time I've seen a subfilter work. Below (1) the filter seems to affect only what's between the tags and (2) it seems to determine what is to be removed (as opposed to what's to be kept).

Code:

v:\> type tag.html
before
<body>
inside
</body>
after

v:\> tpipe /input=tag.html /xml=2,0,0,0,32768,"body",foo,"/body" /startsubfilters /grep=4,0,0,0,0,0,0,0,"junk|junk|junk" /endsubfilters
before
<body>
inside
</body>
after

v:\> tpipe /input=tag.html /xml=2,0,0,0,32768,"body",foo,"/body" /startsubfilters /grep=4,0,0,0,0,0,0,0,"before|inside|after" /endsubfilters
before
<body>
</body>
after

Charles Dye · Aug 8, 2020

So, you're essentially treating the whole file as one very long line? Clever! Thank you.

vefatica · Aug 8, 2020

Charles Dye said:
So, you're essentially treating the whole file as one very long line? Clever! Thank you.

Hmmm! It doesn't work very well on TCC's dir.htm ... no output at all.

Code:

v:\help26> grep -i body dir.htm
html, body {
html, body { overflow: auto; }
<body>
</body>

v:\help26> tpipe /input=dir.htm /eol=2,0,0,0,0 /replace=4,0,0,0,0,1,0,0,0,"(<body>.*</body>)","$1" /simple=16

v:\help26>

Charles Dye · Aug 8, 2020

vefatica said:
Hmmm! It doesn't work very well on TCC's dir.htm ... no output at all.

That's an awfully big file. Try adding a /BUFFERSIZE=150000 after the /REPLACE.

vefatica · Aug 8, 2020

Charles Dye said:
That's an awfully big file. Try adding a /BUFFERSIZE=150000 after the /REPLACE.

That helps. I chose it because it was big. But really, a 138 KB HTM isn't big by today's standards.

Charles Dye · Aug 8, 2020

It's not a big file. But viewed as a single line.... That's a long line!

(That is what that /EOL is doing, right? Am I understanding correctly? That's not how I would read the help file. But it seems to work.)

RogerB · Aug 9, 2020

Charles Dye said:
That is what that /EOL is doing, right? Am I understanding correctly?

I don't think so, but then I might be wrong - tpipe remains a bit of a black art to me, I'm sometimes surprised with the results I get!

What I meant the /EOL to do is convert the CR/LF line endings into LF. You can see it's not just one long line if you run the tpipe command on my sample file with just the /EOL filter. My understanding (from experimentation) of the Perl pattern matching in the /replace filter is that the ".*" pattern will match line endings, provided they are just LF. So by changing the line endings, the pattern "<body>.*</body>" matches everything between the body tags, even if spread over multiple lines.

It's not a generic, bombproof solution unfortunately; Vince has discovered that large files need a bigger buffer, and the regex needs some work to match cases where the body tag is, for example:

Code:

<body bgcolor="#FFFFFF">
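As a rough sketch of how the pattern might be loosened to accept attributes on the opening tag, here is one way to write it in Python's regex syntax. Whether the same expression behaves identically inside a TPIPE /replace filter has not been tested here; `BODY_RE` and `body_inner` are names invented for the example.

```python
import re

# <body ...> with optional attributes, then the shortest run up to </body>.
# \b stops the pattern from matching longer tag names that start with "body".
BODY_RE = re.compile(r"<body\b[^>]*>(.*?)</body\s*>", re.DOTALL | re.IGNORECASE)


def body_inner(html):
    """Return what lies between the body tags, or None if there is no body."""
    m = BODY_RE.search(html)
    return m.group(1) if m else None


page = '<html><BODY bgcolor="#FFFFFF">\nhello\n</body></html>'  # hypothetical page
print(repr(body_inner(page)))
```

A regex like this is still fragile on real-world HTML (a ">" inside an attribute value would break it), so it is a quick fix rather than a parser.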

Charles Dye · Aug 9, 2020

RogerB said:
What I meant the /EOL to do is convert the CR/LF line endings into LF. You can see it's not just one long line if you run the tpipe command on my sample file with just the /EOL filter. My understanding (from experimentation) of the Perl pattern matching in the /replace filter is that the ".*" pattern will match line endings, provided they are just LF. So by changing the line endings, the pattern "<body>.*</body>" matches everything between the body tags, even if spread over multiple lines.

Thank you.

vefatica · Aug 9, 2020

It's a little hard to figure out what's happening. If I put the bare LF in a file myself, then RogerB's strategy doesn't work (/replace doesn't see across the LF).

Code:

v:\> echos 1abc^ndef2 > 12.txt

v:\> type /x 12.txt
0000 0000 31 61 62 63 0a 64 65 66  32                       1abc.def2

v:\> tpipe /input=12.txt /replace=4,0,0,0,0,1,0,0,0,"(1.*2)","$1"

v:\>

If I've got that test right, what's going on?

RogerB · Aug 9, 2020

vefatica said:
If I've got that test right, what's going on?

It's me that's wrong in my use of the /EOL filter, Charles is correct and it's stripping out all of the line endings and making one long line of it.

Code:

d:\>type /x 12.txt
0000 0000 31 61 62 63 0a 64 65 66  32                       1abc.def2

d:\>tpipe /input=12.txt /eol=2,0,0,0,0 /replace=4,0,0,0,0,1,0,0,0,"(1.*2)","$1"
1abcdef2

d:\>tpipe /input=12.txt /eol=2,0,0,0,0 | type /x
0000 0000 31 61 62 63 64 65 66 32                           1abcdef2

Oh well, back to the help file to see if I can understand the EOL filter! I'm coming to the conclusion that once you get something working with tpipe, it's best not to think too hard about it.

vefatica · Aug 9, 2020

RogerB said:
It's me that's wrong in my use of the /EOL filter, Charles is correct and it's stripping out all of the line endings and making one long line of it.

I think it's OK. TPIPE isn't stripping them, it's leaving LF and the regex in /replace is seeing that as a character (EOL being CRLF).

Code:

v:\> echos 1^r^n2 > 12.txt

v:\> type /x 12.txt
0000 0000 31 0d 0a 32                                       1..2

v:\> tpipe /input=12.txt /eol=2,0,0,0,0 /replace=4,0,0,0,0,1,0,0,0,"1.2","xxx"
xxx

(A little odd) /grep doesn't see the bare LF as a character.

Code:

v:\> type /x 12.txt
0000 0000 31 0d 0a 32                                       1..2

v:\> tpipe /input=12.txt /eol=2,0,0,0,0 /grep=3,0,0,0,0,0,0,0,"1.2"

v:\>

RogerB · Aug 9, 2020

vefatica said:
I think it's OK. TPIPE isn't stripping them, it's leaving LF and the regex in /replace is seeing that as a character (EOL being CRLF).

Hmmm. I was just testing something similar and came to the same conclusion, the /eol=2,0,0,0,0 definitely changes a CR/LF pair to just an LF, and I see the /replace pattern matching the embedded LF too.

Which leaves the question of why your example doesn't work:

Code:

v:\> tpipe /input=12.txt /replace=4,0,0,0,0,1,0,0,0,"(1.*2)","$1"

I can't see what's happening there at all. After all, it works with the foobar.txt file in my earlier post.

vefatica · Aug 9, 2020

RogerB said:
Which leaves the question of why your example doesn't work:

Code:

v:\> tpipe /input=12.txt /replace=4,0,0,0,0,1,0,0,0,"(1.*2)","$1"

I can't see what's happening there at all. After all, it works with the foobar.txt file in my earlier post.

Do you mean when the bare LF is already in the file? See my post in "Support". Apparently TPIPE changes that to a CRLF upon input!

RogerB · Aug 9, 2020

vefatica said:
Do you mean when the bare LF is already in the file? See my post in "Support". Apparently TPIPE changes that to a CRLF upon input!

Ah, I see! That certainly looks to be a bug in tpipe, you don't even need to use a filter, just let tpipe read the file and you see it:

Code:

d:\>type /x 12.txt
0000 0000 31 61 62 63 0a 64 65 66  32                       1abc.def2

d:\>tpipe /input=12.txt | type /x
0000 0000 31 61 62 63 0d 0a 64 65  66 32                    1abc..def2

We've got a bit sidetracked from the question Charles originally posed, I hope something in this thread has helped solve his issue!

RogerB · Aug 9, 2020

vefatica said:
(A little odd) /grep doesn't see the bare LF as a character.

The help describes /grep as a "line based filter", so perhaps it is only intended to work with lines from a file, and hence newlines will always terminate the expression that's being matched.

According to the help you can set newline matching behaviour for search/replace filters with /perl=. The "DotMatchesNewLines" option says "Allow the '.' operator to match all characters, including new lines. Default is true".
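For comparison only, this is how a dot-matches-newline switch behaves in Python's `re` module, where the dot stops at a newline unless the DOTALL flag is given. It illustrates the general concept and is not evidence of what TPIPE's own default is; the sample text is made up.

```python
import re

text = "before\n<body>\ninside\n</body>\nafter\n"   # hypothetical sample
pattern = r"(<body>.*</body>)"

# Without DOTALL, "." refuses to cross "\n", so the tags never pair up.
plain = re.search(pattern, text)

# With DOTALL, "." matches newlines as well and the whole element is found.
dotall = re.search(pattern, text, flags=re.DOTALL)

print(plain)                    # None
print(repr(dotall.group(1)))    # '<body>\ninside\n</body>'
```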

vefatica · Aug 9, 2020

As for DotMatchesNewLines, it doesn't seem as though the default is TRUE.

Code:

v:\> type tag.html
before
<body>
inside
</body>
after

v:\> tpipe /input=tag.html /replace=4,0,0,0,0,1,0,0,0,"(<body>.*</body>)","$1"

v:\> tpipe /input=tag.html /replace=4,0,0,0,0,1,0,0,0,"(<body>.*</body>)","$1" /perl=,,,1
<body>
inside
</body>
v:\>

"/perl=,,,1" seems to have no effect on a /grep filter.

Charles Dye · Aug 9, 2020

RogerB said:
We've got a bit sidetracked from the question Charles originally posed, I hope something in this thread has helped solve his issue!

Not so much an "issue" as a desire to learn. TPIPE seems like a very powerful command, if only I understood it better....

RogerB · Aug 10, 2020

Charles Dye said:
if only I understood it better....

I know that feeling, most of my efforts to understand tpipe leave me feeling like I’m in a dark room looking for a black cat that isn’t there. However, it is an insanely powerful command if you can manage to wrangle a set of filters into doing what you want.

David Marcus · Aug 10, 2020

I played with this a little without success (probably should have used setdos first). I would think the DotMatchesNewLines option of /perl would be helpful. Regular expressions are complicated enough, but if TCC messes with the command line, you have no idea whether it is TCC or tpipe that is messing things up. A debug option for tpipe would help if it told you what it thinks you told it to do, i.e., what all the parameters are that TCC passes to it.

Joe Caverly · Aug 10, 2020

/simple = 46

TPIPE - Text filtering, search, and substitution
Display debug window

A debug filter is very handy for debugging filters. When text is passed through this filter, it places the output into a window so that you can see what the text looks like at that stage of the filtering process.

Joe

Joe Caverly · Aug 10, 2020

/log=Filename

Log the TPIPE actions.

Filename - Name of log file

Joe
