@clip peculiarity

#1
Character sequence "CR CR LF" is treated by @CLIP as 2 EOLs. All other
operations which deal with lines (@execarray, @fileseekl, @fileread, @line,
ffind) consider it as 1 EOL. IMHO it would be desirable for @CLIP to be
modified to behave identically to other operations.
--
Steve
 
#2
On Fri, 09 Jul 2010 22:53:29 -0400, Steve Fábián
<> wrote:

|Character sequence "CR CR LF" is treated by @CLIP as 2 EOLs.

They are? How so?

Code:
v:\> echo foo^r^r^nbar > clip:

v:\> echo %@clip[0]
foo

v:\> echo %@clip[1]
ECHO is OFF

v:\> echo %@clip[2]
bar
 
#3
On Fri, 09 Jul 2010 23:25:25 -0400, vefatica <>
wrote:

|On Fri, 09 Jul 2010 22:53:29 -0400, Steve Fábián
|<> wrote:
|
||Character sequence "CR CR LF" is treated by @CLIP as 2 EOLs.
|
|They are? How so?
|
|
|Code:
|---------
|v:\> echo foo^r^r^nbar > clip:
|
|v:\> echo %@clip[0]
|foo
|
|v:\> echo %@clip[1]
|ECHO is OFF
|
|v:\> echo %@clip[2]
|bar
|---------

And they really made it to the clipboard. "LIST /X clip:" produces:

Code:
0000 0000 66 6f 6f 0d 0d 0a 62 61  72 0d 0a foo...bar..
 
#4
| Steve wrote:
||
|| Character sequence "CR CR LF" is treated by @CLIP as 2 EOLs.
||
| They are? How so?
|
|
| Code:
| ---------
| v:\> echo foo^r^r^nbar > clip:
|
| v:\> echo %@clip[0]
| foo
|
| v:\> echo %@clip[1]
| ECHO is OFF
|
| v:\> echo %@clip[2]
| bar
| ---------

You have just proved my statement: there is an extra, blank line in CLIP:.
Try to put the same in a file, and use TYPE to display it - there won't be a
blank line.
--
Steve
 
#5
On Fri, 09 Jul 2010 23:45:08 -0400, Steve Fábián
<> wrote:

|| Steve wrote:
|||
||| Character sequence "CR CR LF" is treated by @CLIP as 2 EOLs.
|||
|| They are? How so?
||
||
|| Code:
|| ---------
|| v:\> echo foo^r^r^nbar > clip:
||
|| v:\> echo %@clip[0]
|| foo
||
|| v:\> echo %@clip[1]
|| ECHO is OFF
||
|| v:\> echo %@clip[2]
|| bar
|| ---------
|
|You have just proved my statement: there is an extra, blank line in CLIP:.
|Try to put the same in a file, and use TYPE to display it - there won't be a
|blank line.

Yes, you're right. When the same data is in a file ...

Code:
v:\> list /x clip.txt
0000 0000 66 6f 6f 0d 0d 0a 62 61  72 0d 0a  foo...bar..

v:\> echo %@line[clip.txt,0]
foo

v:\> echo %@line[clip.txt,1]
bar
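
The two results above can be reproduced outside TCC. The following Python sketch is only a model of the two behaviours reported in this thread, not TCC's actual code; the function names are made up for illustration, and the treatment of a lone CR in the second function is an assumption.

```python
import re

# The bytes shown by LIST /X in this thread: foo CR CR LF bar CR LF
SAMPLE = b"foo\r\r\nbar\r\n"


def split_each_terminator(data):
    """Treat CR, CR LF, and LF each as a line ending.

    bytes.splitlines() does exactly this, so CR CR LF counts as two
    endings (a lone CR, then CR LF) and yields an empty line between.
    """
    return data.splitlines()


def split_collapsing_cr(data):
    """Treat any run of CRs followed by LF as ONE line ending.

    Assumption: a lone CR or a lone LF still ends a line.
    """
    parts = re.split(rb"\r*\n|\r", data)
    # re.split leaves an empty trailing piece when the data ends with
    # a terminator; drop it so it is not counted as an extra line.
    if parts and parts[-1] == b"":
        parts.pop()
    return parts


if __name__ == "__main__":
    print(split_each_terminator(SAMPLE))  # three lines, like @CLIP
    print(split_collapsing_cr(SAMPLE))    # two lines, like @LINE
```

The first function gives the blank middle line that @CLIP[1] returned; the second gives the two lines that @LINE returned for the same bytes in a file.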
 

rconn

Administrator
Staff member
May 14, 2008
10,161
86
#6
> Character sequence "CR CR LF" is treated by @CLIP as 2 EOLs. All other
> operations which deal with lines (@execarray, @fileseekl, @fileread,
> @line,
> ffind) consider it as 1 EOL. IMHO it would be desirable for @CLIP to be
> modified to behave identically to other operations.
That's a gibberish line ending -- TCC will recognize files with CR, CR/LF,
or LF line endings. Something like CR/CR/LF cannot be interpreted.
Different APIs are going to return different results, depending on how much
of the text they scan.

You can either clean up your input or wait for the DWIM parser!

Rex Conn
JP Software
 
#7
On Sat, 10 Jul 2010 12:35:05 -0400, rconn <>
wrote:

|That's a gibberish line ending -- TCC will recognize files with CR, CR/LF,
|or LF line endings. Something like CR/CR/LF cannot be interpreted.
|Different APIs are going to return different results, depending on how much
|of the text they scan.

|You can either clean up your input or wait for the DWIM parser!

I wouldn't call it gibberish. It's just redundant (and easily
interpreted as meaning CRLF). Many console apps do it. Blaming
Microsoft doesn't help us deal with it.

The user should expect @LINE[file,N], @line[clip:,N] and @CLIP[N] to
agree when the clipboard and the file contain exactly the same data
... don't you think? We may not have control over what gets
redirected to a file or to the clipboard.
 

rconn

#8
> The user should expect @LINE[file,N], @line[clip:,N] and @CLIP[N] to
> agree when the clipboard and the file contain exactly the same data
> ... don't you think?
No, I do not.

What if you have CR/CR/LF/CR/CR/CR? Is the *real* line ending a CR/LF, or
is it CR with a random LF thrown in?

The only way to handle this consistently would be to forbid all line endings
except CR/LF; I think that's a bit draconian to handle the one instance in
20 years where somebody's complained about @CLIP's GIGO.

Rex Conn
JP Software
 
#9
| ---Quote---
|| Character sequence "CR CR LF" is treated by @CLIP as 2 EOLs. All
|| other operations which deal with lines (@execarray, @fileseekl,
|| @fileread, @line,
|| ffind) consider it as 1 EOL. IMHO it would be desirable for @CLIP
|| to be modified to behave identically to other operations.
| ---End Quote---
| That's a gibberish line ending -- TCC will recognize files with CR,
| CR/LF, or LF line endings. Something like CR/CR/LF cannot be
| interpreted. Different APIs are going to return different results,
| depending on how much of the text they scan.
|
| You can either clean up your input or wait for the DWIM parser!

In the specific instance that triggered my OP, it is the output of MS'
ping.exe (WinXP SP3 version). I process my input with TCC, so it must do the
cleaning up. It does it nicely by using @EXECARRAY[pingreport, ping %url]. I
originally planned to use the clipboard, hence the report.

Too bad that the test DEFINED ping[%n] (where ping is an array, and n is
numeric) is always FALSE, whether or not the specific array element has been
initialized to a value other than an empty string. Could this be changed in
a future version?
--
Steve
 
#10
On Sat, 10 Jul 2010 13:17:45 -0400, rconn <>
wrote:

|---Quote---
|> The user should expect @LINE[file,N], @line[clip:,N] and @CLIP[N] to
|> agree when the clipboard and the file contain exactly the same data
|> ... don't you think?
|---End Quote---
|No, I do not.
|
|What if you have CR/CR/LF/CR/CR/CR? Is the *real* line ending a CR/LF, or
|is it CR with a random LF thrown in?

Perhaps I'm old-fashioned, but CR (0x0D) means "move to the beginning
of the current line"; it does not mean "go to the next line" and
doesn't give a new line. A CR when you're already at the beginning of
a line is merely redundant (and as such, poor programming though
perhaps not entirely the programmer's fault). So I'd consider your
extreme example above as simply CRLF. That's what you **see** in a
console.
 
#11
| What if you have CR/CR/LF/CR/CR/CR? Is the *real* line ending a
| CR/LF, or is it CR with a random LF thrown in?
|
| The only way to handle this consistently would be to forbid all line
| endings except CR/LF; I think that's a bit draconian to handle the
| one instance in 20 years where somebody's complained about @CLIP's
| GIGO.

IMHO the best way to handle it is not to consider CR as EOL. This would
be consistent with its purpose in ASCII as a "format effector" moving the
cursor to the beginning of the current line, allowing overprinting (as does
BS). Only the LF, FF, and VT characters put you into a different line.
Furthermore, technically none of those mean column change, hence the CR/LF
sequence. There is also a now obsolete technical reason why the order is CR
LF, not LF CR. AFAIK no system other than the Trash-80 (oops, I meant
TRS-80) ever abused CR to mean EOL.
--
Steve
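
To make the disagreement concrete, here is a small Python sketch that counts line endings in the CR/CR/LF/CR/CR/CR example under three policies raised in this thread. The policy names and regular expressions are illustrative assumptions, not TCC's implementation, and the third policy is simplified to LF only (FF and VT are left out).

```python
import re

# Hypothetical policy table; each pattern matches ONE line ending.
POLICIES = {
    # CR, CR LF, or LF each end a line.
    "each": rb"\r\n|\r|\n",
    # Any run of CRs followed by LF is one ending; a lone CR also counts.
    "collapse": rb"\r*\n|\r",
    # CR is only a format effector; LF alone ends a line.
    "lf_only": rb"\n",
}

# The sequence from the question: CR CR LF CR CR CR
EXTREME = b"\r\r\n\r\r\r"


def count_endings(data, policy):
    """Return how many line endings `policy` finds in `data`."""
    return len(re.findall(POLICIES[policy], data))


if __name__ == "__main__":
    for name in POLICIES:
        print(name, count_endings(EXTREME, name))
```

Under these assumptions the same six bytes contain five, four, or one line ending, which is the ambiguity being argued about.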
 

rconn

#12
> Too bad that the test DEFINED ping[%n] (where ping is an array, and n
> is numeric) is always FALSE, whether or not the specific array element
> has been initialized to a value other than an empty string. Could this
> be changed in a future version?
DEFINED refers to environment variables; array variables are not in the
environment. I don't think it's a good idea to widen the scope of DEFINED,
particularly when there's a dozen other existing ways to do it.

Rex Conn
JP Software
 
#14
| ---Quote---
|| Too bad that the test DEFINED ping[%n] (where ping is an array, and
|| n is numeric) is always FALSE, whether or not the specific array
|| element has been initialized to a value other than an empty string.
|| Could this be changed in a future version?
| ---End Quote---
| DEFINED refers to environment variables; array variables are not in
| the environment. I don't think it's a good idea to widen the scope
| of DEFINED, particularly when there's a dozen other existing ways to
| do it.

Looks like a duck, walks like a duck ... I did not refer to checking
whether or not an array variable is defined, I was referring to a single
element of the array, which can be set and its value used just like an
ordinary environment variable. The only test I found to check whether or not
a specific array element is defined is to check its length.
I cannot see a reason not to expand what parameters are acceptable for
the DEFINED status test to include array elements, and indeed even to
internal variables.
--
Steve
 
#15
Macs use CR as EOL.


 
#16
On Sat, 10 Jul 2010 14:45:41 -0400, Steve Fábián
<> wrote:

| I cannot see a reason not to expand what parameters are acceptable for
|the DEFINED status test to include array elements, and indeed even to
|internal variables.

I don't think you'd get what you want. Once you say "SETARRAY n[5]",
the individual elements are "defined". If you want to see if an
element has any meaningful value, use "%n[%i]" NE "".
 

rconn

#18
> I cannot see a reason not to expand what parameters are acceptable
> for the DEFINED status test to include array elements, and indeed
> even to internal variables.
Adding internal variables seems faintly ridiculous -- why not just test "if
1==1"? In what case would an internal variable *not* be defined?

I am not going to change DEFINED at this point; use one of the many existing
alternatives. And aren't you the same guy who gets crazed when anything is
changed affecting backwards compatibility? :-)

Rex Conn
JP Software
 
#19
| Once you say "SETARRAY n[5]", the individual elements are "defined".

Compare with environment variables. All possible variable names are
always declared IMPLICITLY, and their values are accessible without ever
initializing them. If not previously initialized, the value as a string is
the empty string, and as a numeric value it is zero. For example, the
following is perfectly operable code:

Code:
UNSET Z
SET /A Z+=1

In the same manner your command above makes %n[0] ... %n[4] accessible,
and %n[5] still inaccessible. You need to use @execarray, @filearray, or SET
to actually initialize the individual array elements just as if they were
independent environment variables.

When you initialize an array's elements using the @filearray function,
its value tells you how many elements you actually initialized.
Unfortunately @execarray does not provide that information, nor does it load
the uninitialized elements with the equivalent of **EOC** as @CLIP[] reports
or with **EOF** as @fileread[] does. In fact you have to know the output
format of the command A PRIORI to locate the end of data. This is why I
requested a change in @execarray.

When the array is initialized using @filearray or @execarray, some
elements may be initialized to empty strings, corresponding to blank lines.
When you process a file using "for %x in (@file) ..." or its DO equivalent,
the test "if defined x" is the simple test for a blank line. I was looking
for a similarly simple test when the file (or command output) is put into an
array. Your test "%n[%i]" NE "" is logically equivalent to what I used to do
for environment variables before DEFINED was available, %@len[%n[%i]] GT 0.
I am sure both are slower than DEFINED n[%i] would be, though I am not
sure which of the two is faster. DEFINED is definitely simpler to
understand.
--
Steve
 

rconn

#20
> Unfortunately @execarray does not provide that information, nor does it
> load the uninitialized elements with the equivalent of **EOC** as @CLIP[]
> reports or with **EOF** as @fileread[] does.
Two corrections -- first, you can get the size info with @ARRAYINFO, and
second, there *are* no uninitialized elements. @EXECARRAY allocates only as
much as it needs, and initializes everything.

Rex Conn
JP Software
 
#21
> there *are* no uninitialized elements. @EXECARRAY allocates only as much as it needs, and initializes everything.
That may be true from the programmer's point of view, but from the user's point of view the array may be much bigger than the command's output, and the elements of the array beyond the command's output will (to him) be uninitialized.
 
#22
| Adding internal variables seems faintly ridiculous -- why not just
| test "if 1==1"? In what case would an internal variable *not* be
| defined?

1/ Most commonly, when one uses a version of TCC older than the one in
which the variable first appeared

2/ Analogously to when an environment variable is "not defined": when its
value is an empty string

| I am not going to change DEFINED at this point; use one of the many
| existing alternatives. And aren't you the same guy who gets crazed
| when anything is changed affecting backwards compatibility? :-)

As to alternative tests, none are simple, and esp. none are as easy to
read as a DEFINED test.
This change would not affect backward compatibility, because all
existing code would continue to work without change.
--
Steve
 
#23
| ---Quote---
|| Unfortunately @execarray does not provide that information, nor
|| does it load the uninitialized elements with the equivalent of
|| **EOC** as @CLIP[] reports or with **EOF** as @fileread[] does.
| ---End Quote---
| Two corrections -- first, you can get the size info with @ARRAYINFO,
| and second, there *are* no uninitialized elements. @EXECARRAY
| allocates only as much as it needs, and initializes everything.

"You must define the array before running @EXECARRAY." Doesn't this mean
that I must first use SETARRAY to create the array into which @EXECARRAY
loads data, and @ARRAYINFO returns the same size before and after
@EXECARRAY? I appreciate that when the array is too large @EXECARRAY sets
unused elements to empty strings (destroying leftover data), but it still
does not provide the information how many lines were processed.
--
Steve
 
#24
> Character sequence "CR CR LF" is treated by @CLIP as 2 EOLs. All other
> operations which deal with lines (@execarray, @fileseekl, @fileread, @line,
> ffind) consider it as 1 EOL. IMHO it would be desirable for @CLIP to be
> modified to behave identically to other operations.
> Steve
I personally think all bets are off when malformed textual files are handed to textual functions. It should be such that Windows/DOS textual files should always have paired CRLF characters and that any with just LF or just CR should be regarded as malformed. Only binary functions should work on these.

Cheers

Stephen Howe
 
#25
| I personally think all bets are off when malformed textual files are
| handed to textual functions. It should be such that Windows/DOS
| textual files should always have paired CRLF characters and that any
| with just LF or just CR should be regarded as malformed. Only binary
| functions should work on these.

I disagree. When program output is intended to control a character mode
display device, consecutive CR characters are legitimate. For example, the
program may display different information at different times on the same
line, e.g., the percentage of a file already copied by the COPY /G command.
When such output is redirected from the display device, each device control
character, such as BS, CR and LF, should be interpreted according to what
the standards specify they do. Lines on a display device are a matter of
perception. The interpretation should match the perception. Not all text
files contain pure text!
--
Steve
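
The "interpretation should match the perception" argument can be illustrated with a toy display model. This Python sketch is an assumption-laden illustration, not anything TCC does: it handles only BS, CR and LF, treats CR as "column 0", LF as "next row, same column", and BS as "one column left", and the function name is hypothetical.

```python
def render(data):
    """Return the rows a simple character display would show for `data`."""
    rows = [[]]
    row = col = 0
    for byte in data:
        ch = chr(byte)
        if ch == "\r":            # carriage return: back to column 0
            col = 0
        elif ch == "\n":          # line feed: next row, column unchanged
            row += 1
            if row == len(rows):
                rows.append([])
        elif ch == "\b":          # backspace: one column left
            col = max(col - 1, 0)
        else:                     # printable: write (or overprint) at cursor
            line = rows[row]
            while len(line) < col:
                line.append(" ")  # pad if the cursor is past the text
            if col < len(line):
                line[col] = ch
            else:
                line.append(ch)
            col += 1
    text = ["".join(r) for r in rows]
    # A final LF opens an empty row that nobody perceives as a line.
    if text and text[-1] == "":
        text.pop()
    return text


if __name__ == "__main__":
    print(render(b"foo\r\r\nbar\r\n"))         # the ping-style CR CR LF case
    print(render(b" 10%\r 50%\r100%\r\n"))     # a progress line rewritten in place
```

In this model the redundant CR in CR CR LF changes nothing on screen, and a progress display that rewrites one line with CR ends up as a single visible line.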
 
#26
> I personally think all bets are off when malformed textual files are handed to textual functions. It should be such that Windows/DOS textual files should always have paired CRLF characters and that any with just LF or just CR should be regarded as malformed. Only binary functions should work on these.
I respectfully disagree. Functions which WRITE text files should write them correctly, but functions which READ text files should be tolerant. Text files still survive from the antiquated era when character codes drove character-by-character printing devices (like Teletype ASR33), and multiple CR characters were sometimes strung together to cause a delay while the physical carriage actually returned to the left margin. (The NUL character and the DEL character were also used for this purpose.)

Other operating systems have differing standards of what separates lines (or records) in a text file. Some use CRLF; some use LFCR; some use just CR; some use just LF (and call it a "newline"). I have seen other characters used as line- or record terminators, too. In particular NUL and ETX have been used. It is reasonable that users of TCC might want to be able to read, interpret, and process those files. They're not malformed; they are written to different standards.

In TTY days of yore, you didn't need CRLFCRLF to produce a blank line; all you needed was CRLFLF, as the carriage was already at the left margin.

In a perfect world (and I am NOT asking for this, Rex), one would be able to specify the character (or character string) which is considered to be a line terminator (or separator) and writing functions like @FILEWRITE[] would use it; and one would also specify a set of terminator strings that reading functions would recognize as EOL. TCC allowing CR, LF, and CRLF is pretty good, though.
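
As a rough sketch of what such configurable terminators could look like, here is a Python model. The function names and parameters are hypothetical and do not correspond to any existing TCC feature: the reader accepts a set of terminator strings, and the writer uses a single one.

```python
import re


def split_records(data, terminators):
    """Split `data` on any of the given terminator byte strings."""
    if not terminators:
        raise ValueError("at least one terminator is required")
    # Try longer terminators first so b"\r\n" is not read as b"\r" + b"\n".
    ordered = sorted(terminators, key=len, reverse=True)
    pattern = b"|".join(re.escape(t) for t in ordered)
    parts = re.split(pattern, data)
    # Data ending in a terminator leaves an empty last piece; drop it.
    if parts and parts[-1] == b"":
        parts.pop()
    return parts


def join_records(records, terminator):
    """Write records back out, each followed by the chosen terminator."""
    return b"".join(record + terminator for record in records)


if __name__ == "__main__":
    mixed = b"a\r\nb\rc\nd"
    print(split_records(mixed, [b"\r", b"\n", b"\r\n"]))
    print(split_records(b"one\x00two\x00", [b"\x00"]))  # NUL-terminated records
    print(join_records([b"a", b"b"], b"\r\n"))
```

Sorting the terminators longest-first is the one design choice that matters here; without it, CR LF would be counted as two endings.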

I would ask that all the TCC functions use the same method of counting and numbering lines, though, and I think that's pretty close to the original request in this thread.

(I specifically suggest, Rex, that, if you can, you recognize any number of consecutive CRs immediately followed by a single LF (or VT or FF) as ONE terminator.)