
@clip peculiarity

Discussion in 'Support' started by Steve Fabian, Jul 9, 2010.

  1. Steve Fabian

    Steve Fabian

    Messages:
    3,531
    Character sequence "CR CR LF" is treated by @CLIP as 2 EOLs. All other
    operations which deal with lines (@execarray, @fileseekl, @fileread, @line,
    ffind) consider it as 1 EOL. IMHO it would be desirable for @CLIP to be
    modified to behave identically to other operations.
    --
    Steve
  2. vefatica

    vefatica

    Messages:
    5,069
    On Fri, 09 Jul 2010 22:53:29 -0400, Steve Fábián
    <> wrote:

    |Character sequence "CR CR LF" is treated by @CLIP as 2 EOLs.

    They are? How so?

    Code:
    v:\> echo foo^r^r^nbar > clip:
    
    v:\> echo %@clip[0]
    foo
    
    v:\> echo %@clip[1]
    ECHO is OFF
    
    v:\> echo %@clip[2]
    bar
  3. vefatica

    vefatica

    Messages:
    5,069
    On Fri, 09 Jul 2010 23:25:25 -0400, vefatica <>
    wrote:

    |On Fri, 09 Jul 2010 22:53:29 -0400, Steve Fábián
    |<> wrote:
    |
    ||Character sequence "CR CR LF" is treated by @CLIP as 2 EOLs.
    |
    |They are? How so?
    |
    |
    |Code:
    |---------
    |v:\> echo foo^r^r^nbar > clip:
    |
    |v:\> echo %@clip[0]
    |foo
    |
    |v:\> echo %@clip[1]
    |ECHO is OFF
    |
    |v:\> echo %@clip[2]
    |bar
    |---------

    And they really made it to the clipboard. "LIST /X clip:" produces:

    Code:
    0000 0000 66 6f 6f 0d 0d 0a 62 61  72 0d 0a foo...bar..
  4. Steve Fabian

    Steve Fabian

    Messages:
    3,531
    | Steve wrote:
    ||
    || Character sequence "CR CR LF" is treated by @CLIP as 2 EOLs.
    ||
    | They are? How so?
    |
    |
    | Code:
    | ---------
    | v:\> echo foo^r^r^nbar > clip:
    |
    | v:\> echo %@clip[0]
    | foo
    |
    | v:\> echo %@clip[1]
    | ECHO is OFF
    |
    | v:\> echo %@clip[2]
    | bar
    | ---------

    You have just proved my statement: there is an extra, blank line in CLIP:.
    Try to put the same in a file, and use TYPE to display it - there won't be a
    blank line.
    --
    Steve
  5. vefatica

    vefatica

    Messages:
    5,069
    On Fri, 09 Jul 2010 23:45:08 -0400, Steve Fábián
    <> wrote:

    || Steve wrote:
    |||
    ||| Character sequence "CR CR LF" is treated by @CLIP as 2 EOLs.
    |||
    || They are? How so?
    ||
    ||
    || Code:
    || ---------
    || v:\> echo foo^r^r^nbar > clip:
    ||
    || v:\> echo %@clip[0]
    || foo
    ||
    || v:\> echo %@clip[1]
    || ECHO is OFF
    ||
    || v:\> echo %@clip[2]
    || bar
    || ---------
    |
    |You have just proved my statement: there is an extra, blank line in CLIP:.
    |Try to put the same in a file, and use TYPE to display it - there won't be a
    |blank line.

    Yes, you're right. When the same data is in a file ...

    Code:
    v:\> list /x clip.txt
    0000 0000 66 6f 6f 0d 0d 0a 62 61  72 0d 0a  foo...bar..
    
    v:\> echo %@line[clip.txt,0]
    foo
    
    v:\> echo %@line[clip.txt,1]
    bar
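
The discrepancy shown above can be reproduced outside TCC. As a rough illustration only (Python, not TCC's actual implementation), `str.splitlines()` accepts CR, LF, and CRLF each as a line boundary, so the same 11 bytes yield a blank middle line, much as @CLIP does:

```python
# Bytes taken from the LIST /X dump: "foo" CR CR LF "bar" CR LF
data = bytes.fromhex("66 6f 6f 0d 0d 0a 62 61 72 0d 0a")
text = data.decode("ascii")

# splitlines() scans left to right and accepts CR, LF, or CRLF as a boundary.
# The first CR therefore ends "foo", and the CR LF that follows ends an
# empty line before "bar" begins.
lines = text.splitlines()
print(lines)
```

A reader that instead looks ahead for the LF before deciding where the line ends would report only two lines, which is the behaviour @LINE shows for the file.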
  6. rconn

    rconn Administrator Staff Member

    Messages:
    6,710
    That's a gibberish line ending -- TCC will recognize files with CR, CR/LF,
    or LF line endings. Something like CR/CR/LF cannot be interpreted.
    Different APIs are going to return different results, depending on how much
    of the text they scan.

    You can either clean up your input or wait for the DWIM parser!

    Rex Conn
    JP Software
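
One way to "clean up your input" before it reaches a line-oriented function is to collapse any run of CRs that directly precedes an LF into a single CR. A minimal sketch in Python, offered as an illustration rather than anything TCC provides (`normalize_eol` is a hypothetical helper name):

```python
import re

# One or more CRs immediately followed by an LF.
_CR_RUN_LF = re.compile(rb"\r+\n")

def normalize_eol(data: bytes) -> bytes:
    """Rewrite CR...CR LF as a plain CR LF; leave lone CR and lone LF alone."""
    return _CR_RUN_LF.sub(b"\r\n", data)

cleaned = normalize_eol(b"foo\r\r\nbar\r\n")
print(cleaned)
```

Lone CRs and lone LFs are deliberately left untouched, so files that legitimately use either as their line ending pass through unchanged.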
  7. vefatica

    vefatica

    Messages:
    5,069
    On Sat, 10 Jul 2010 12:35:05 -0400, rconn <>
    wrote:

    |That's a gibberish line ending -- TCC will recognize files with CR, CR/LF,
    |or LF line endings. Something like CR/CR/LF cannot be interpreted.
    |Different APIs are going to return different results, depending on how much
    |of the text they scan.

    |You can either clean up your input or wait for the DWIM parser!

    I wouldn't call it gibberish. It's just redundant (and easily
    interpreted as meaning CRLF). Many console apps do it. Blaming
    Microsoft doesn't help us deal with it.

    The user should expect @LINE[file,N], @line[clip:,N] and @CLIP[N] to
    agree when the clipboard and the file contain exactly the same data
    ... don't you think? We may not have control over what gets
    redirected to a file or to the clipboard.
  8. rconn

    rconn Administrator Staff Member

    Messages:
    6,710
    No, I do not.

    What if you have CR/CR/LF/CR/CR/CR? Is the *real* line ending a CR/LF, or
    is it CR with a random LF thrown in?

    The only way to handle this consistently would be to forbid all line endings
    except CR/LF; I think that's a bit draconian to handle the one instance in
    20 years where somebody's complained about @CLIP's GIGO.

    Rex Conn
    JP Software
  9. Steve Fabian

    Steve Fabian

    Messages:
    3,531
    | ---Quote---
    || Character sequence "CR CR LF" is treated by @CLIP as 2 EOLs. All
    || other operations which deal with lines (@execarray, @fileseekl,
    || @fileread, @line,
    || ffind) consider it as 1 EOL. IMHO it would be desirable for @CLIP
    || to be modified to behave identically to other operations.
    | ---End Quote---
    | That's a gibberish line ending -- TCC will recognize files with CR,
    | CR/LF, or LF line endings. Something like CR/CR/LF cannot be
    | interpreted. Different APIs are going to return different results,
    | depending on how much of the text they scan.
    |
    | You can either clean up your input or wait for the DWIM parser!

    In the specific instance that triggered my OP, it is the output of MS'
    ping.exe (WinXP SP3 version). I process my input with TCC, so it must do the
    cleaning up. It does it nicely by using @EXECARRAY[pingreport, ping %url]. I
    originally planned to use the clipboard, hence the report.

    Too bad that the test DEFINED ping[%n] (where ping is an array, and n is
    numeric) is always FALSE, whether or not the specific array element has been
    initialized to a value other than an empty string. Could this be changed in
    a future version?
    --
    Steve
  10. vefatica

    vefatica

    Messages:
    5,069
    On Sat, 10 Jul 2010 13:17:45 -0400, rconn <>
    wrote:

    |---Quote---
    |> The user should expect @LINE[file,N], @line[clip:,N] and @CLIP[N] to
    |> agree when the clipboard and the file contain exactly the same data
    |> ... don't you think?
    |---End Quote---
    |No, I do not.
    |
    |What if you have CR/CR/LF/CR/CR/CR? Is the *real* line ending a CR/LF, or
    |is it CR with a random LF thrown in?

    Perhaps I'm old-fashioned, but CR (0x0D) means "move to the beginning
    of the current line"; it does not mean "go to the next line" and
    doesn't give a new line. A CR when you're already at the beginning of
    a line is merely redundant (and as such, poor programming though
    perhaps not entirely the programmer's fault). So I'd consider your
    extreme example above as simply CRLF. That's what you **see** in a
    console.
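
The "what you see in a console" interpretation can be sketched as a tiny screen model: CR resets the column, LF moves down one row, and anything else is written at the current position. This is a Python illustration under those assumptions only; `render` is a hypothetical name and real consoles handle many more control characters.

```python
def render(text: str) -> list[str]:
    """Return the rows a simple character display would show for `text`."""
    rows = [[]]          # each row is a list of characters
    row = col = 0
    for ch in text:
        if ch == "\r":
            col = 0                      # back to the start of the current row
        elif ch == "\n":
            row += 1                     # next row; column is left unchanged
            if row == len(rows):
                rows.append([])
        else:
            line = rows[row]
            while len(line) <= col:      # pad if writing past the row's end
                line.append(" ")
            line[col] = ch               # overwrite whatever was there
            col += 1
    return ["".join(r) for r in rows]

print(render("foo\r\r\nbar\r\n"))
print(render("a\r\r\n\r\r\rb"))
print(render("10%\r20%"))
```

Under this model the redundant CRs change nothing: "foo" and "bar" land on adjacent rows, the CR/CR/LF/CR/CR/CR sequence produces a single row break, and a CR without LF simply lets later text overprint earlier text.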
  11. Steve Fabian

    Steve Fabian

    Messages:
    3,531
    | What if you have CR/CR/LF/CR/CR/CR? Is the *real* line ending a
    | CR/LF, or is it CR with a random LF thrown in?
    |
    | The only way to handle this consistently would be to forbid all line
    | endings except CR/LF; I think that's a bit draconian to handle the
    | one instance in 20 years where somebody's complained about @CLIP's
    | GIGO.

    IMHO the best way to handle it is not to consider CR as EOL. This would
    be consistent with its purpose in ASCII as a "format effector" moving the
    cursor to the beginning of the current line, allowing overprinting (as does
    BS). Only the LF, FF, and VT characters put you into a different line.
    Furthermore, technically none of those mean column change, hence the CR/LF
    sequence. There is also a now obsolete technical reason why the order is CR
    LF, not LF CR. AFAIK no system other than the Trash-80 (oops, I meant
    TRS-80) ever abused CR to mean EOL.
    --
    Steve
  12. rconn

    rconn Administrator Staff Member

    Messages:
    6,710
    DEFINED refers to environment variables; array variables are not in the
    environment. I don't think it's a good idea to widen the scope of DEFINED,
    particularly when there's a dozen other existing ways to do it.

    Rex Conn
    JP Software
  13. Charles Dye

    Charles Dye Super Moderator Staff Member

    Messages:
    2,553
    (cough) PETSCII (cough)
  14. Steve Fabian

    Steve Fabian

    Messages:
    3,531
    | ---Quote---
    || Too bad that the test DEFINED ping[%n] (where ping is an array, and
    || n is numeric) is always FALSE, whether or not the specific array
    || element has been initialized to a value other than an empty string.
    || Could this be changed in a future version?
    | ---End Quote---
    | DEFINED refers to environment variables; array variables are not in
    | the environment. I don't think it's a good idea to widen the scope
    | of DEFINED, particularly when there's a dozen other existing ways to
    | do it.

    Looks like a duck, walks like a duck ... I did not refer to checking
    whether or not an array variable is defined, I was referring to a single
    element of the array, which can be set and its value used just like an
    ordinary environment variable. The only test I found to check whether or not
    a specific array element is defined is to check its length.
    I cannot see a reason not to expand what parameters are acceptable for
    the DEFINED status test to include array elements, and indeed even to
    internal variables.
    --
    Steve
  15. drrob106

    drrob106

    Messages:
    36
    Macs use cr as eol

  16. vefatica

    vefatica

    Messages:
    5,069
    On Sat, 10 Jul 2010 14:45:41 -0400, Steve Fábián
    <> wrote:

    | I cannot see a reason not to expand what parameters are acceptable for
    |the DEFINED status test to include array elements, and indeed even to
    |internal variables.

    I don't think you'd get what you want. Once you say "SETARRAY n[5]",
    the individual elements are "defined". If you want to see if an
    element has any meaningful value, use "%n[%i]" NE "".
  17. rconn

    rconn Administrator Staff Member

    Messages:
    6,710
    Macs use CR as EOL.

    Rex Conn
    JP Software
  18. rconn

    rconn Administrator Staff Member

    Messages:
    6,710
    Adding internal variables seems faintly ridiculous -- why not just test "if
    1==1"? In what case would an internal variable *not* be defined?

    I am not going to change DEFINED at this point; use one of the many existing
    alternatives. And aren't you the same guy who gets crazed when anything is
    changed affecting backwards compatibility? :-)

    Rex Conn
    JP Software
  19. Steve Fabian

    Steve Fabian

    Messages:
    3,531
    | Once you say "SETARRAY n[5]", the individual elements are "defined".

    Compare with environment variables. All possible variable names are
    always declared IMPLICITLY, and their values accessible without ever
    initializing them. If not previously initialized, the value as a string is
    the empty string, and as a numeric value it is zero. For example, the
    following is perfectly operable code:
    UNSET Z
    SET /A Z+=1
    In the same manner your command above makes %n[0] ... %n[4] accessible,
    and %n[5] still inaccessible. You need to use @execarray, @filearray, or SET
    to actually initialize the individual array elements just as if they were
    independent environment variables.
    When you initialize an array's elements using the @filearray function,
    its value tells you how many elements you actually initialized.
    Unfortunately @execarray does not provide that information, nor does it load
    the uninitialized elements with the equivalent of **EOC** as @CLIP[] reports
    or with **EOF** as @fileread[] does. In fact you have to know the output
    format of the command A PRIORI to locate the end of data. This is why I
    requested a change in @execarray.
    When the array is initialized using @filearray or @execarray, some
    elements may be initialized to empty strings, corresponding to blank lines.
    When you process a file using "for %x in (@file) ..." or its DO equivalent,
    the test "if defined x" is the simple test for a blank line. I was looking
    for a similarly simple test when the file (or command output) is put into an
    array. Your test "%n[%i]" NE "" is logically equivalent to what I used to do
    for environment variables before DEFINED was available, %@len[%n[%i]] GT 0.
    I am sure both are slower than DEFINED n[%i] would be, though I am not sure
    whether your test or mine is faster. DEFINED is definitely simpler to
    understand.
    --
    Steve
  20. rconn

    rconn Administrator Staff Member

    Messages:
    6,710
    Two corrections -- first, you can get the size info with @ARRAYINFO, and
    second, there *are* no uninitialized elements. @EXECARRAY allocates only as
    much as it needs, and initializes everything.

    Rex Conn
    JP Software
  21. vefatica

    vefatica

    Messages:
    5,069
    That may be true from the programmer's point of view, but from the user's point of view the array may be much bigger than the command's output, and the elements of the array beyond the command's output will (to him) be uninitialized.
  22. Steve Fabian

    Steve Fabian

    Messages:
    3,531
    | Adding internal variables seems faintly ridiculous -- why not just
    | test "if 1==1"? In what case would an internal variable *not* be
    | defined?

    1/ Most commonly when one uses an older version of TCC than the variable's
    first appearance

    2/ Analogously to when an environment variable is "not defined": when its value is an
    empty string

    | I am not going to change DEFINED at this point; use one of the many
    | existing alternatives. And aren't you the same guy who gets crazed
    | when anything is changed affecting backwards compatibility? :-)

    As to alternative tests, none are simple, and esp. none are as easy to
    read as a DEFINED test.
    This change would not affect backward compatibility, because all
    existing code would continue to work without change.
    --
    Steve
  23. Steve Fabian

    Steve Fabian

    Messages:
    3,531
    | ---Quote---
    || Unfortunately @execarray does not provide that information, nor
    || does it load the uninitialized elements with the equivalent of
    || **EOC** as @CLIP[] reports or with **EOF** as @fileread[] does.
    | ---End Quote---
    | Two corrections -- first, you can get the size info with @ARRAYINFO,
    | and second, there *are* no uninitialized elements. @EXECARRAY
    | allocates only as much as it needs, and initializes everything.

    "You must define the array before running @EXECARRAY." Doesn't this mean
    that I must first use SETARRAY to create the array into which @EXECARRAY
    loads data, and @ARRAYINFO returns the same size before and after
    @EXECARRAY? I appreciate that when the array is too large @EXECARRAY sets
    unused elements to empty strings (destroying leftover data), but it still
    does not report how many lines were processed.
    --
    Steve
  24. Stephen Howe

    Stephen Howe

    Messages:
    127
    I personally think all bets are off when malformed textual files are handed to textual functions. Windows/DOS textual files should always have paired CRLF characters, and any that have just LF or just CR should be regarded as malformed. Only binary functions should work on these.

    Cheers

    Stephen Howe
  25. Steve Fabian

    Steve Fabian

    Messages:
    3,531
    | I personally think all bets are off when malformed textual files are
    | handed to textual functions. It should be such that Windows/DOS
    | textual files should always have paired CRLF characters and that any
    | that have just LF or just CR should be regarded as malformed. Only binary
    | functions should work on these.

    I disagree. When program output is intended to control a character mode
    display device, consecutive CR characters are legitimate. For example, the
    program may display different information at different times on the same
    line, e.g., the percentage of a file already copied by the COPY /G command.
    When such output is redirected from the display device, each device control
    character, such as BS, CR and LF, should be interpreted according to what
    the standards specify they do. Lines on a display device are a matter of
    perception. The interpretation should match the perception. Not all text
    files contain pure text!
    --
    Steve
  26. dcantor

    dcantor

    Messages:
    376
    I respectfully disagree. Functions which WRITE text files should write them correctly, but functions which READ text files should be tolerant. Text files still survive from the antiquated era when character codes drove character-by-character printing devices (like Teletype ASR33), and multiple CR characters were sometimes strung together to cause a delay while the physical carriage actually returned to the left margin. (The NUL character and the DEL character were also used for this purpose.)

    Other operating systems have differing standards of what separates lines (or records) in a text file. Some use CRLF; some use LFCR; some use just CR; some use just LF (and call it a "newline"). I have seen other characters used as line- or record terminators, too. In particular NUL and ETX have been used. It is reasonable that users of TCC might want to be able to read, interpret, and process those files. They're not malformed; they are written to different standards.

    In TTY days of yore, you didn't need CRLFCRLF to produce a blank line; all you needed was CRLFLF, as the carriage was already at the left margin.

    In a perfect world (and I am NOT asking for this, Rex), one would be able to specify the character (or character string) which is considered to be a line terminator (or separator) and writing functions like @FILEWRITE[] would use it; and one would also specify a set of terminator strings that reading functions would recognize as EOL. TCC allowing CR, LF, and CRLF is pretty good, though.

    I would ask that all the TCC functions use the same method of counting and numbering lines, though, and I think that's pretty close to the original request in this thread.

    (I specifically suggest, Rex, that, if you can, you recognize any number of consecutive CRs immediately followed by a single LF (or VT or FF) as ONE terminator.)
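
The rule suggested above (any number of consecutive CRs immediately followed by a single LF, VT, or FF counts as one terminator) can be sketched with a regular expression. This is a Python illustration, not TCC code; `split_lines` is a hypothetical name, and treating a lone CR as a terminator is an added assumption so that CR-only files still split.

```python
import re

# Zero or more CRs followed by LF, VT (0x0B) or FF (0x0C) form ONE terminator.
# A CR not followed by one of those is treated as a terminator by itself.
_EOL = re.compile(r"\r*[\n\x0b\x0c]|\r")

def split_lines(text: str) -> list[str]:
    """Split text into lines, collapsing CR...CR LF into a single EOL."""
    parts = _EOL.split(text)
    if parts and parts[-1] == "":
        parts.pop()      # a trailing terminator does not start a new line
    return parts

print(split_lines("foo\r\r\nbar\r\n"))
print(split_lines("one\rtwo\rthree"))
print(split_lines("a\r\n\r\nb"))
```

With this rule the clipboard data from the start of the thread splits into two lines, CR-only text still splits on each CR, and a genuine blank line (CRLF CRLF) is preserved.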
