
Problem with the "List" command...

Discussion in 'Support' started by mathewsdw, Aug 28, 2012.

  1. mathewsdw

    Joined:
    May 24, 2010
    Messages:
    855
    Likes Received:
    0
    I've got the following batch file:
    Code:
    @Echo Off
    SetLocal
    Set FileName="The Name of A File Containing a WebPage.html"
    Set FileSize=%@FileSize[%FileName]
    Set FH=%@FileOpen[%FileName,r,t]
    Iff %FH == -1 Then
      @EchoErr Unable to open webpage %FileName
      Quit 8
    EndIff
    SetDOS /X-45678
    Set Content=%@FileRead[%FH,%FileSize]
    SetDOS /X+45678
    @Echo >NUL: %@FileClose[%FH]
    SetDOS /X-45678
    @Echo *****************************************************************
    @Echo %Content
    @Echo *****************************************************************
    EndLocal
    Quit 0
    
    When I run it like this:
    Code:
    ReadHTML
    
    I get this:
    Code:
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xmlns:fb="http://www.facebook.com/2008/fbml"  xmlns:og="http://opengraphprotocol.org/schema/">
    <head>
    <title>Iron &amp; Wine Related Musicians </title>
     
     
    ... Lots of html ...
     
    <div style='display: inline-block; vertical-align: middle; height: 20px; width: 90px; padding: 0px; margin-left: 10px;'><g:plusone size="medium" href="http://www.starpulse.com/"></g:plusone></div>
        <div id='footer_bar_close' title='close' onclick='document.getElementById("footer_bar").style.display="none"; document.cookie="footer_bar_disabled=true;domain=.starpulse.com";'>x</div>
    </div>
    </div>
    </body>
    </html>
    
    However, when I run it like this:
    Code:
    ReadHTML |& List
    
    I get this:
    Code:
    
    
    In other words, nothing at all.

    And when I comment out the "SetDOS /X-45678" and use "Echo %@SafeExp[%Content]", TCC crashes (32-bit TCC on a 64-bit machine).

    What is happening here and how do I fix it?

    Oh, as an aside, using a URL works fine for a copy command, for instance, but not at all anywhere in the batch-file language (%@FileSize, %@FileOpen, etc.) This is just the way things are?

    - Dan
     
  2. Charles Dye

    Charles Dye Super Moderator
    Staff Member

    Joined:
    May 20, 2008
    Messages:
    3,307
    Likes Received:
    39
    Piping from your batch file to LIST works for me. You don't have an alias for LIST, by any chance?

    How big is your file? I would expect Bad Things to happen if it's more than about 16,000 characters (= about 32 kilobytes if it's UTF-16).
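
    As a rough check of that estimate, here is a minimal Python sketch. It assumes the approximate 16,000-character figure mentioned above and two bytes per character, which holds for UTF-16 only when every character is in the Basic Multilingual Plane. The names CHAR_LIMIT and utf16_size_in_bytes are made up for the illustration.

```python
# Approximate limit taken from the post above; an estimate, not an exact figure.
CHAR_LIMIT = 16_000


def utf16_size_in_bytes(text: str) -> int:
    """Return the number of bytes the text occupies as UTF-16 without a BOM."""
    return len(text.encode("utf-16-le"))


# A hypothetical buffer filled with plain ASCII characters.
sample = "x" * CHAR_LIMIT
size = utf16_size_in_bytes(sample)
print(size, "bytes, roughly", size // 1000, "thousand")
```

    So a buffer of about 16,000 characters corresponds to about 32,000 bytes of UTF-16 text.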

    Yes; it only works where Rex adds code to make it work, and he's almost certainly documented all of them.
     
  3. mathewsdw

    Joined:
    May 24, 2010
    Messages:
    855
    Likes Received:
    0
    Thank you for your response, Charles, but it's completely off base (although how would you know?).

    A. The list command works fine in other contexts, just not with this particular batch file.

    B. I would never, ever, under any conceivable circumstance whatsoever, create an alias for the list command, because it's probably the command I use most often in TCC. In fact, my aliases that begin with "L" are:
    Code:
    Le*adPath=E:\DOS\Startx LeadPath
    LF*unction=Function %$ | List
    ListD*rives=E:\DOS\ListDrives.btm
    
    So that isn't it.

    C. The file is 23,328 bytes in UTF8, which would be more than what you suggest above, but the raw file lists fine with just the list command alone and I've never encountered this before. (Where is that documented?)

    So the only possibility is C but, frankly, that seems unlikely to me (although, of course, I could be wrong. But, again, where is that documented?)

    As an odd note TCC crashes consistently after displaying the second line of asterisks when run in a 32-bit TCC session with no redirection or piping of any kind (i.e., just the naked command alone). Not terrible because I seldom use 32-bit mode, but odd nonetheless.

    - Dan
     
  4. Charles Dye

    Charles Dye Super Moderator
    Staff Member

    Joined:
    May 20, 2008
    Messages:
    3,307
    Likes Received:
    39
    I was addressing the possible crash in SafeChars; that plugin uses a lot of 16K-character buffers. Dumping a 22K-char file into it might very well gork the plugin. (I don't know why you would want to use @SAFEEXP anyway -- the contents of an HTML file are unlikely to be a valid TCC variable name or function!)
     
  5. mathewsdw

    Joined:
    May 24, 2010
    Messages:
    855
    Likes Received:
    0
    Charles,

    Now I understand what you were talking about! Sorry! Unfortunately, it's also completely irrelevant. That is because the "@SafeExp" was put there after the fact (and not removed when I made my posting because by that time I considered it to be irrelevant and therefore wasn't thinking about it any more) in an attempt to fix the problem and it didn't. The problem existed before the "@SafeExp" was put there as well as after (and the problem still exists).

    - Dan
     
  6. vefatica

    Joined:
    May 20, 2008
    Messages:
    7,962
    Likes Received:
    30
    Dan, your batch file piped to LIST worked OK for me using a 21K Unicode (BuildLog) HTM file. I know nothing of UTF8 so I tried this:

    Code:
    v:\> Set FileName=P:\synergy-1.3.1\gen\debug\buildlog.htm
    
    v:\> echo %@filesize[%filename]
    21262
    
    v:\> echo %@utf8encode[%filename, utf8.htm]
    0
    
    v:\> dir /k /m utf8.htm
    2012-08-28  20:17             134  utf8.htm
    
    v:\> type utf8.htm
    ├┐├╛<
    
    v:\>
    Since the output file was only 134 bytes, I doubt it was a correct conversion of the original. And using TYPE on it resulted in only a handful of characters being printed. So I'll start a thread about @UTF8ENCODE[].
     
  7. vefatica

    Joined:
    May 20, 2008
    Messages:
    7,962
    Likes Received:
    30
    There would seem to be something wrong with @FILEREAD[handle,size] when the file is UTF8 (or perhaps more particularly contains CRCRLFs). It stops after one line even though an ample size parameter was given.
    Code:
    v:\> type leontiev.utf8
    Leontief won the Nobel Committee's Nobel Memorial Prize in Economic
    Sciences in 1973, and three of his doctoral students have also been
    awarded the prize (Paul Samuelson 1970, Robert Solow 1987,
    Vernon L. Smith 2002).
     
    Around 1949, Leontief used the primitive computer systems available
    at the time at Harvard to model data provided by the U.S. Bureau of
    Labor Statistics to divide the U.S. economy into 500 sectors.
    Leontief modeled each sector with a linear equation based on the
    data and used the computer, the Harvard Mark II, to solve the system,
    one of the first significant uses of computers for mathematical modeling.
     
    Input-output was novel and inspired large-scale empirical work; in 2010
    its iterative method was recognized as an early intellectual precursor
    to Google's PageRank.
     
     
    v:\> echo %@filesize[leontiev.utf8]
    841
     
    v:\> set fh=%@fileopen[leontiev.utf8,r,t]
     
    v:\> echo %@fileread[%fh,841]
    Leontief won the Nobel Committee's Nobel Memorial Prize in Economic
     
    v:\>
     
  8. vefatica

    Joined:
    May 20, 2008
    Messages:
    7,962
    Likes Received:
    30
    P.S. Leontiev.utf8 was created from leontiev.txt (ascii) with @UTF8ENCODE. The original did not contain any CRCRLFs.

    And it actually contains, as EOLs, 0x0D0D0A00
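
    One way to confirm which line endings a file really contains is to count them in the raw bytes. The following Python sketch is only a diagnostic aid under that assumption; it does not explain why @FILEREAD stops early. The function name and the sample bytes are hypothetical.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CR CR LF, plain CR LF, and bare LF line endings in raw bytes."""
    doubled = data.count(b"\r\r\n")
    # Every CR CR LF also contains one CR LF, so subtract those.
    crlf = data.count(b"\r\n") - doubled
    # Every LF that is not part of the two cases above is a bare LF.
    lf = data.count(b"\n") - crlf - doubled
    return {"crcrlf": doubled, "crlf": crlf, "lf": lf}


# Made-up sample containing one ending of each kind.
sample = b"line one\r\r\nline two\r\nline three\n"
print(count_line_endings(sample))
```

    To inspect a real file, read it in binary mode (for example, open(name, "rb").read()) so that no newline translation happens before counting.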
     
  9. samintz

    samintz Scott Mintz

    Joined:
    May 20, 2008
    Messages:
    1,190
    Likes Received:
    11
    You can use Notepad to save in different encodings. It's under the "File | Save As..." menu. A simple text file just gets the BOM added to the start of the file. I had a text file that had a 0x92 in it (not sure how it got there) that got UTF-8 encoded as 0xE2 0x80 0x99. So my file grew by 5 bytes (3 byte BOM + 2 add'l. encoding bytes).
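
    That arithmetic can be reproduced with a short Python sketch. It assumes the original file was Windows-1252 ("ANSI" on a Western-locale system), where byte 0x92 is the right single quotation mark; the sample text itself is made up.

```python
import codecs

# Hypothetical ANSI text containing the single byte 0x92.
original = b"it\x92s"

# Assumption: the file was Windows-1252, where 0x92 maps to U+2019.
text = original.decode("cp1252")

# Saving as UTF-8 with a signature: 3-byte BOM followed by the encoded text.
encoded = codecs.BOM_UTF8 + text.encode("utf-8")

growth = len(encoded) - len(original)
print("0x92 becomes:", text.encode("utf-8")[2:5].hex(" "))
print("file grew by", growth, "bytes")
```

    The one byte becomes three (two additional bytes), and the BOM adds three more, which accounts for the growth of five bytes.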

    I used a 21349 byte UTF-8 encoded file. And I was able to pipe without issue. To be fair it was plain text and not HTML. And I had setdos /x0.

    I tried with a 5495 byte UTF-8 encoded HTML file and once I did a setdos /x-6 I was able to pipe it to V so I could see it.

    -Scott
     
  10. mathewsdw

    Joined:
    May 24, 2010
    Messages:
    855
    Likes Received:
    0
    Guys, I don't know if this has anything to do with anything, but I misspoke. The file isn't UTF8, it's plain old ASCII (8 bits confused me for a moment). And, as I said above, it has nothing at all to do with "List"; it's purely a piping issue related to that batch file with that input (and possibly my system, but it is independent of 32- vs. 64-bit). Sadly, the only (not completely reasonable!) alternative I can think of at the moment is a C++ program, and I've really been trying to get out of the C++ "habit" lately. I honestly don't know if doing what I want to do is worthwhile doing in C++ because it'll be a fairly substantive program (it has to read web pages off the web, which seems to be non-trivial in C++ after looking into it, whereas it's relatively trivial in TCC because a URL can be used as the source in a copy command). If I don't get an answer in a day or two, I'll either do it in C++ (sigh!) or give up on it altogether because I can live without it (sigh!!).

    - Dan
     
  11. Charles Dye

    Charles Dye Super Moderator
    Staff Member

    Joined:
    May 20, 2008
    Messages:
    3,307
    Likes Received:
    39
    I don't understand what you're trying to do: slurp the entire file into an environment variable, and then dump it with ECHO? Why? What's wrong with ye olde TYPE command?

    At any rate, I think that if you want anyone else to be able to replicate the issue, I suggest you zip up the HTML file in question and make it available somehow, say as an attachment in this forum.
     
  12. samintz

    samintz Scott Mintz

    Joined:
    May 20, 2008
    Messages:
    1,190
    Likes Received:
    11
    Dan,

    Can you post the file as an attachment here?

    -Scott
     
  13. mathewsdw

    Joined:
    May 24, 2010
    Messages:
    855
    Likes Received:
    0
    And guys, while I thought I had already done this, since I don't see it anywhere in this thread (?) I'll do it again. Attached to this posting is a zip file containing the actual HTML that I am trying to parse.

    Also, it dawned on me that the primary thing I need to do is just write a C++ program to parse the HTML file and extract the data I'm looking for. Still some work, but not nearly as much as doing the whole thing in C++.

    - Dan

    P. S. This website is not allowing me to upload the .zip file ("The following error occurred: A server error occurred. Please try again later.") I'll try again much later tonight or early tomorrow morning.
     
  14. mathewsdw

    Joined:
    May 24, 2010
    Messages:
    855
    Likes Received:
    0
    And Charles, the reason you don't understand what the batch file is trying to do is because the one I'm working with at the moment (the one that just tried to verify that the entire HTML file was contained in an environment variable) isn't doing anything even close to what the final batch file is intended to do: Parse the HTML of possibly many different web pages to extract some specific data and then correlate that data between the different web pages. The batch file we've been talking about is just an early step along the way: verifying that the entire HTML file is actually contained in the environment variable from where it can be parsed (using @Index, primarily). However, being able to dump data along the way is an (essential! because I'm such a screw-up at this stage in my life) step in being able to verify that the batch-file-so-far is doing what I intend it to do at any given point.
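
    The kind of index-based extraction described here can be sketched in Python, with str.find standing in for the role @Index would play in the batch file. This is only an illustration: the helper name text_between is hypothetical, and the sample line is the title element from the output shown in the first post.

```python
import html


def text_between(content: str, start_tag: str, end_tag: str):
    """Return the text between the first start_tag and the next end_tag, or None."""
    start = content.find(start_tag)
    if start == -1:
        return None
    start += len(start_tag)
    end = content.find(end_tag, start)
    if end == -1:
        return None
    return content[start:end]


# Fragment based on the page output quoted at the top of the thread.
page = "<head>\n<title>Iron &amp; Wine Related Musicians </title>\n</head>"

raw = text_between(page, "<title>", "</title>")
# Decode HTML entities such as &amp; and trim surrounding whitespace.
title = html.unescape(raw).strip()
print(title)
```

    Printing intermediate values such as raw and title at each step serves the same purpose as dumping the variable from the batch file: it verifies that the content so far is what was intended.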

    - Dan
     
  15. vefatica

    Joined:
    May 20, 2008
    Messages:
    7,962
    Likes Received:
    30
    Lotsa luck! What happens when you run into a file that's bigger than 32,767 bytes? Such a file won't fit in an environment variable.
     
  16. mathewsdw

    Joined:
    May 24, 2010
    Messages:
    855
    Likes Received:
    0
    Vince,

    That could certainly be a valid concern which I'd already thought about, but doesn't seem to be. The largest such file I've found so far is about 28K, and, since there's very little size variation (the smallest I've seen so far is about 27K), it's doubtful that that will ever happen. And even if it does, assuming that the file is truncated at 32K, it won't matter at all because what I'm looking for is no more than about halfway into each file. And lastly, 100% accuracy, while certainly preferable, is not really needed in this application, 90% would pretty much be good enough.
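
    The truncation argument can be illustrated with a small Python sketch. It assumes, as stated above, that an oversized file would simply be cut off at the limit, and it uses the 32,767 figure quoted earlier in the thread; the page contents and the marker string are made up.

```python
# Limit quoted earlier in the thread for an environment variable.
VARIABLE_LIMIT = 32_767


def find_in_truncated(content: str, marker: str, limit: int = VARIABLE_LIMIT) -> int:
    """Return the marker's offset within the first `limit` characters, or -1."""
    return content[:limit].find(marker)


# Hypothetical oversized page whose marker sits well before the cutoff.
early_page = "x" * 14_000 + "<marker>" + "y" * 26_000
print(find_in_truncated(early_page, "<marker>"))

# Hypothetical page whose marker sits past the cutoff and is therefore lost.
late_page = "x" * 33_000 + "<marker>"
print(find_in_truncated(late_page, "<marker>"))
```

    As long as the data of interest ends before the cutoff, truncation does not affect the search; only a marker at or beyond the limit would be missed.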

    But thank you for thinking about it.

    And I still can't upload the file ("A server error has occurred."). But if anyone's curious/interested (which I tend to doubt ;)), you can get the page in question from http://www.starpulse.com/Music/Iron_&_Wine/MusicRelations/.

    - Dan
     
  17. mathewsdw

    Joined:
    May 24, 2010
    Messages:
    855
    Likes Received:
    0
    Well, guys, it's become completely academic as of this point because I have, in fact, written and tested a C++ program to do the parsing of the web page(s). At this point it would seem that this would only be of interest to Rex because it's clearly an unexpected piping error as far as I can tell. But thanks to all of you who contributed to this thread! :)

    - Dan
     
  18. vefatica

    Joined:
    May 20, 2008
    Messages:
    7,962
    Likes Received:
    30
    Your batch file (plus piping to LIST) works fine here with that file (28,328 bytes as downloaded and also 28,341 bytes (with BOM) as UTF8-saved with notepad).
     
  19. Charles Dye

    Charles Dye Super Moderator
    Staff Member

    Joined:
    May 20, 2008
    Messages:
    3,307
    Likes Received:
    39
    I couldn't reproduce it either.
     
  20. mathewsdw

    Joined:
    May 24, 2010
    Messages:
    855
    Likes Received:
    0
    Well, thank you guys (I think!;)). You've just proved that there is something wrong with my system (64-bit Windows on a new computer as delivered from the manufacturer with virtually no "customization" of any kind, not even very many non-mainstream non-Microsoft apps). But again, I've done it in C++ (which works), so it's now irrelevant other than wasting my time and several other people's. Sorry!

    - Dan
     
