SDK - GetLine and redirected stdin

Discussion in 'Support' started by p.f.moore, Jul 5, 2008.

  1. p.f.moore

    Joined:
    May 30, 2008
    Messages:
    122
    Likes Received:
    1
    I'm writing a plugin which needs to read from standard input. The
    GetLine function in the SDK seems appropriate, but I'm having some
    trouble using it.

    I have a callback routine, which gets called in a loop from a library
    API. The callback does

    HANDLE in = GetStdHandle(STD_INPUT_HANDLE);
    ...
    GetLine(in, ...)

    By tracing the function calls, it seems to me that every time the
    GetLine function is called, it starts again from the beginning of
    standard input. This only seems to happen when stdin is redirected
    from a file, like

    plugin /L MyPlugin
    myplugincmd <f

    It does *not* happen in the case

    echo some text | (plugin /L MyPlugin & myplugincmd & plugin /U MyPlugin)

    What's the issue here? And how should I use GetLine to be robust when
    called with a redirected file as input?

    Thanks,
    Paul.
     
  2. thomasl

    Joined:
    Jun 10, 2008
    Messages:
    35
    Likes Received:
    0
    "p.f.moore" <> wrote:

    GetLine() is a can of worms.

    Having said that, I ran into the same problem and found the culprit to
    be a call to QueryIsFileUnicode() which was also in the loop. Move this
    out of the loop and it should work. At least it did so for me.

    --
    cheers thomasl

    web: http://thomaslauer.com/start
     
  3. vefatica

    Joined:
    May 20, 2008
    Messages:
    8,129
    Likes Received:
    33
    On Sat, 05 Jul 2008 10:16:02 -0500, you wrote:


    My GREPP uses either of two utility routines, one uses GetLine() the other uses
    fgetws(). When I switch to the GetLine() routine I see that it doesn't work
    with redirected input. I don't know why. Maybe Rex will chime in. But
    GetLine() is very slow, reading a byte at a time, according to Rex, to
    facilitate pipes. My other, much faster, routine looks like this (without all
    the Oniguruma stuff).

    INT GetEm(HANDLE hFile, WCHAR *pszRegEx, BOOL bCase, BOOL bReverse, BOOL bQuiet)
    {
    // Onig stuff

    INT rc = 0;
    WCHAR buf[8192];
    BOOL bUnicode = QueryIsFileUnicode(hFile);
    INT hCrt = _open_osfhandle((intptr_t) hFile, bUnicode ? _O_BINARY : _O_TEXT);
    FILE *hf = _fdopen( hCrt, bUnicode ? "rb" : "r" );

    // Onig stuff
    // bInterrupt may be set by a temporary console ctrl handler

    while ( !bInterrupt && !feof(hf) && fgetws((WCHAR*)buf, 8192, hf) )
    {
    // Onig stuff and set rc
    }

    byebye:
    fclose(hf); // also closes hCrt and the underlying hFile, so no separate _close() is needed
    return rc;
    }

    This approach does work with redirected input:

    v:\> grepp reset alterping.btm
    :reset
    if "%signal" EQ "r" goto reset

    v:\> grepp reset < alterping.btm
    :reset
    if "%signal" EQ "r" goto reset
     
  4. p.f.moore

    Joined:
    May 30, 2008
    Messages:
    122
    Likes Received:
    1
    2008/7/5 thomasl:


    Too right! :-)


    It did indeed. That fixed the problem. Thanks for the suggestion.
    Paul.
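
A minimal sketch of why moving the check out of the loop can help, written against the C standard library rather than the TakeCmd SDK. It assumes (this is not confirmed anywhere in the thread) that an encoding check works by seeking to the start of a seekable file to look for a BOM. `looks_utf16` and `count_lines` are hypothetical names, not SDK functions.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for an encoding check: rewinds the stream and
   inspects the first two bytes for a UTF-16LE byte order mark (FF FE).
   Side effect: the read position ends up just past those bytes. */
static int looks_utf16(FILE *f)
{
    unsigned char bom[2];

    assert(f != NULL);
    rewind(f);
    if (fread(bom, 1, 2, f) != 2)
        return 0;
    return bom[0] == 0xFF && bom[1] == 0xFE;
}

/* Counts lines (each assumed shorter than the buffer). The check runs
   once, before the loop, so it cannot move the read position between
   successive reads. */
static int count_lines(FILE *f)
{
    char line[256];
    int n = 0;
    int utf16 = looks_utf16(f);   /* hoisted out of the loop */

    (void)utf16;                  /* a real reader would branch on this */
    rewind(f);                    /* restart from the beginning, once */
    while (fgets(line, sizeof line, f) != NULL)
        n++;
    return n;
}
```

If `looks_utf16` were called inside the loop instead, every pass would move the position back to the start of the file and the same data would be read again and again. A pipe cannot be rewound, which under this assumption would be consistent with the piped case not showing the problem.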
     
  5. p.f.moore

    Joined:
    May 30, 2008
    Messages:
    122
    Likes Received:
    1
    2008/7/5 vefatica:

    See Thomas' comment - maybe it's related to QueryIsFileUnicode?


    [...]

    Yes, I had an attempt at using ReadFile directly, it was much easier -
    but I had no idea in that case how to support both Unicode and ANSI
    stdin. I may go back to that approach, and try again, as although the
    speed isn't a huge issue here, the code was a lot simpler.

    Thanks for the example.
    Paul.
     
  6. vefatica

    Joined:
    May 20, 2008
    Messages:
    8,129
    Likes Received:
    33
    On Sat, 05 Jul 2008 11:08:23 -0500, you wrote:


    Experiment shows that GetLine()'s nEditFlag should be 0x10000 for a pipe while
    it should match the file in the case of redirected stdin. Rex, how does a
    plugin tell the difference?
     
  7. vefatica

    Joined:
    May 20, 2008
    Messages:
    8,129
    Likes Received:
    33
    On Sat, 05 Jul 2008 11:08:23 -0500, you wrote:


    Thomas, does your routine work in both cases: stdin redirected from a Unicode
    file, and stdin redirected from a non-Unicode file?
     
  8. rconn

    rconn Administrator
    Staff Member

    Joined:
    May 14, 2008
    Messages:
    10,092
    Likes Received:
    85
    vefatica wrote:


    QueryIsPipeHandle().

    Rex Conn
    JP Software
     
  9. p.f.moore

    Joined:
    May 30, 2008
    Messages:
    122
    Likes Received:
    1
    2008/7/5 vefatica:

    Is that affected by the UnicodeOutput flag? I can see rather a lot of
    cases to consider here:

    * Console input (With or without UnicodeOutput)
    * Redirected file (Either ASCII or Unicode)
    * Pipe input (With or without UnicodeOutput)
    * Here document (In a batch file which can be either Unicode or ASCII)


    Indeed, the key question here is that, given that GetLine needs to be
    passed a flag to describe the encoding (Unicode or ANSI) and the OS
    APIs (ReadFile etc) simply read bytes, how does one derive the correct
    encoding to be used? I suspect that's what QueryIsFileUnicode is
    about, but I suspect that works simply by checking for a BOM - and
    hence it won't work elsewhere.

    I think the rule should be:

    1. If STD_INPUT_HANDLE points at a seekable device, check the start of
    the file for a BOM and work from there.
    2. If STD_INPUT_HANDLE is not seekable, it's either the console or a
    character device. Check using QueryIsConsole, and if it's the console
    go on the basis of UnicodeOutput, otherwise assume ASCII.

    And maybe plugin commands should have an optional encoding flag, to
    override this.

    Questions: (a) Is this reasonable, and (b) how does it tie in with
    what the SDK and/or TCC do at the moment?

    Paul.

    PS I'll do some experiments when I have a spare moment, and report back...
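
Rule 1 above (check the start of a seekable input for a BOM) can be sketched in portable C as follows. This is a sketch under assumptions: `detect_bom` and the `Bom` enum are hypothetical names, not part of the TakeCmd SDK, and nothing here is claimed to match what QueryIsFileUnicode actually does. The thread only discusses UTF-16; the UTF-8 case is added for illustration, and UTF-32 is not handled.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { BOM_NONE, BOM_UTF16LE, BOM_UTF16BE, BOM_UTF8 } Bom;

/* Classifies the first bytes of a buffer by byte order mark.
   Returns BOM_NONE when the buffer is too short or has no known BOM. */
static Bom detect_bom(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return BOM_UTF8;
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return BOM_UTF16LE;
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return BOM_UTF16BE;
    return BOM_NONE;
}
```

The function works on bytes already read into memory, so it does not itself move any file position; the caller decides how to read the first few bytes and whether to seek back afterwards.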
     
  10. vefatica

    Joined:
    May 20, 2008
    Messages:
    8,129
    Likes Received:
    33
    On Sat, 05 Jul 2008 12:17:15 -0500, you wrote:


    That's not in TakeCmd.h or exposed by TakeCmd.dll.
     
  11. rconn

    rconn Administrator
    Staff Member

    Joined:
    May 14, 2008
    Messages:
    10,092
    Likes Received:
    85
    vefatica wrote:

    The entire contents of the function:

    // check to see if the specified handle is connected to a pipe
    DLLExports int QueryIsPipeHandle( HANDLE hFile )
    {
    return ( GetFileType( hFile ) == FILE_TYPE_PIPE );
    }

    Rex Conn
    JP Software
     
  12. p.f.moore

    Joined:
    May 30, 2008
    Messages:
    122
    Likes Received:
    1
    2008/7/5 p.f.moore:

    It looks like the following is effective:

    HANDLE in = GetStdHandle(STD_INPUT_HANDLE);
    BOOL uni;

    if (QueryIsConsole(in)) {
    uni = FALSE;
    } else if (GetFileType(in) == FILE_TYPE_PIPE) {
    uni = QueryUnicodeOutput();
    } else {
    uni = QueryIsFileUnicode(in);
    }

    Printf(L"Treat as Unicode: %s\n", uni ? L"Yes" : L"No");

    The only case I'm nervous about is where I unilaterally assume that
    the console is always ANSI. Rex - is this true? Is it impossible for a
    handle for which QueryIsConsole is true, to be Unicode? I certainly
    can't make it happen...

    Paul.
     
  13. rconn

    rconn Administrator
    Staff Member

    Joined:
    May 14, 2008
    Messages:
    10,092
    Likes Received:
    85
    p.f.moore wrote:

    The (unredirected) console in TCC is always Unicode, never ANSI.

    Rex Conn
    JP Software
     
  14. p.f.moore

    Joined:
    May 30, 2008
    Messages:
    122
    Likes Received:
    1
    2008/7/6 rconn:

    ??? Surely not. What is this code doing, then?

    DLLExports INT WINAPI test (LPTSTR lpszString)
    {
    HANDLE in = GetStdHandle(STD_INPUT_HANDLE);
    BOOL c = QueryIsConsole(in);
    char buf[10];
    DWORD n;
    int i;

    Printf(L"Stdin is%s a console\n", c ? L"" : L" not");
    ReadFile(in, buf, 5, &n, NULL);
    for (i = 0; i < n; ++i) {
    Printf(L"%c", isprint((unsigned char)buf[i]) ? buf[i] : '.');
    }
    Printf(L"\n");

    for (i = 0; i < n; ++i) {
    Printf(L"%2.2x ", (unsigned char)buf[i]);
    if ((i % 16) == 15)
    Printf(L"\n");
    }
    Printf(L"\n");

    return 0;
    }

    Result:


    Stdin is a console
    abcdefg
    abcde
    61 62 63 64 65

    So that to me implies that standard input, the console, is returning
    bytes. I suspect I'm misunderstanding your use of the term "console"
    here, or something else is wrong in what I'm doing. Unnervingly
    enough, the characters which were *not* read by my test command did
    not get used as input to the next command line, but were left and
    picked up by the next execution of the test command.

    With a bit of fiddling around, it looks to me like the input is coming
    in using the console code page (850 on my machine) but is being
    displayed in something else (I can't easily tell what).

    Ultimately, what I want to do is to have a plugin command which reads
    its "standard input" (pipe, console, redirected file, here document,
    whatever) using standard ReadFile, or something equivalent which I can
    use to read an arbitrary block of data in one go (using GetLine to
    read a line at a time is OK for some uses, but not all), and then
    establish what the character encoding of that data is, so that I can
    convert it to Unicode. Some aspects of this are impossible (a
    redirected file could be in any arbitrary encoding) but I'm willing to
    compromise a little (for files, use BOM detection for UTF-16 and
    otherwise assume an 8-bit character set which matches ASCII for
    0-127). But as things stand, I'm struggling even to understand what
    cases I have to address.

    The irony of this is that for my personal use, I'm mostly OK with
    ASCII - it's only really the odd latin-15 character (most notably the
    pound sign £) that hits me.
    Paul.
     
  15. thomasl

    Joined:
    Jun 10, 2008
    Messages:
    35
    Likes Received:
    0
    vefatica wrote:

    Hmm, I hope and think it does and my tests seem to support this hope...
    but then again, with GetLine() everything is possible;-). This API has
    surprised me more often than I care to count.

    Have a look into the source for my lua4nt or idle4nt plugin (especially
    function reader(), there it is in all its gory detail):
    http://thomaslauer.com/download/lua4nt01.zip
    http://thomaslauer.com/download/idle4nt01.zip

    --
    cheers thomasl

    web: http://thomaslauer.com/start
     
  16. rconn

    rconn Administrator
    Staff Member

    Joined:
    May 14, 2008
    Messages:
    10,092
    Likes Received:
    85
    p.f.moore wrote:

    Definitely yes -- ALL of the internal APIs (including the console) in XP
    / Vista are Unicode. If you're running an ASCII app, all of the Unicode
    APIs get thunked back & forth.




    You're not directly accessing the console -- you're calling it
    indirectly through the ReadFile API, so it's getting converted to ASCII.

    If you're using a non-Unicode font (not recommended), you'll add
    another layer of confusion (and thunking).

    Rex Conn
    JP Software
     
  17. vefatica

    Joined:
    May 20, 2008
    Messages:
    8,129
    Likes Received:
    33
    On Sun, 06 Jul 2008 09:32:03 -0500, you wrote:


    Using ReadConsole() instead, I see Unicode.

    Why does ReadFile() do that?

    I noticed that QueryIsFileUnicode(GetStdHandle(STD_INPUT_HANDLE)) is FALSE.

    It's a bit confusing.
     
  18. vefatica

    Joined:
    May 20, 2008
    Messages:
    8,129
    Likes Received:
    33
    On Sun, 06 Jul 2008 09:32:03 -0500, you wrote:


    While that may be true, it should be noted that this command

    timer & *dir f:\windows\system32 & timer

    (2283 lines) takes 50% longer when Lucida Console is used than when the same
    size raster font is used (here, 2.7 vs. 1.8 seconds when the end of the console
    screen buffer is not reached, 1.9 vs. 1.3 seconds when started with a full
    console screen buffer).

    The added confusion and thunking seem to speed things up!
     
  19. p.f.moore

    Joined:
    May 30, 2008
    Messages:
    122
    Likes Received:
    1
    2008/7/6 vefatica:


    Aargh. I never looked at ReadConsole. I'm not sure I'd even realised
    it existed...


    ReadFile is defined as a bytes-only interface, so it has to encode its
    input. I assume it uses the console code page to do this, so it's
    entirely valid. I suspect if I had a keyboard which could generate
    significant chunks of non-ASCII data (rather than just £, €, ¦ and ¬)
    I might stand more of a chance of understanding what's going on...


    Too right!

    To simplify right down, suppose I have a plugin which wants to read
    from in = GetStdHandle(STD_INPUT_HANDLE). I guess I need to do the
    following:

    1. Test QueryIsConsole(in) [btw, what is the OS API equivalent to this?]
    2. If it's true, use ReadConsole, and I get back wide characters.
    3. If it's false, use ReadFile. I now need to know the encoding.
    4. Check if it's a pipe (GetFileType(in) == FILE_TYPE_PIPE).
    5. If it is, it's UTF-16 (wide characters) if QueryUnicodeOutput() is
    true, else *QUESTION 1*
    6. If it's not a pipe, it's a file and so it's seekable and we can
    check the BOM.
    7. If there's no BOM, we're as stuffed as any other application and we
    should use the system default (CP_ACP?)

    Question 1 - what's the encoding of a pipe when unicode output isn't in force?
    Question 2 - is CP_ACP the correct way of specifying the current
    system codepage?
    Question 3 - is the above correct?

    That's so complicated that there's a question 4 - "do I care?" - but
    I'm going to be conscientious and try to do it right... :-)

    Paul.
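
The numbered steps above can be written down as a small decision function. This is only a sketch of the proposal as stated, not a confirmed description of how TCC behaves. All names (`InputKind`, `Encoding`, `Reader`, `InputPlan`, `plan_input`) are hypothetical. The three inputs are assumed to be worked out by the caller, for example with QueryIsConsole, GetFileType, QueryUnicodeOutput and a BOM check as mentioned in the thread. The two cases the thread leaves open are marked in the comments.

```c
#include <assert.h>

typedef enum { IN_CONSOLE, IN_PIPE, IN_FILE } InputKind;
typedef enum { ENC_UTF16, ENC_ANSI, ENC_UNDECIDED } Encoding;
typedef enum { READ_WITH_READCONSOLE, READ_WITH_READFILE } Reader;

typedef struct {
    Reader reader;
    Encoding encoding;
} InputPlan;

/* Maps the kind of stdin to a read API and an assumed encoding. */
static InputPlan plan_input(InputKind kind, int unicode_output,
                            int has_utf16_bom)
{
    InputPlan plan;

    assert(kind == IN_CONSOLE || kind == IN_PIPE || kind == IN_FILE);
    switch (kind) {
    case IN_CONSOLE:
        /* Steps 1-2: ReadConsole hands back wide characters. */
        plan.reader = READ_WITH_READCONSOLE;
        plan.encoding = ENC_UTF16;
        break;
    case IN_PIPE:
        /* Steps 4-5: the non-Unicode pipe case is Question 1 in the
           post and is unanswered, so it stays undecided here. */
        plan.reader = READ_WITH_READFILE;
        plan.encoding = unicode_output ? ENC_UTF16 : ENC_UNDECIDED;
        break;
    default:
        /* Steps 6-7: a seekable file. Without a BOM, fall back to the
           system default code page; whether CP_ACP is the right choice
           is Question 2 and is an assumption here. */
        plan.reader = READ_WITH_READFILE;
        plan.encoding = has_utf16_bom ? ENC_UTF16 : ENC_ANSI;
        break;
    }
    return plan;
}
```

Keeping the decision separate from the actual reads means each of the cases can be checked on its own, without needing a console, a pipe and a redirected file to hand.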
     
  20. rconn

    rconn Administrator
    Staff Member

    Joined:
    May 14, 2008
    Messages:
    10,092
    Likes Received:
    85
    vefatica wrote:


    Here, Lucida Console draws in 0.69 seconds vs. 0.78 seconds for Terminal.

    I suspect what you're really measuring is anti-aliasing & ClearType vs.
    doing nothing, not Unicode vs. ASCII. (This is going to be highly
    dependent on how good a video card you have!)

    Rex Conn
    JP Software
     
