Speed question

vefatica · Aug 26, 2010

I'm testing a pair of plugin functions. The command VARNAMES outputs the names of the environment variables, one per line. The variable function @VARNAMES[] returns an =-separated list of the environment variable names. The two use nearly identical code, the only difference being

Code:

if ( !bUseRegex || (rx = regex_match(szRegex, szVarName)) > 0 )
    Printf(L"%s\r\n", szVarName);

for VARNAMES, vs.

Code:

if ( !bUseRegex || (rx = regex_match(szRegex, szVarName)) > 0 )
    psz += Sprintf(psz, L"%s=", szVarName);

for @VARNAMES.

I'd expect them to be nearly equally fast. But they're not even close (see below). Each accepts a regular expression against which to match variable names, but none is specified below, so regex_match() is not called.

Code:

v:\> timer & for /l %i in (1,1,10000) (varnames > nul) & timer
Timer 1 on: 19:51:00
Timer 1 off: 19:51:14  Elapsed: 0:00:13.14

v:\> timer & for /l %i in (1,1,10000) (echo %@varnames[] > nul) & timer
Timer 1 on: 19:51:15
Timer 1 off: 19:51:18  Elapsed: 0:00:02.94

Rex, can you guess what accounts for the huge difference in times? I have no complaints; I'm just very curious.

rconn · Aug 26, 2010

> Rex, can you guess what accounts for the huge difference in times? I
> have no complaints; I'm just very curious.

It's always a lot faster to display one big string rather than a bunch of
little ones. (STDOUT is not buffered, and you've got a lot more API
overhead.)

Rex Conn
JP Software

vefatica · Aug 26, 2010

rconn said:
It's always a lot faster to display one big string rather than a bunch of
little ones. (STDOUT is not buffered, and you've got a lot more API
overhead.)

Thanks. I hadn't thought of that (and if I had, probably wouldn't have realized it was significant). But a little test exemplifies it:

Code:

v:\> timer & for /l %i in (1,1,5000) ((echo a & echo b & echo c & echo d) > NUL) & timer
Timer 1 on: 22:20:48
Timer 1 off: 22:20:52  Elapsed: 0:00:03.86

v:\> timer & for /l %i in (1,1,5000) (echo a^r^nb^r^nc^r^nd > NUL) & timer
Timer 1 on: 22:20:58
Timer 1 off: 22:20:59  Elapsed: 0:00:01.25
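
The same effect can be sketched in C, outside of TCC. This is only an illustration, not the plugin's code: emit_each and emit_joined are hypothetical names, and the functions count output calls rather than measure time (C stdio may add its own buffering, which the plugin's Printf apparently does not).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One output call per name, like a Printf per variable.
   Returns the number of output calls made. */
static size_t emit_each(FILE *out, const char *const *names, size_t n)
{
    size_t calls = 0;
    for (size_t i = 0; i < n; i++) {
        fprintf(out, "%s\r\n", names[i]);
        calls++;
    }
    return calls;
}

/* Build all lines in a caller-supplied buffer, then write once.
   Returns the number of output calls made (0 or 1). */
static size_t emit_joined(FILE *out, const char *const *names, size_t n,
                          char *buf, size_t cap)
{
    size_t used = 0;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(names[i]);
        assert(used + len + 2 <= cap);   /* caller must size the buffer */
        memcpy(buf + used, names[i], len);
        used += len;
        buf[used++] = '\r';
        buf[used++] = '\n';
    }
    if (used == 0)
        return 0;
    fwrite(buf, 1, used, out);
    return 1;
}
```

With three names, emit_each makes three output calls and emit_joined makes one; when every call goes straight to an unbuffered handle, that per-call cost is what the timings above are showing.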

vefatica · Aug 27, 2010

rconn said:
It's always a lot faster to display one big string rather than a bunch of little ones. (STDOUT is not buffered, and you've got a lot more API overhead.)

So I added simple buffering to the command version (VARNAMES).

Code:

if ( bList ) // @VARNAMES
    r += Sprintf(r, L"%s=", szVarName);
else        // VARNAMES command
{
    r += Sprintf(r, L"%s\r\n", szVarName);
    if ( r-psz >= 1024 )
    {
        Printf(L"%s", psz);
        *psz = 0;
        r = psz;
    }
}

The command went from about 1/4 the speed of the variable function to **twice** the speed. The code above is pretty much the whole difference between the two (apart from the ECHO needed to output the variable function's result).
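
For anyone who wants to try the idea outside a plugin, here is a minimal, self-contained sketch of threshold buffering. It is not the plugin's actual code: outbuf, outbuf_add_line, outbuf_flush and count_sink are hypothetical names, and the sink callback stands in for Printf. It checks for room before appending and leaves the final flush to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical output callback; in a plugin this would wrap Printf. */
typedef void (*sink_fn)(const wchar_t *text, size_t len, void *ctx);

typedef struct {
    wchar_t *data;      /* start of the buffer ("psz" in the post) */
    size_t   cap;       /* capacity in characters */
    size_t   threshold; /* flush once this many characters are pending */
    size_t   used;      /* characters pending ("r - psz" in the post) */
    sink_fn  sink;
    void    *ctx;
} outbuf;

/* Illustration-only sink: counts calls and characters instead of printing. */
typedef struct { size_t calls; size_t chars; } sink_stats;

static void count_sink(const wchar_t *text, size_t len, void *ctx)
{
    sink_stats *s = ctx;
    (void)text;
    s->calls++;
    s->chars += len;
}

/* Send whatever is pending to the sink; does nothing if the buffer is empty. */
static void outbuf_flush(outbuf *b)
{
    if (b->used > 0) {
        b->sink(b->data, b->used, b->ctx);
        b->used = 0;
    }
}

/* Append name plus CR LF, flushing first if it would not fit. */
static void outbuf_add_line(outbuf *b, const wchar_t *name)
{
    size_t len = wcslen(name);
    assert(len + 2 <= b->cap);          /* a single line must fit at all */
    if (b->used + len + 2 > b->cap)     /* test BEFORE appending */
        outbuf_flush(b);
    wmemcpy(b->data + b->used, name, len);
    b->used += len;
    b->data[b->used++] = L'\r';
    b->data[b->used++] = L'\n';
    if (b->used >= b->threshold)
        outbuf_flush(b);
}
```

The caller adds each name with outbuf_add_line and calls outbuf_flush once after the last one, so a partly filled buffer is not lost.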

So now I must ask if there's that much more overhead involved in calling the variable function compared to calling the command. They're both under .0005 seconds so I'm not complaining.

I also noticed that I can "ECHO @VARNAMES[]" when there's over 24000 characters in the string (varname1=varname2=...). What are the limits these days?

rconn · Aug 27, 2010

> So now I must ask if there's that much more overhead involved in
> calling the variable function compared to calling the command. They're
> both under .0005 seconds so I'm not complaining.

Yes, there's that much more overhead in the (very, very complex) variable
substitution.

> I also noticed that I can "ECHO @VARNAMES[]" when there's over 24000
> characters in the string (varname1=varname2=...). What are the limits
> these days?

32K for an input line, 64K expanded.

Rex Conn
JP Software

vefatica · Aug 28, 2010

On Fri, 27 Aug 2010 22:17:43 -0400, rconn wrote:

|---Quote---
|> I also noticed that I can "ECHO @VARNAMES[]" when there's over 24000
|> characters in the string (varname1=varname2=...). What are the limits
|> these days?
|---End Quote---
|32K for an input line, 64K expanded.

I see. And the size of any one token in the expanded line doesn't
seem to matter. You might up the 8191 limit on @REPEAT, then.

rconn · Aug 28, 2010

> I see. And the size of any one token in the expanded line doesn't
> seem to matter. You might up the 8191 limit on @REPEAT, then.

Not really feasible in 32-bit Windows, as it increases the stack usage
dramatically.

Rex Conn
JP Software

Steve Fabian · Aug 28, 2010

| Code:
| ---------
| if ( bList ) // @VARNAMES
|     r += Sprintf(r, L"%s=", szVarName);
| else        // VARNAMES command
| {
|     r += Sprintf(r, L"%s\r\n", szVarName);
|     if ( r-psz >= 1024 )
|     {
|         Printf(L"%s", psz);
|         *psz = 0;
|         r = psz;
|     }
| }
| ---------

Since you are so interested in speed, did you consider other speed-up
methods, e.g., eliminating all the format-string parsing and
pseudoformatting done by Sprintf and instead using memcpy, strcpy, or their
Unicode equivalent to build your output string, and to precalculate the
upper limit of your buffer so you don't need to add 1024 each time? BTW, you
WILL overflow a 1024-element buffer if the total is more than 1024 elements,
because you test AFTER appending; resetting the first element of the buffer
to NUL is not needed, either - the next szVarName will overwrite it,
anyway.
--
Steve

vefatica · Aug 28, 2010

On Sat, 28 Aug 2010 12:25:33 -0400, Steve Fábián wrote:

|| Code:
|| ---------
|| if ( bList ) // @VARNAMES
||     r += Sprintf(r, L"%s=", szVarName);
|| else        // VARNAMES command
|| {
||     r += Sprintf(r, L"%s\r\n", szVarName);
||     if ( r-psz >= 1024 )
||     {
||         Printf(L"%s", psz);
||         *psz = 0;
||         r = psz;
||     }
|| }
|| ---------
|
| Since you are so interested in speed, did you consider other speed-up
|methods, e.g., eliminating all the format-string parsing and
|pseudoformatting done by Sprintf and instead using memcpy, strcpy, or their
|Unicode equivalent to build your output string, and to precalculate the
|upper limit of your buffer so you don't need to add 1024 each time? BTW, you
|WILL overflow a 1024-element buffer if the total is more than 1024 elements,
|because you test AFTER appending; resetting the first element of the buffer
|to NUL is not needed, either - the next szVarName will overwrite it,
|anyway.

The buffer (psz) is far bigger than 1024; it's the buffer passed to my
function (as in "INT WINAPI VARNAMES(WCHAR *psz)").

I suspect Sprintf() is quite fast; after seeing "%s" it knows it's
copying a nul-terminated string and probably does something akin to
the code below (which is what memcpy, wcscpy, etc. would do). With
the method you suggested, I'd have to (separately) get the length of
szVarName to properly increment r. So instead of Sprintf, I tried
this (probably about as fast as it'll get, and adding 32 bytes to the
".text" segment and making anyone reading the code have to think a
little more).

Code:

WCHAR *z = szVarName;
while ( *z )
    *r++ = *z++;
*r++ = L'\r';
*r++ = L'\n';
*r = 0;

The difference:

Code:

v:\> timer & for /l %i in (1,1,10000) (varnames > nul) & timer
Timer 1 on: 13:33:57
Timer 1 off: 13:34:00  Elapsed: 0:00:02.80

v:\> timer & for /l %i in (1,1,10000) (varnames > nul) & timer
Timer 1 on: 13:38:06
Timer 1 off: 13:38:09  Elapsed: 0:00:02.80

That's why sprintf, strcpy, memcpy (and friends and relatives) exist
... to prevent continual re-invention of the wheel. I think the
overhead in such functions is small compared to the job they do.
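
For comparison, the length-then-copy approach Steve suggested can be sketched with the standard <wchar.h> calls wcslen and wmemcpy. This is a hedged sketch, not code from the plugin: append_line and append_line_loop are hypothetical names, and the second function only repeats the hand-written loop from this post so the two can be checked against each other.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Library version: measure the name, block-copy it, then add CR LF.
   Returns the new end pointer, which points at the terminating NUL. */
static wchar_t *append_line(wchar_t *r, const wchar_t *name)
{
    assert(r != NULL && name != NULL);
    size_t len = wcslen(name);
    wmemcpy(r, name, len);
    r += len;
    *r++ = L'\r';
    *r++ = L'\n';
    *r = L'\0';
    return r;
}

/* Hand-written version: copy character by character, as in the post. */
static wchar_t *append_line_loop(wchar_t *r, const wchar_t *name)
{
    assert(r != NULL && name != NULL);
    while (*name)
        *r++ = *name++;
    *r++ = L'\r';
    *r++ = L'\n';
    *r = L'\0';
    return r;
}
```

Both versions leave the same text in the buffer and return the same end pointer, so either can replace the Sprintf call; which one is faster depends on the compiler and runtime and is not something this sketch measures.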
