WAD @RANDOM

May 20, 2008
9,349
62
Syracuse, NY, USA
I don't think this should produce negative numbers. It doesn't in v20 and before.
Code:
v:\> do i=1 to 5 ( echo %@random[0,2147483647] )
1448389178
-817923826
-272901703
1035707379
-1074066855
 
May 20, 2008
9,349
62
Syracuse, NY, USA
Is that a real use case?
Someone might have a use for it. But, really, it's all screwed up. Compare it to v20.

Where does the randomness come from ... MAPM? If so, it's a FP number in (0,1), call it x, and this sort of calculation should turn it into an integer >= lo and <= hi. And, for 32-bit numbers, the limits in the help should be OK.

floor(lo + x * ( hi - lo + 1)) assuming floor works like @EVAL's floor, namely, floor(-.5) = -1
 
May 20, 2008
9,349
62
Syracuse, NY, USA
V20 does this without complaint.
Code:
set x=%@random[-2147483647,2147483647]
I've run a few simple tests for randomness on it and it seems to do it correctly.
 
May 20, 2008
9,349
62
Syracuse, NY, USA
This is better in that the limits from the help seem OK. But there's a big problem with the randomness of it. I'd expect random uniform (each value equally likely). But it's coming out triangular (and very nicely so, as if by design) instead of uniform. Here are pics of the distributions of 10000 instances of %@random[-2147483647,2147483647] for v20 and v24.
1545154308344.png
1545154349211.png
 
May 20, 2008
9,349
62
Syracuse, NY, USA
I see in the help that you changed @RANDOM (significantly) so that it now can produce 64-bit numbers. Internally it may be OK, but the output isn't. It always outputs a number in the INT range. This one shouldn't be in the INT range.
Code:
p:\4utils\release> echo %@random[%@eval[2**32], %@eval[2**63-2]]
1881353011
This one always gives a negative result (in the INT range).
Code:
p:\4utils\release> echo %@random[8999999999999999000,9000000000000000000]
-494666116
And here (below) while the range meets the stated criterion, there's a bad parameter message.
Code:
p:\4utils\release> echo %@random[%@eval[2**32], %@eval[2**64-3]]
TCC: (Sys) The parameter is incorrect.
 "%@random[4294967296, 18446744073709551613]"
 

rconn

Administrator
Staff member
May 14, 2008
10,988
98
I see in the help that you changed @RANDOM (significantly) so that it now can produce 64-bit numbers. Internally it may be OK, but the output isn't. It always outputs a number in the INT range. This one shouldn't be in the INT range.
Code:
p:\4utils\release> echo %@random[%@eval[2**32], %@eval[2**63-2]]
1881353011
Works here with build 29.

This one always gives a negative result (in the INT range).
Code:
p:\4utils\release> echo %@random[8999999999999999000,9000000000000000000]
-494666116
Works here with build 29.

And here (below) while the range meets the stated criterion, there's a bad parameter message.
Code:
p:\4utils\release> echo %@random[%@eval[2**32], %@eval[2**64-3]]
TCC: (Sys) The parameter is incorrect.
"%@random[4294967296, 18446744073709551613]"
WAD - the start and end arguments are 64-bit integers, and your end argument is negative (and < start, and therefore invalid).
 
May 20, 2008
9,349
62
Syracuse, NY, USA
Yup! The first two are OK. As for the third, I only read the bit about the range in the help (and my range was OK) ... rather dumb of me since I should have figured you were using unsigned __int64 and I know well what values they can have.

What's your random number generator? Is it the 64-bit version of the Mersenne Twister? Though it's old, I just discovered it and implemented it in a plugin ... a tad faster that the 32-bit one (in only a few quick tests) regardless of whether both are built for x86 or both are built for x64.
 
May 20, 2008
9,349
62
Syracuse, NY, USA
Hmmm! I hate to sound like a mathematician, but there's still a problem. I know what it is (had to deal with it myself) and how to work around it. That involves waste (throwing away some random numbers, can't be helped if you want uniformity) resulting in 1.5 generations each time instead of 1 (in this very bad case). That's a very minor problem since calling genrand64_int64() takes only about 1 microsecond (which is nothing compared to the rest of the code needed to implement @RANDOM).

This will happen whenever (hi - lo + 1) is not a power of 2, but can go unnoticed when the range is small. Here's a pretty bad case. I asked for a sample of size 10,000 like this:

Code:
(do i=1 to 10000 (echo %@random[-6148914691236517205,6148914691236517205])) > clip:
Here's a histogram of the results showing that a negative result is twice as likely as a positive one (even though the range is symmetric about 0). It's not surprising when you dig into what's happening. I'll post a work-around when I get to the other computer (and an explanation of what's happening if anyone wants one).

1545364862732.png

You don't need a histogram to see this ... just do a few and count the negative ones. Here (below) I got 66 negative ones out of 100.

Code:
v:\> do i=1 to 100 (echo %@random[-6148914691236517205,6148914691236517205]) | ffind /v /t"-" /u
  66 lines in      1 file
 
May 20, 2008
9,349
62
Syracuse, NY, USA
Here's my version showing the workaround. To get it to be uniform, you have to throw away some bad guys before modding by (hi - lo + 1).
Code:
INT WINAPI f_TEST ( LPWSTR psz )
{
    LONGLONG llLo = 0, llHi = 0;  // inclusive, limited to plus/minus (_I64_MAX-1) = 9223372036854775806

    Sscanf(psz, L" %I64d , %I64d ", &llLo, &llHi);

    if ( llHi < llLo )
    {
        TCError(ERROR_INVALID_PARAMETER, psz);
        return -1;
    }

    ULONGLONG x, span = llHi - llLo + 1, limit = span*(0xFFFFFFFFFFFFFFFF/span);

    do x = genrand64_int64(); while ( x >= limit );

    // take your pick
    //_i64tow_s(llLo + (x % span), psz, 32, 10);
    Sprintf(psz, L"%I64d", llLo + ( x % span ));

    return 0;
}
A very, very slightly better workaround would be to use 2^64 where I used 0xFFFFFFFFFFFFFFFF, but that's hard to do on my computer! :-)

Here's a time comparison. I've done many. Mine's a little faster but you're probably doing more error checking. It also shows mine giving about 50:50 negative:positive and @RANDOM giving about 67:33.

Code:
v:\> timer & do i=1 to 10000 (echo %@random[-6148914691236517205,6148914691236517205]) | ffind /v /t"-"
/u & timer
Timer 1 on: 23:57:03
  6624 lines in      1 file
Timer 1 off: 23:57:06  Elapsed: 0:00:02.65

v:\> timer & do i=1 to 10000 (echo %@test[-6148914691236517205,6148914691236517205]) | ffind /v /t"-" /u
& timer
Timer 1 on: 23:58:07
  5019 lines in      1 file
Timer 1 off: 23:58:10  Elapsed: 0:00:02.53
 
Last edited:
May 20, 2008
9,349
62
Syracuse, NY, USA
A very, very slightly better workaround would be to use 2^64 where I used 0xFFFFFFFFFFFFFFFF, but that's hard to do on my computer! :-)
I found a way around that and added another validity check on the parameters. A little change in the logic kept it fast.
Code:
v:\> timer /q & do i=1 to 10000 (noop %@random[0,1]) & timer
Timer 1 off: 12:06:33  Elapsed: 0:00:02.21

v:\> timer /q & do i=1 to 10000 (noop %@test[0,1]) & timer
Timer 1 off: 12:06:40  Elapsed: 0:00:02.23
There are also no restrictions on the limits except that they be 64 bit signed integers (and lo <= hi). This allows the following extreme case where the limits are exactly the limits of a 64-bit signed integer.
Code:
v:\> echo %@test[%@eval[-(2**63)],%@eval[2**63-1]]
-6901783667727303969
Code:
INT WINAPI f_TEST ( LPWSTR psz )
{
    LONGLONG llLo = 0, llHi = 0;  // inclusive range, no restrictions

    if ( Sscanf(psz, L" %I64d , %I64d ", &llLo, &llHi) != 2 || llHi < llLo )
    {
        TCError(ERROR_INVALID_PARAMETER, psz);
        return -1;
    }

    ULONGLONG x, span = llHi - llLo + 1, limit;

    // span == 0 means span == 2^64 (which is OK)
    // the second disjunct below is equivalent to span being a power of 2
    // in either case there are no bad guys to throw away

    if ( span == 0 || (limit = span*(0xFFFFFFFFFFFFFFFF/span)) + span == 0 )
        
        x = genrand64_int64();

    else    // else there will be bad guys at the top; disallow them

        { do x = genrand64_int64(); while ( x >= limit ); }
    

    //_i64tow_s(llLo + (span == 0 ? x : (x % span)), psz, 32, 10);
    Sprintf(psz, L"%I64d", llLo + (span == 0 ? x : (x % span)));

    return 0;
}
 
May 20, 2008
9,349
62
Syracuse, NY, USA
Notes:
1. For what it's worth ...
2. I barely know what I'm doing when it comes to std:anything (so any tips would be appreciated)

You can also get at the Mersenne Twister like this (below). The problem I've discussed (and offered a fix for) seems to have been handled by Microsoft. Here's what I did in my 4UTILS plugin. It's fast and doesn't seem to eat up much space.

In 4UTILS.H
Code:
#include <random>
extern std::default_random_engine g;
In 4UTILS.CPP
Code:
// global scope
std::default_random_engine g; // will be Mersenne Twister ...
// in InitializePlugin()
g.seed(GetTickCount()); // ... according to g.seed's bubble help
In MRAND.CPP
Code:
#include <random>
INT WINAPI f_TEST ( LPWSTR psz ) // compare to @RANDOM
{
    LONGLONG lo = 0, hi = 0;  // inclusive range, no restrictions

    if ( Sscanf(psz, L" %I64d , %I64d ", &lo, &hi) != 2 || hi < lo )
    {
        TCError(ERROR_INVALID_PARAMETER, psz);
        return -1;
    }

    std::uniform_int_distribution<LONGLONG> m(lo, hi);

    Sprintf(psz, L"%I64d", m(g));

    return 0;
}
 
May 20, 2008
9,349
62
Syracuse, NY, USA
This persists in build 30. Here's another solution. It might be a bit faster than the last one I offered because it postpones computing "limit" until it's known that it's necessary, and it avoids an integer multiplication (though that might have to be done at some level in computing the modulo (%).
Code:
    ULONGLONG x, span = hi - lo + 1, limit, mod;

    // span == 0 means span == 2^64, which is good ... no bad guys.
    // mod == (span - 1) is also good ... no bad guys.

    if ( span == 0 || (mod = (0xFFFFFFFFFFFFFFFF % span)) == (span - 1) )
        
        x = genrand64_int64();

    else    // else disregard the bad guys at the top
    {
        limit = 0xFFFFFFFFFFFFFFFF - mod;
        do x = genrand64_int64(); while ( x >= limit );
    }

    //_i64tow_s(lo + (span == 0 ? x : (x % span)), psz, 32, 10);
    Sprintf(psz, L"%I64d", lo + (span == 0 ? x : (x % span)));
 
May 20, 2008
9,349
62
Syracuse, NY, USA
The distribution of values of @RANDOM[lo,hi] is only uniform (every number equally likely) when (hi - lo +1) is a power of 2. That's because you can only divide up 0 ~ UINT64_MAX into intervals of 0 ~ n-1 (using module, %) EVENLY when n is a power of 2. Here's a histogram of the distribution of values of @random[-6148914691236517205,6148914691236517205] which is an almost-worst case. Though the interval is symmetric about 0, the negative numbers are twice as likely than the positive numbers. It's easy to correct and with a correction I can't measure a time difference even in this almost-worst case. The most recent correction (my last post) is the best I have to offer. MS's inplementation of the Mersenne Twister has apparently corrected for this unavoidable effect. I also suggested a way to get that from std ... random.

1546664021564.png
 

rconn

Administrator
Staff member
May 14, 2008
10,988
98
The distribution of values of @RANDOM[lo,hi] is only uniform (every number equally likely) when (hi - lo +1) is a power of 2. That's because you can only divide up 0 ~ UINT64_MAX into intervals of 0 ~ n-1 (using module, %) EVENLY when n is a power of 2. Here's a histogram of the distribution of values of @random[-6148914691236517205,6148914691236517205] which is an almost-worst case.
Your example @random[-6148914691236517205,6148914691236517205] has invalid arguments - the args are signed int 64's, and the max range is 0x7FFFFFFFFFFFFFFE. So you're specifying a negative range.

When I use a range < INT64_MAX-1 I do not see a distribution issue. For example:

Code:
do i=1 to 100 (echo %@random[-4611686018427387903,4611686018427387903]) | ffind /v /t"-" /u/code]

returns evenly distributed numbers.  Do you have a real-world need for random numbers > -4 quintillion and < 4 quintillion?
 
May 20, 2008
9,349
62
Syracuse, NY, USA
Your example @random[-6148914691236517205,] has invalid arguments - the args are signed int 64's, and the max range is 0x7FFFFFFE. So you're specifying a negative range.
That's DWORD_MAX. Don't you mean 0x7FFFFFFFFFFFFFFE? The signed 64-bit integers range from -(2^63) to (2^63-1). That's -9223372036854775808 to 9223372036854775807.

The numbers I used in my example are well within that range (in fact somewhat carefully chosen to be about 2/3 of the limits).
 

rconn

Administrator
Staff member
May 14, 2008
10,988
98
The numbers you used in your example are within that range, but the *difference* between them is definitely not. @RANDOM does a (high - low) + 1, and with your example that puts it in the negative range for INT64.

The signed 64-bit integers range from -(2^63) to (2^63-1). That's -9223372036854775808 to 9223372036854775807.
Yes, and you can do @RANDOM[-9223372036854775807,0] or @RANDOM[0,9223372036854775807]. But you cannot do @RANDOM[-9223372036854775808,9223372036854775807]. (Nor, I suspect, does anybody actually *need* to.)
 
May 20, 2008
9,349
62
Syracuse, NY, USA
I don't know what you're doing, or why, and I don't want to spend more time trying to figure it out. There is no need for (hi - lo + 1) to be in the (signed) INT64 range. It can be any number in the unsigned INT64 range. In fact it can even be 2^64 (looks more like 0). That puts no limits on lo and hi except that they be in the signed INT 64 range (and lo <= hi). Both my code and MS's std\random allow that, have corrected for non-uniformity, and are as fast as @RANDOM.
 
May 20, 2008
9,349
62
Syracuse, NY, USA
Sound like you have working code, Vince. Why not publish a plugin to replace @RANDOM?
There's already @MRAND in 4UTILS. The public version still uses the 32-bit Mersenne Twister. At home it has been updated to produce any 64-bit unsigned integer (0 ~ 2^64-1); subtract 2^63 for 64-bit signed numbers ( -(2^63) ~ 2^63-1). I haven't published it yet.
 
May 20, 2008
9,349
62
Syracuse, NY, USA
std::rand doesn't support negative numbers (which you apparently want since you keep trying to use them!).
I don't know if we're talking about the same thing, but this certainly handles negative numbers, in fact all pairs of LONGLONGs with lo <= hi. And it uses the Mersenne Twister and doesn't need a uniformity correction.

Code:
std::uniform_int_distribution<LONGLONG> m(lo, hi);
I thought the help was more up-to-date but it seems to have reverted to something inappropriate.

1546720358269.png
 
May 20, 2008
9,349
62
Syracuse, NY, USA
I'm not plugging std::random. I tried converting all my 4UTILS random stuff to it. That added 10KB to the DLL and in one case seemed to be buggy. Using my own MT64 is just as fast so I went back to it.