[Libre-soc-dev] microwatt grows up LCA2021

Wed Feb 10 01:13:43 GMT 2021

On Tuesday, February 9, 2021, Paul Mackerras <paulus at ozlabs.org> wrote:

>> exactly, and that's the problem: dieharder doesn't do the same type of
>> test-of-test-of-tests that STS does.
>
> Actually, it does, and in fact better than STS.  STS looks at the
> p-value for each individual test and determines pass/fail, then
> compares the number of passes with a threshold which is determined
> from an alpha value.  In contrast, dieharder computes the p-value for
> each individual test and then does a Kolmogorov–Smirnov test on the
> set of 100 or 1000 (or more) p-values, and computes a p-value for the
> result of the KS test.

ahh goooood.  that's fantastic to hear.

>  The way STS does it loses information at the
> point where the individual p-values are converted to pass/fail, and it
> doesn't do the conversion of the number of passes to a p-value for
> you.

interesting insight.  of all the people i could have this kind of
discussion with (very few) it does not surprise me at all that you would
investigate thoroughly and so quickly, to find things i had missed.
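
to make the difference concrete, here is a rough python sketch of the two
aggregation styles described above.  it is not taken from either codebase
and the function names are made up for illustration.  the pass-rate
threshold is the three-sigma proportion rule from the STS documentation as
i understand it; the KS p-value uses the usual asymptotic series with a
small-sample correction, which may well differ from what dieharder actually
implements.

```python
import math

def sts_style_verdict(pvalues, alpha=0.01):
    """Pass/fail each p-value, then compare the pass rate with a threshold.

    Assumption: threshold is (1 - alpha) minus three standard deviations
    of a binomial proportion, as in the STS proportion check.
    """
    m = len(pvalues)
    passes = sum(1 for p in pvalues if p >= alpha)
    p_hat = 1.0 - alpha
    threshold = p_hat - 3.0 * math.sqrt(p_hat * alpha / m)
    return passes, (passes / m) >= threshold

def ks_uniform_pvalue(pvalues):
    """KS test of the p-values against the uniform distribution on [0, 1].

    Returns (D, p) where D is the KS statistic.  The p-value is the
    asymptotic Kolmogorov series; treat it as approximate.
    """
    xs = sorted(pvalues)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        # largest gap between the empirical CDF and the uniform CDF
        d = max(d, (i + 1) / n - x, x - i / n)
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.2:
        return d, 1.0  # series converges poorly here; p is essentially 1
    total = 0.0
    for k in range(1, 101):
        total += (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
    return d, min(1.0, max(0.0, 2.0 * total))

# hypothetical input: 100 p-values all bunched between 0.4 and 0.6
bunched = [0.4 + 0.2 * i / 100 for i in range(100)]
print(sts_style_verdict(bunched))
print(ks_uniform_pvalue(bunched))
```

every one of the bunched p-values is above alpha, so pass counting sees
nothing wrong, whereas the KS test on the full set reports a large
deviation from uniform.  that is the information lost at the pass/fail
step.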

>> remember: statistically speaking, failures (outliers) *are* going to
>> occur.  it's the *number* of failures/outliers that diehard and
>> dieharder *are not telling you about*, but STS does.
>
> So, the number of passes is a statistic, i.e. a random variate, which
> has its own distribution, so in fact outliers of that value are going
> to occur too.

now you point this out, i concur.

i overcame this to some extent by increasing the number of runs.
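
as a rough illustration of why more runs help: for an ideal generator the
number of failures out of m runs is binomial with probability alpha, so
the chance of a false alarm can be computed directly.  the sketch below
assumes the three-sigma pass-rate threshold (my reading of the STS
documentation) and uses made-up function names.

```python
import math

def min_passes(m, alpha=0.01):
    """Smallest pass count that meets the assumed three-sigma threshold."""
    p_hat = 1.0 - alpha
    threshold = p_hat - 3.0 * math.sqrt(p_hat * alpha / m)
    return math.ceil(threshold * m)

def binom_pmf(m, f, alpha):
    """P(exactly f failures in m runs), computed in log space for safety."""
    log_p = (math.lgamma(m + 1) - math.lgamma(f + 1) - math.lgamma(m - f + 1)
             + f * math.log(alpha) + (m - f) * math.log(1.0 - alpha))
    return math.exp(log_p)

def false_alarm_probability(m, alpha=0.01):
    """Chance that an ideal generator falls below the pass-rate threshold."""
    max_failures = m - min_passes(m, alpha)
    within = sum(binom_pmf(m, f, alpha) for f in range(max_failures + 1))
    return 1.0 - within

for runs in (100, 1000):
    print(runs, min_passes(runs), false_alarm_probability(runs))
```

with only 100 runs the pass count can take very few distinct values near
the threshold, so the false-alarm rate is set by where the threshold
happens to fall between two integers; with 1,000 runs the granularity is
finer and the rate drops.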

>> basically that marsaglia_tsang_gcd test needs to be:
>>
>> * ported into STS
>> * run 1,000 times, independently.
>> * have the same test-of-test-of-tests histogram analysis run on it
>> [that diehard/er is NOT doing]
>
> That statistic above was from 100 runs, and the p-value is the overall
> p-value of the KS test on the 100 individual results.

only 100 runs makes me slightly nervous: i was only able to detect a flaw
in the STS Lempel Ziv test by running 1,000 and 10,000 runs.  the LZ test
produced sampling artefacts because its continuous mathematical model was
correct but the actual results had too coarse a granularity.  when the
p-values were binned into a 10-bucket histogram, more values fell into the
0.5 bucket than should have.
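
a toy example of this kind of artefact (not the LZ test itself: the
statistic here is simply the number of heads in 20 fair coin flips, chosen
purely for illustration).  the statistic is an integer, but the p-value
comes from a continuous normal model, so the p-values can only land on a
handful of points and the 10 buckets fill unevenly.

```python
import math

def bucket_masses(n_flips=20, n_buckets=10):
    """Exact probability of a p-value landing in each histogram bucket.

    The p-value is the one-sided normal-approximation tail for the head
    count, with no continuity correction (that is the point of the demo).
    """
    masses = [0.0] * n_buckets
    sd = math.sqrt(n_flips) / 2.0
    for k in range(n_flips + 1):
        prob_k = math.comb(n_flips, k) * 0.5 ** n_flips  # exact binomial
        z = (k - n_flips / 2.0) / sd
        p_value = 0.5 * math.erfc(z / math.sqrt(2.0))    # continuous model
        idx = min(int(p_value * n_buckets), n_buckets - 1)
        masses[idx] += prob_k
    return masses

for i, mass in enumerate(bucket_masses()):
    print(f"[{i / 10:.1f}, {(i + 1) / 10:.1f}): {mass:.4f}")
```

an ideal continuous statistic would put 0.1 in every bucket.  here the
bucket starting at 0.5 collects the whole probability of exactly 10 heads,
while some neighbouring buckets receive nothing at all.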

>  The KS test is
> effectively a fine-grained histogram test, so your statement about
> dieharder is not correct.

i am delighted that you were able to investigate thoroughly and determine
this.

>
> Looking again at the tests Bunnie Huang did, they were done with only
> 536MiB of data, which is not nearly enough for a full dieharder run
> [....]
> give the cumulative number of rewinds, not the number since the last
> message).  It's actually a good sign that the test failed, given that
> its input was effectively the same sequence repeated 16 times.

interesting.  i will try to relay this to bunnie.

>> *only* then will you have achieved on-par confidence testing that STS
>> provides.
>
> Fortunately dieharder actually already does all that for me. :)

that is really good news :)

l.

-- 
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68