[Libre-soc-dev] microwatt grows up LCA2021
Luke Kenneth Casson Leighton
lkcl at lkcl.net
Wed Feb 10 01:13:43 GMT 2021
On Tuesday, February 9, 2021, Paul Mackerras <paulus at ozlabs.org> wrote:
>> exactly, and that's the problem: dieharder doesn't do the same type of
>> test-of-test-of-tests that STS does.
>
> Actually, it does, and in fact better than STS. STS looks at the
> p-value for each individual test and determines pass/fail, then
> compares the number of passes with a threshold which is determined
> from an alpha value. In contrast, dieharder computes the p-value for
> each individual test and then does a Kolmogorov–Smirnov test on the
> set of 100 or 1000 (or more) p-values, and computes a p-value for the
> result of the KS test.
ahh goooood. that's fantastic to hear.
> The way STS does it loses information at the
> point where the individual p-values are converted to pass/fail, and it
> doesn't do the conversion of the number of passes to a p-value for
> you.
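A minimal sketch of the pass/fail step Paul describes, assuming the acceptance interval p_hat +/- 3*sqrt(p_hat*(1-p_hat)/m) with p_hat = 1 - alpha from NIST SP 800-22. It models only the proportion check. The function name is hypothetical, and this is not the STS source. Two very different sets of p-values get the same verdict, which shows the information lost once each p-value becomes a single pass/fail bit:

```python
import math

def sts_proportion_verdict(pvalues, alpha=0.01):
    """Hypothetical STS-style check: turn each p-value into pass/fail, then
    test whether the pass proportion falls inside p_hat +/- 3*sqrt(p_hat*(1-p_hat)/m)."""
    m = len(pvalues)
    passes = sum(1 for p in pvalues if p >= alpha)  # per-run pass/fail decision
    p_hat = 1.0 - alpha
    half_width = 3.0 * math.sqrt(p_hat * (1.0 - p_hat) / m)
    proportion = passes / m
    return passes, (p_hat - half_width) <= proportion <= (p_hat + half_width)

# Two very different sets of 100 p-values:
spread = [(i + 0.5) / 100 for i in range(100)]  # evenly spread, as expected under the null
stuck = [0.5] * 100                             # every run reports the identical p-value
print(sts_proportion_verdict(spread))  # (99, True)
print(sts_proportion_verdict(stuck))   # (100, True)
```

The "stuck" set is obviously non-uniform, yet the count-versus-threshold step accepts it, because only the number of passes survives.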
interesting insight. of all the people i could have this kind of
discussion with (very few) it does not surprise me at all that you would
investigate thoroughly and so quickly, to find things i had missed.
>> remember: statistically speaking, failures (outliers) *are* going to
>> occur. it's the *number* of failures/outliers that diehard and
>> dieharder *are not telling you about*, but STS does.
>
> So, the number of passes is a statistic, i.e. a random variate, which
> has its own distribution, so in fact outliers of that value are going
> to occur too.
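Under the null hypothesis, if each of m independent runs passes with probability 1 - alpha, the pass count K follows a Binomial(m, 1 - alpha) distribution. The sketch below converts an observed count into a one-sided (lower-tail) p-value, the step Paul notes STS leaves to you. The function name and the choice of a lower tail are illustrative assumptions:

```python
from math import comb

def pass_count_pvalue(passes, runs, alpha=0.01):
    """Hypothetical helper: P(K <= passes) for K ~ Binomial(runs, 1 - alpha).

    Small values mean "fewer passes than an ideal generator would plausibly
    give"; even an ideal generator lands in this tail occasionally."""
    q = 1.0 - alpha  # per-run pass probability under the null
    return sum(comb(runs, k) * q**k * (1.0 - q) ** (runs - k) for k in range(passes + 1))

for k in (100, 99, 97, 95):
    print(k, pass_count_pvalue(k, 100))
```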
now you point this out, i concur.
i overcame this to some extent by increasing the number of runs.
>> basically that marsaglia_tsang_gcd test needs to be:
>>
>> * ported into STS
>> * run 1,000 times, independently.
>> * have the same test-of-test-of-tests histogram analysis run on it
>> [that diehard/er is NOT doing]
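The "histogram analysis" step in the list above can be sketched as follows: bin the p-values from many independent runs into 10 equal buckets and chi-square test the counts for uniformity (9 degrees of freedom). This assumes the standard closed form for the chi-square upper tail with an odd number of degrees of freedom. The function names are hypothetical, and `random.random()` merely stands in for the p-values a real test would produce:

```python
import math
import random

def chi2_sf_odd_dof(x, dof):
    """Upper-tail probability of a chi-square variate with odd degrees of
    freedom, using the closed form erfc(sqrt(x/2)) + sqrt(2/pi)*exp(-x/2)*sum(...)."""
    if dof % 2 != 1:
        raise ValueError("this closed form only covers odd dof")
    total, term = 0.0, math.sqrt(x)
    for r in range(1, (dof - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)  # term = x^((2r-1)/2) / (1*3*...*(2r-1))
        total += term
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total

def histogram_uniformity(pvalues, bins=10):
    """Bin p-values into equal buckets, then chi-square test the counts (dof = bins - 1)."""
    counts = [0] * bins
    for p in pvalues:
        counts[min(int(p * bins), bins - 1)] += 1  # p == 1.0 goes in the top bucket
    expected = len(pvalues) / bins
    chi2 = sum((c - expected) ** 2 / expected for c in counts)
    return counts, chi2_sf_odd_dof(chi2, bins - 1)

runs = [random.random() for _ in range(1000)]  # stand-in for 1,000 independent test p-values
print(histogram_uniformity(runs))
```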
>
> That statistic above was from 100 runs, and the p-value is the overall
> p-value of the KS test on the 100 individual results.
only 100 runs makes me slightly nervous: i was only able to detect a flaw
in the STS Lempel-Ziv test by running 1,000 and 10,000 runs. the LZ test
produced sampling artefacts because, although its continuous mathematical
model was correct, the granularity of its actual results was too coarse.
when the p-values were binned into a 10-bucket histogram, more values fell
into the 0.5 bucket than should have.
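The coarse-granularity effect can be reproduced exactly with a toy statistic. The sketch below does not model the LZ test. It assumes a hypothetical statistic, the number of heads in 10 fair coin flips with p-value P(X >= observed), and computes the exact probability of a p-value landing in each of 10 histogram buckets under the null. Because the statistic takes only 11 values, some buckets can never be hit and others receive far more than 0.1. Which buckets are overfilled depends on the statistic; for the LZ test it was 0.5:

```python
from fractions import Fraction
from math import comb, floor

def exact_bucket_masses(n_flips=10, bins=10):
    """Exact null probability that a discrete statistic's p-value falls in
    each of `bins` equal histogram buckets (hypothetical coin-flip statistic)."""
    total = 2 ** n_flips
    masses = [Fraction(0)] * bins
    for x in range(n_flips + 1):
        pmf = Fraction(comb(n_flips, x), total)  # P(X == x)
        pvalue = Fraction(sum(comb(n_flips, k) for k in range(x, n_flips + 1)), total)  # P(X >= x)
        masses[min(floor(pvalue * bins), bins - 1)] += pmf
    return masses

for i, m in enumerate(exact_bucket_masses()):
    print(f"bucket {i / 10:.1f}-{(i + 1) / 10:.1f}: {float(m):.3f} (ideal 0.100)")
```

No amount of extra runs fixes this. More runs only make the non-uniformity easier to see, which is consistent with the flaw showing up at 1,000 and 10,000 runs.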
> The KS test is
> effectively a fine-grained histogram test, so your statement about
> dieharder is not correct.
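To illustrate why a KS test behaves like a fine-grained histogram test, here is a generic textbook one-sample KS test of p-values against Uniform(0, 1). It is not dieharder's own code, and dieharder may compute the final p-value differently. The statistic D is the largest gap between the empirical CDF and the uniform CDF, checked at every sample point, so there are no bucket boundaries at all. The tail probability uses the asymptotic Kolmogorov series with a widely used finite-n correction factor. The function name is hypothetical:

```python
import math

def ks_uniform(pvalues):
    """One-sample Kolmogorov-Smirnov test of p-values against Uniform(0, 1).

    Returns (D, approximate p-value)."""
    n = len(pvalues)
    xs = sorted(pvalues)
    # Largest deviation between the empirical CDF (a step function) and F(x) = x.
    d = max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))
    sqrt_n = math.sqrt(n)
    lam = (sqrt_n + 0.12 + 0.11 / sqrt_n) * d  # finite-n corrected argument
    if lam < 0.2:
        return d, 1.0  # Kolmogorov tail probability is 1 to within ~1e-12 here
    # Q(lam) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2)
    q = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam) for k in range(1, 101))
    return d, min(max(q, 0.0), 1.0)

print(ks_uniform([(i + 0.5) / 100 for i in range(100)]))  # evenly spread p-values
print(ks_uniform([0.5] * 100))                            # all runs report the same p-value
```

Unlike the pass/fail count, this rejects the "all identical" set outright, because every individual p-value contributes to D.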
i am delighted that you were able to investigate thoroughly and determine
this.
>
> Looking again at the tests Bunnie Huang did, they were done with only
> 536MiB of data, which is not nearly enough for a full dieharder run
> [....]
> give the cumulative number of rewinds, not the number since the last
> message). It's actually a good sign that the test failed, given that
> its input was effectively the same sequence repeated 16 times.
interesting. i will try to relay this to bunnie.
>> *only* then will you have achieved on-par confidence testing that STS
>> provides.
>
> Fortunately dieharder actually already does all that for me. :)
that is really good news :)
l.
--
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68