[Libre-soc-dev] [OpenPOWER-HDL-Cores] microwatt grows up LCA2021

Wed Feb 10 00:32:00 GMT 2021

On Tue, Feb 09, 2021 at 10:28:03AM +0000, Luke Kenneth Casson Leighton wrote:
> On Tuesday, February 9, 2021, Paul Mackerras <paulus at ozlabs.org> wrote:
> > On Mon, Feb 08, 2021 at 04:50:50PM +0000, Luke Kenneth Casson Leighton wrote:
> >
> >> happy to take a look and help review.
> >
> > Here are the finalAnalysisReport.txt results of two STS runs, each
> > on 1,000 sequences of 1,000,000 bits.
> 
> also remember
> * 1,000 on 100k then
> * 10,000 on 100k then if those pass
> * jump to 10,000 of 1e6
> 
> basically crank it up one order of magnitude at a time [but remember that
> the Lempel Ziv test only shows up as "flawed" at these higher numbers of
> runs].
> 
> 
> >  I left out the universal
> > statistical test because it is giving p=0 on any input.
> 
> drat.  probably because of some 64 bit thing.  i was running STS perfectly
> fine on QTY AMD Opteron systems.
> 
> >
> > First run:
> >
> >
> > 109  96 115  87 100 102 101  90  95 105  0.693142    979/1000 *  NonOverlappingTemplate
> >  98  98 116 106  94  74 103  87 108 116  0.088226    989/1000    NonOverlappingTemplate
> >  97  92  94 103  98 109 112  94 114  87  0.587274    990/1000
> 
> > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> > The minimum pass rate for each statistical test with the exception of the
> > random excursion (variant) test is approximately = 980 for a
> > sample size = 1000 binary sequences.
> 
> 
> Ok so notice the asterisk? that indicates a failure.  remember, some
> quantity of "failures" (tests that statistically are borderline) are going
> to occur...
> 
> ... but STS if you look at the paper they mathematically *calculate and
> predict the number of expected failures*
> 
> (a variant of the test of tests of tests thing)
> 
> for that run it has fallen below the acceptable *quantity* of failures to
> be considered "safe"

As I explain in my other email, the number of failures is itself a
random variable with an expected binomial distribution (which they
approximate as gaussian).  The threshold that STS computes for the
number of passes is 3 standard deviations below the average number,
meaning that there is a 0.13% chance of the number of passes being
below the threshold when the null hypothesis (that the generator is
truly random) is true.
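
The arithmetic behind that threshold can be sketched as follows.  This
is only an illustration, not STS code: the function names are made up,
and it assumes the STS default significance level of 0.01 per sequence
(so an expected pass proportion of 0.99), which is not stated in this
thread.

```python
import math

def min_pass_count(num_sequences, alpha=0.01, sigmas=3.0):
    """Pass-count threshold: mean minus `sigmas` standard deviations.

    alpha is the per-sequence significance level (assumed 0.01 here),
    so each sequence passes with probability 1 - alpha under the null.
    """
    p_hat = 1.0 - alpha
    sd = math.sqrt(p_hat * (1.0 - p_hat) / num_sequences)
    return num_sequences * (p_hat - sigmas * sd)

def lower_tail_probability(sigmas=3.0):
    """Chance that a gaussian variable falls more than `sigmas` below its mean."""
    return 0.5 * math.erfc(sigmas / math.sqrt(2.0))

print(min_pass_count(1000))        # between 980 and 981
print(lower_tail_probability())    # about 0.0013, i.e. 0.13%
```

For 1000 sequences this lands between 980 and 981, consistent with the
"approximately = 980" line in the report.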

Given that STS reports results for 187 test scenarios, one would
expect on average 0.24 of the scenarios to show more than the
expected number of failures.  In other words one expects a "failure"
about once in 4 runs, regardless of how many individual sequences have
been tested or how long they are.  In fact if one *doesn't* get any
failures of the test on the number of passes/fails then that itself is
an indication of non-randomness. :)
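
A rough sketch of that calculation, with hypothetical function names.
The chance of at least one flagged scenario is computed on the
assumption that the 187 scenarios are independent, which is only
approximately true for STS since the tests share the same input data.

```python
def expected_flagged(num_scenarios, p_flag):
    """Expected number of scenarios flagged in one run under the null."""
    return num_scenarios * p_flag

def prob_at_least_one(num_scenarios, p_flag):
    """Chance that at least one scenario is flagged, assuming independence."""
    return 1.0 - (1.0 - p_flag) ** num_scenarios

# 187 scenarios, each flagged with probability 0.13% under the null.
print(expected_flagged(187, 0.0013))    # about 0.24
print(prob_at_least_one(187, 0.0013))   # a little over 0.2
```

Neither figure depends on the number of sequences or their length; only
the number of scenarios and the 3-sigma tail probability enter.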

> thus this run must be considered an indicator of a catastrophic failure in
> the algorithm.
> 
> yes you need to be that draconian.

Well, I agree that consistent failure of any one test, even a slight
failure, would be catastrophic.  But occasional, unrepeatable, random
failures are *expected*, and that's all I'm seeing.

> now, it's only by one (the success rate is 980, the pass rate was 979) so
> it means that the algorithm is very close to being "good".
> 
> now, if you re-run it, at the other sizes / partitions you should keep an
> eye out for that.  if it happens again, particularly at the 10k runs, then
> that's confirmation that there's a serious problem.

There's roughly a 22% chance that at least one of the tests will show a
"failure", even at the 10k x 1M level, or at any level.  So a single
re-occurrence doesn't confirm a problem unless it's on the same test
that failed before.

Paul.