[Libre-soc-dev] [OpenPOWER-HDL-Cores] microwatt grows up LCA2021

Luke Kenneth Casson Leighton lkcl at lkcl.net
Tue Feb 9 10:48:38 GMT 2021


---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68

On Tue, Feb 9, 2021 at 3:28 AM Paul Mackerras <paulus at ozlabs.org> wrote:
>
> On Mon, Feb 08, 2021 at 10:42:16AM +0000, Luke Kenneth Casson Leighton wrote:
>
> > CSRC NIST.gov's STS.  except do not bother with the Lempel-Ziv test
> > because it is flawed.  i explain why in the betrusted-soc link.
>
> I found sts-2_1_2.zip on the NIST web site, along with the paper that
> describes it.  There were some gcc warnings when compiling it, and the
> "Universal Statistical" test always produces p=0 no matter what set of
> numbers I give it, so I suspect gcc has exercised its prerogative to
> break your program if it has any non-compliance with the C standard.

yeah, as i mentioned in the previous message, i was running this on
32-bit systems, probably using gcc 3.

> When you were using STS, did you have any framework to run it
> automatically and generate summaries, or did you just run it manually
> and look at the finalAnalysisReport.txt file?

i just looked at the finalAnalysisReport.txt

what i did do, however, was "beowulf cluster" the thing.  i think i
split it into two programs:

1) one that ran the individual tests (producing the output which you
see in some subdirectories)
2) one that then analysed that output

this way, if i happened to run 2 batches of 1,000 100k-bit runs, i
could merge them together (by hand) and re-run stage (2) to get a
better analysis.
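
something like this, roughly (a minimal python sketch, *not* the
original programs - the directory layout and file names here are
assumptions):

    # stage (2): merge the per-run p-values that stage (1) wrote out
    # as one-p-value-per-line results files, one batch per directory.
    import glob

    def merge_pvalues(batch_dirs):
        """gather p-values from several independent batches."""
        pvalues = []
        for d in batch_dirs:
            for fname in glob.glob(d + "/*/results.txt"):
                with open(fname) as f:
                    pvalues.extend(float(line) for line in f
                                   if line.strip())
        return pvalues

    # two merged batches of 1,000 runs give 2,000 p-values, so the
    # stage (2) analysis gets better statistics than either alone
    merged = merge_pvalues(["batch1", "batch2"])
    print(len(merged), "p-values ready for stage (2) analysis")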


> > btw one very important thing, it may be worthwhile to coordinate with
> > Bunnie Huang regarding the FAILURE runs here:
> >
> > https://github.com/betrusted-io/betrusted-wiki/wiki/TRNG-characterization
> >
> > the marsaglia tests in particular have me concerned, they fail twice.
>
> I'm running dieharder again now, and I got:
>
>  marsaglia_tsang_gcd|   0|  10000000|     100|0.09468860|  PASSED
>  marsaglia_tsang_gcd|   0|  10000000|     100|0.72404443|  PASSED
>
> That's just one data point of course, but there doesn't seem to be an
> immediate indication of a problem here.

exactly, and that's the problem: dieharder doesn't do the same type of
test-of-test-of-tests that STS does.

remember: statistically speaking, failures (outliers) *are* going to
occur - with 1,000 runs at the standard 1% significance level you
should *expect* around 10 of them.  it's the *number* of
failures/outliers that diehard and dieharder *do not tell you about*,
but STS does.
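
to put a number on that: STS's first check (NIST SP 800-22 section
4.2.1) is whether the *proportion* of passing runs falls within a
three-sigma interval around 1 - alpha.  a minimal sketch:

    # STS-style proportion-of-passes check (NIST SP 800-22 4.2.1).
    # with alpha = 0.01 and 1,000 runs, ~10 failures are *expected*;
    # the question is whether the pass proportion stays inside
    # (1 - alpha) +/- 3*sqrt(alpha*(1 - alpha)/runs).
    import math

    def proportion_interval(alpha=0.01, runs=1000):
        p = 1.0 - alpha
        margin = 3.0 * math.sqrt(alpha * (1.0 - alpha) / runs)
        return p - margin, p + margin

    lo, hi = proportion_interval()
    print("acceptable pass proportion: %.4f .. %.4f" % (lo, hi))
    # -> roughly 0.9806 .. 0.9994: around 10 failures in 1,000 runs
    #    is perfectly normal, but 25 failures is a red flag even
    #    though every individual run only reports PASSED/FAILED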

basically that marsaglia_tsang_gcd test needs to be:

* ported into STS
* run 1,000 times, independently
* put through the same test-of-test-of-tests histogram analysis
  [which diehard/er is NOT doing] - sketched below
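
that histogram analysis (NIST SP 800-22 section 4.2.2) bins the
p-values into ten equal intervals and chi-square-tests them for
uniformity - a "p-value of the p-values".  a minimal sketch:

    # STS-style uniformity-of-p-values check (NIST SP 800-22 4.2.2):
    # 10 equal bins over [0,1), chi-square against the uniform
    # expectation, then P-value_T = igamc(9/2, chisq/2).  STS flags
    # the test if P-value_T drops below 0.0001.
    from scipy.special import gammaincc  # regularised upper igamc

    def pvalue_uniformity(pvalues, bins=10):
        counts = [0] * bins
        for p in pvalues:
            counts[min(int(p * bins), bins - 1)] += 1
        expected = len(pvalues) / bins
        chisq = sum((c - expected) ** 2 / expected for c in counts)
        return gammaincc((bins - 1) / 2.0, chisq / 2.0)

    # 1,000 p-values from 1,000 *independent* runs of one test:
    import random
    print(pvalue_uniformity([random.random() for _ in range(1000)]))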

*only then* can you be confident - after running it not once, not
twice, not three times, but a THOUSAND times, and performing a
RIGOROUS statistical analysis of the results - that it's okay.

right now, saying "doesn't seem to be an immediate indication of a
problem" fundamentally misses the absolutely critical point: it is
the *test of the tests* that you need to pay attention to.

to get the same type of analysis that is missing from diehard/er,
what you will have to do is:

* run diehard/er MANUALLY 1,000 times
* note the p-values
* go look up some mathematical papers on statistical analysis
* MANUALLY write your own histogram-testing program to analyse the
  1,000 p-values

*only* then will you have achieved confidence testing on par with
what STS provides.
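
a rough illustration of such a driver (a sketch only: -d, -g and -f
are real dieharder options, but test and generator numbers vary by
build, so check "dieharder -l" and "dieharder -g -1"; the 1,000
input files are likewise assumptions):

    # run one dieharder test over many independent input files and
    # harvest the p-values for the histogram analysis sketched above.
    import subprocess

    def collect_pvalues(files, test_id, generator="201"):
        # generator 201 is file_input_raw on typical builds - an
        # assumption here, so confirm with "dieharder -g -1"
        pvalues = []
        for f in files:
            out = subprocess.run(
                ["dieharder", "-d", test_id, "-g", generator,
                 "-f", f],
                capture_output=True, text=True).stdout
            for line in out.splitlines():
                if "marsaglia_tsang_gcd" in line:
                    # '|'-separated columns; the p-value is field 5
                    pvalues.append(float(line.split("|")[4]))
        return pvalues

    # pvals = collect_pvalues(["run%04d.bin" % i for i in range(1000)],
    #                         test_id="...")  # number from "dieharder -l"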

l.


