[Libre-soc-dev] microwatt grows up LCA2021

Luke Kenneth Casson Leighton lkcl at lkcl.net
Mon Feb 8 16:50:50 GMT 2021


On Monday, February 8, 2021, Paul Mackerras <paulus at ozlabs.org> wrote:

> Thanks for the pointer.  I have downloaded the tests and I'll compile
> them up and run them on microwatt.

STS is quite "old" but it is written in straight C.

i apologise: my modifications, which allowed me to run the raw tests on a
cluster and then recreate the histograms, are on offline storage.


>> https://github.com/betrusted-io/betrusted-wiki/wiki/TRNG-characterization
>>
>> the marsaglia tests in particular have me concerned, they fail twice.
>
> That's an interesting page.  The thing that stood out for me is that
> he says his ring oscillators run at only 200 - 300 MHz.  The paper I
> was working from showed oscilloscope traces from plumbing some of the
> XOR outputs out to pins of the FPGA, from which it is clear that they
> are oscillating at at least 500 - 1000 MHz, though the limited output
> pin bandwidth means you don't see all of the transitions.  I think I
> will try plumbing out some bits to pins and looking at them with my
> oscilloscope.

with an ECP5 having 5G SERDES, or another FPGA with gigabit-capable pins,
it would be very interesting to hear how that goes.

>
>> the other thing is that, not having studied the dieharder source code,
>> i do not know if it does tests-of-tests-of-tests.
>
> I don't think it does statistics over all of the tests.

ah.  then you cannot trust it fully.  the test-of-test-of-tests is
explained in the NIST CSRC STS paper (SP 800-22).



>> the test-of-test-of-tests is absolutely essential to make sure that even
>> the tests themselves do not have nonuniform (or uniform) suspicious
>> characteristics.
>>
>> example: if two runs produce identical output, the p-values look ok,
>> right?
>> so what's the problem, surely the p-value giving a PASS is ok, right?
>
> That didn't happen with Microwatt's RNG.  In fact, occasionally a
> result would show up as "weak", but then the same test run again would
> usually give a different, stronger result.

right.

ok.

so, the way p-values work is this: you have a mathematical model of the
expected output (some arbitrary but continuous, i.e. floating-point,
number), and based on the distribution of those mathematically-modelled
numbers (such as a Bell Curve) you can perform "uniform normalisation",
mapping the result onto the range 0-1: that is your p-value.

in each run of, say, 100,000 "bit tests" you expect an average of 50,000
ones, but you will get outliers at 45,000.  the Bell Curve model allows
you to turn this into a p-value of, say, 0.00005, which says that the
probability of that imbalance of 45,000 ones and 55,000 zeros is one in
200,000, i.e. 0.00005.
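
as a rough sketch of the kind of computation involved (plain c, in the
spirit of the STS frequency/monobit test: the inputs below are purely
illustrative, not real measurements):

#include <math.h>
#include <stdio.h>

/* frequency (monobit) style p-value: for n bits of which `ones` are
 * set, the normalised excess s = |ones - zeros| / sqrt(n) is
 * approximately half-normal under the null hypothesis, so the
 * two-sided p-value is erfc(s / sqrt(2)). */
static double monobit_p_value(long n, long ones)
{
    double s = fabs(2.0 * ones - (double)n) / sqrt((double)n);
    return erfc(s / sqrt(2.0));
}

int main(void)
{
    /* a mild illustrative imbalance: 50,400 ones out of 100,000
     * bits comes out at a p-value of around 0.01 */
    printf("p = %g\n", monobit_p_value(100000, 50400));
    return 0;
}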

with me so far? :)

now, clearly 1 in 200,000 is pretty improbable but statistically speaking
it is not completely out of the question.

now let us say that you perform 200,000 runs of that 1s-vs-0s counting
test.  if you got one or two 0.00005 "weak" occurrences, you might go
"mmm ok yeah, statistically that's still plausible".

but what happens if FIVE THOUSAND out of those 200,000 runs create
p-values of 0.00005?

now you know that long-term you have a massive (unacceptable) imbalance
where the PRNG generates far too many zeros.

diehard and dieharder *WILL NOT* pick that up.
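
to put numbers on it: under the null hypothesis the count of "weak"
(p <= 0.00005) results in 200,000 runs is binomially distributed with a
mean of just 10, so five thousand of them is off the scale.  a quick
sketch (plain c, using the illustrative numbers above):

#include <math.h>
#include <stdio.h>

int main(void)
{
    double runs     = 200000.0;  /* number of test runs             */
    double alpha    = 0.00005;   /* "weak" p-value threshold        */
    double observed = 5000.0;    /* weak results actually observed  */

    /* under the null hypothesis the weak count is
     * Binomial(runs, alpha): mean 10 here */
    double mean = runs * alpha;
    double sd   = sqrt(runs * alpha * (1.0 - alpha));
    double z    = (observed - mean) / sd;  /* normal approximation */

    printf("expected weak results: %.1f\n", mean);
    printf("observed: %.0f (z = %.0f standard deviations)\n",
           observed, z);
    return 0;
}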

this is where the histogram of p-values (test-of-tests) and *its* p-value
(test-of-test-of-tests) give you absolutely VITAL information that allows
you to reject an implementation as catastrophically flawed.
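
roughly what STS does at that second level, sketched in plain c: bin
the per-run p-values into ten equal bins and chi-square the histogram
against a flat expectation.  (the real STS code computes the final
p-value as igamc(9/2, chisq/2); here i simply compare against a
critical value of about 33.7, which corresponds to roughly p = 0.0001
at 9 degrees of freedom.)

#include <stdio.h>

/* second-level "test of tests": chi-square statistic of n per-run
 * p-values binned into 10 equal-width bins, against a uniform
 * expectation of n/10 per bin. */
static double pvalue_histogram_chisq(const double *p, int n)
{
    int bins[10] = {0};
    for (int i = 0; i < n; i++) {
        int b = (int)(p[i] * 10.0);
        if (b > 9) b = 9;           /* p == 1.0 goes in the top bin */
        bins[b]++;
    }
    double expected = n / 10.0, chisq = 0.0;
    for (int b = 0; b < 10; b++) {
        double d = bins[b] - expected;
        chisq += d * d / expected;
    }
    return chisq;
}

int main(void)
{
    /* toy input: p-values piling up near zero fail uniformity */
    double p[20] = {0.01, 0.02, 0.03, 0.01, 0.04, 0.02, 0.01, 0.03,
                    0.55, 0.61, 0.72, 0.83, 0.94, 0.15, 0.25, 0.35,
                    0.45, 0.05, 0.02, 0.01};
    double chisq = pvalue_histogram_chisq(p, 20);
    /* 9 degrees of freedom: above ~33.7 the histogram itself is
     * non-uniform at roughly the p = 0.0001 level */
    printf("chi-square = %.2f (%s)\n", chisq,
           chisq > 33.7 ? "REJECT uniformity" : "plausibly uniform");
    return 0;
}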

bottom line: a *few* "weak" tests are *occasionally* expected, but it is
simply not good enough to say "i saw some weak tests": it is absolutely
critical to mathematically analyse whether the probability of occurrence
of the *number* of weak tests is itself statistically acceptable or not.




>
> Well, in dieharder's defence, identical output would be what you would
> expect for a PRNG with a given seed value.

true.... that is for reproducibility of testing, which is different.  and
you are testing a TRNG (unseedable), which is also different.

my point is: if you test a TRNG (or a PRNG with a given seed) and run the
test to produce 100 gigabytes of sequential output, and every 1 gigabyte
you get a repeated identical pattern of 1s and 0s, you have a problem that
diehard/er is simply not designed to pick up... but STS is.
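
a crude illustration of catching that particular failure mode (not STS
code: a toy duplicate-block detector, hashing fixed-size blocks read
from stdin with FNV-1a; the block and table sizes are arbitrary):

#include <stdint.h>
#include <stdio.h>

/* toy FNV-1a 64-bit hash of one block of RNG output */
static uint64_t fnv1a64(const uint8_t *buf, size_t len)
{
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= buf[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

int main(void)
{
    /* read RNG output in fixed-size blocks and remember each
     * block's hash: an exact repeat is a red flag.  (a real check
     * would use much larger blocks and a proper hash table.) */
    enum { BLOCK = 4096, MAXBLOCKS = 1024 };
    static uint8_t block[BLOCK];
    static uint64_t seen[MAXBLOCKS];
    size_t nseen = 0;

    while (nseen < MAXBLOCKS &&
           fread(block, 1, BLOCK, stdin) == BLOCK) {
        uint64_t h = fnv1a64(block, BLOCK);
        for (size_t i = 0; i < nseen; i++)
            if (seen[i] == h)
                printf("block %zu repeats block %zu!\n", nseen, i);
        seen[nseen++] = h;
    }
    return 0;
}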


> Thanks for the detailed answer and the pointer to the STS tests.  I'll
> post some results when I get them.

happy to take a look and help review.

l.


-- 
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68

