[Libre-soc-dev] [OpenPOWER-HDL-Cores] microwatt dcache potential bug (overlap r0 and r1)

Luke Kenneth Casson Leighton lkcl at lkcl.net
Sat Jan 15 11:32:58 GMT 2022



On Sat, Jan 15, 2022 at 7:25 AM Paul Mackerras <paulus at ozlabs.org> wrote:

> On Fri, Jan 14, 2022 at 11:25:02PM +0000, Luke Kenneth Casson Leighton wrote:
> >                         if req.valid = '1' and req.same_tag = '1' and
> >                             ((r1.dcbz = '1' and req.dcbz = '1') or
> >                              (r1.dcbz = '0' and req.op = OP_LOAD_MISS)) and
> >                             r1.store_row = get_row(req.real_addr) then
> >                             r1.full <= '0';
> >
> > and it *overwrites* r1.full back to zero.
>
> Ummm, it shouldn't,

yeah, it's odd :)

oh hang on: in the latest dcache.vhdl there is this:

                        if r1.full = '1' and r1.req.same_tag = '1' and
                            ((r1.dcbz = '1' and req.dcbz = '1') or
                             r1.req.op = OP_LOAD_MISS) and
                            r1.store_row = get_row(r1.req.real_addr) then

notice how that's testing "if r1.full=1" not "if req.valid"?
that miiight actually achieve the same effect, but i need
to wake up with some coffee first to assess it.
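
to make the difference between the two conditions easier to stare at,
here is a rough plain-python model of them (not nmigen, not the real
dcache code). the Req class, the function names and the "row" field
(standing in for get_row(real_addr)) are made up for illustration, and
it only models the condition, not what happens to r1.full afterwards:

```python
from dataclasses import dataclass

OP_LOAD_MISS = "OP_LOAD_MISS"  # stand-in for the real op enum value
OP_NONE = "OP_NONE"            # hypothetical "anything else" op


@dataclass
class Req:
    # hypothetical cut-down request record, only the fields the test uses
    valid: bool
    same_tag: bool
    dcbz: bool
    op: str
    row: int  # assumption: stands in for get_row(real_addr)


def quoted_cond(req, r1_dcbz, r1_store_row):
    """Condition as quoted from the earlier VHDL: keyed on the incoming req."""
    return bool(req.valid and req.same_tag and
                ((r1_dcbz and req.dcbz) or
                 (not r1_dcbz and req.op == OP_LOAD_MISS)) and
                r1_store_row == req.row)


def latest_cond(req, r1_req, r1_full, r1_dcbz, r1_store_row):
    """Condition as in the latest dcache.vhdl: keyed on r1.full and r1.req."""
    return bool(r1_full and r1_req.same_tag and
                ((r1_dcbz and req.dcbz) or r1_req.op == OP_LOAD_MISS) and
                r1_store_row == r1_req.row)
```

feeding both functions the same request with r1_full False versus True
shows one case where they disagree: the quoted form fires on req.valid
alone, the latest form only when r1.full is already set.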

> I see this at line 1561 of dcache.py:
>
>                 (~r1.dcbz & (r1.req.op == Op.OP_LOAD_MISS))) &

easy to recognise as the original, isn't it? :)

oh hang on, the latest dcache.vhdl is quite different:

            ((r1.dcbz = '1' and req.dcbz = '1') or
              r1.req.op = OP_LOAD_MISS) and

> Notice you have r1.req.op there whereas the VHDL has req.op.  I think
> that's your bug.  (Similarly line 1560 has r1.req.dcbz not req.dcbz,
> and line 1559 has r1.req.same_tag not req.same_tag.)

yes, i tried that a couple of days ago and it resulted in data corruption
much earlier.  after a frustrating day trying to find out why, i gave up on
it.  i'll come back to it later because i recognise that it's part of
reducing latency.

results of the two workarounds: the two verilator sims are currently
running at a mind-numbingly-fast 1,000 instructions per second, and they
are both up to here:

[    0.000000] Kernel command line:
[    0.000000] Dentry cache hash table entries: 32768 (order: 6, 262144 bytes, linear)
[    0.000000] Inode-cache hash table entries: 16384 (order: 5, 131072 bytes, linear)
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off

if that continues successfully i'll leave it alone unless i have a good
reason not to [such as latency / gate timing].

i'm estimating them to get to the boot prompt some time in the next 2
days (!) if all goes well. although i may just check that DEC is properly
calibrated.

also it's probably time to start checking into running on FPGAs because
this is just so ridiculously slow that champions of international paint-drying
watching competitions would be screaming and running away.

a proper DMI-dump-and-restore system i think is becoming an
increasingly high priority: the round-trip on debugging, here, is
on an O(N^2) curve if restarting every time from cold boot.
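
a back-of-envelope sketch of why that is, under some loud assumptions:
bugs show up at evenly-spaced points in the boot, each one costs exactly
one re-run to get past, and a snapshot restores perfectly at zero cost.
the function names are hypothetical, this is just arithmetic:

```python
def cold_boot_cost(bug_points):
    """Instructions simulated if every debug iteration restarts from zero."""
    return sum(bug_points)


def checkpoint_cost(bug_points):
    """Instructions simulated if each iteration resumes from a snapshot
    taken at the previous bug point (assumed free and perfect)."""
    total = 0
    prev = 0
    for point in sorted(bug_points):
        total += point - prev
        prev = point
    return total


def evenly_spaced(n_bugs, step):
    """Assumption: the k-th bug appears after k * step instructions."""
    return [k * step for k in range(1, n_bugs + 1)]
```

with N bugs spaced "step" instructions apart, cold boot costs
step * N * (N + 1) / 2 in total, which grows as N^2, whereas resuming
from a snapshot costs step * N.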

l.


