[Libre-soc-dev] load/store quad and svp64
programmerjake at gmail.com
Tue Apr 12 16:35:57 BST 2022
On Tue, Apr 12, 2022, 03:07 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
wrote:
> additionally, wishbone is simply not capable of handling greater than
> data buses, so we would be forced to implement WB burst-mode right the way
> through the entire codebase down to the DRAM.
no, you aren't...all you need for 128-bit atomicity is to have cpus
coordinate such that one 128-bit atomic is atomic *only relative to other
cpus' 128-bit (and shorter) operations on the same memory block*, not
throughout all hardware and dram too. OpenPower specifically only requires
atomics to work on plain cached memory, not cache-inhibited and i/o memory.
so, basically, all you need is two things. first, the cache coherency
protocol has to operate on >=128-bit cache blocks. blocks can move between
cpus or to/from dram at any transfer size; all that matters is that the
whole block is transferred before another cpu can read/write *just that
block*. it doesn't matter what happens to any other blocks meanwhile
(memory fences handle that part). second, 128-bit atomics are implemented
by the local cpu keeping the cache block in its L1 cache and delaying other
cpus' requests to invalidate/share that cache block while it's
reading/writing. that kind of cache block pinning is needed anyway for any
(not just 128-bit) load-linked/store-conditional loops to be efficient.
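
as a rough illustration of the cache block pinning idea, here is a toy
python model. it is not Libre-SOC code and not a real coherency protocol:
the `Block`, `Directory`, `request`, `load_128`, `store_128` and `midway`
names are all made up for this sketch, and it assumes a 16-byte block and
a single shared directory. the point is only that the 128-bit store is
done as two 64-bit halves, yet another cpu can never observe a half-written
block, because its request is delayed while the block is pinned.

```python
from collections import deque

BLOCK_BYTES = 16  # assumption: the coherency granule is one 128-bit block


class Block:
    """State the toy directory keeps for one cache block."""

    def __init__(self):
        self.data = bytearray(BLOCK_BYTES)
        self.owner = None        # id of the cpu currently holding the block
        self.pinned = False      # True while the owner is mid-atomic
        self.deferred = deque()  # cpus whose requests were delayed


class Directory:
    """Hypothetical stand-in for the cache coherency protocol."""

    def __init__(self):
        self.blocks = {}

    def block(self, addr):
        return self.blocks.setdefault(addr // BLOCK_BYTES, Block())

    def request(self, cpu, addr):
        """Ask for ownership of a block; delayed (False) if pinned elsewhere."""
        blk = self.block(addr)
        if blk.pinned and blk.owner != cpu:
            blk.deferred.append(cpu)
            return False
        blk.owner = cpu
        return True

    def load_128(self, cpu, addr):
        """Return the whole block, or None if the request was delayed."""
        if not self.request(cpu, addr):
            return None
        return bytes(self.block(addr).data)

    def store_128(self, cpu, addr, value, midway=None):
        """Write 16 bytes as two 64-bit halves while the block is pinned.

        `midway` is a hook (an assumption of this sketch) that runs between
        the two halves, to model another cpu's request arriving right then.
        """
        if not self.request(cpu, addr):
            return False
        blk = self.block(addr)
        blk.pinned = True
        blk.data[0:8] = value[0:8]
        if midway is not None:
            midway()
        blk.data[8:16] = value[8:16]
        blk.pinned = False
        # Hand the block to the first delayed requester, if there is one.
        if blk.deferred:
            blk.owner = blk.deferred.popleft()
        return True
```

for example, if cpu 0 calls `store_128` and the `midway` hook has cpu 1 call
`load_128` on the same address, cpu 1 gets `None` (delayed) rather than a
torn value, and a later `load_128` from cpu 1 returns all 16 new bytes.
nothing in this model goes near dram or the bus width, which is the whole
argument: atomicity is only relative to other cpus' accesses to that block.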
> saying "just" implement lq etc is basically about FIVE months of work.
i think that may be overestimating quite a bit...it should be much easier
once we have a working cache-coherency protocol -- which we need anyway for