[Libre-soc-dev] load/store quad and svp64

Jacob Lifshay programmerjake at gmail.com
Tue Apr 12 17:05:36 BST 2022

On Tue, Apr 12, 2022, 08:52 Luke Kenneth Casson Leighton <lkcl at lkcl.net> wrote:

> On Tue, Apr 12, 2022 at 4:36 PM Jacob Lifshay <programmerjake at gmail.com>
> wrote:
> >
> > On Tue, Apr 12, 2022, 03:07 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
> > wrote:
> >
> > > additionally, wishbone is simply not capable of handling greater than
> > > 64-bit data buses, so we would be forced to implement WB burst-mode
> > > right the way through the entire codebase down to the DRAM.
> > >
> >
> > no, you aren't... all you need for 128-bit atomicity is to have cpus
> > coordinate such that one 128-bit atomic is atomic *only relative to other
> > cpus' 128-bit (and shorter) operations on the same memory block*,
>
> that means cache coherency, which is a pig on its own.

yup, but we need it anyway for efficiency at higher core counts and for
efficient interop with the host POWER9 cpus (needed for BMC). once we have
it, lq (load quadword) is pretty easy.
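A rough sketch of the per-block coordination described above, as a Python toy model rather than hardware: each memory block gets its own lock, standing in for the exclusive ownership a coherency protocol would grant, and lq/stq-style accesses are modelled as two 64-bit word accesses done while holding it. `BlockAtomicMemory`, `store128`, `load128` and the 64-byte block size are all hypothetical; the real granule would be whatever the cache line size ends up being.

```python
import threading

BLOCK_SIZE = 64            # bytes; hypothetical coherence-block (cache-line) size
MASK64 = (1 << 64) - 1


class BlockAtomicMemory:
    """Toy model: a 128-bit access is atomic only with respect to other
    accesses to the *same* memory block; different blocks never contend."""

    def __init__(self):
        self.words = {}                    # 8-byte-aligned address -> 64-bit value
        self.block_locks = {}              # block number -> lock ("ownership")
        self.table_lock = threading.Lock()

    def _lock_for(self, addr):
        block = addr // BLOCK_SIZE
        with self.table_lock:
            return self.block_locks.setdefault(block, threading.Lock())

    def store128(self, addr, value):
        # a 16-byte-aligned access never straddles a 64-byte block
        assert addr % 16 == 0, "quadword accesses must be 16-byte aligned"
        with self._lock_for(addr):
            # two separate 64-bit writes, but no other CPU can touch this
            # block between them (low half at the lower address in this toy)
            self.words[addr] = value & MASK64
            self.words[addr + 8] = (value >> 64) & MASK64

    def load128(self, addr):
        assert addr % 16 == 0, "quadword accesses must be 16-byte aligned"
        with self._lock_for(addr):
            lo = self.words.get(addr, 0)
            hi = self.words.get(addr + 8, 0)
        return (hi << 64) | lo
```

Accesses to different blocks take different locks, so they never serialize against each other; only same-block traffic has to wait.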

> actually implemented the atomic operations (QTY 1) in the L2 Cache (!)

sometimes RMW atomics are more efficient when performed where the shared
memory is, rather than where the cpu is. that's part of why OpenPower needs
them (iirc it doesn't have any).
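To make the "where the memory is" point concrete, here is a hypothetical Python cost model that only counts interconnect round trips. It assumes a far (memory-side) atomic is one request carrying the opcode and operand, while a CPU-side RMW is a read followed by a write; it deliberately ignores the cache-line ownership transfers a contended CPU-side RMW would also need, which would only widen the gap. `CountingBus`, `fetch_add_at_memory` and `fetch_add_at_cpu` are invented names, not an OpenPower or microwatt interface.

```python
class CountingBus:
    """Hypothetical interconnect that just counts request/response round trips."""

    def __init__(self):
        self.memory = {}
        self.round_trips = 0

    # plain accesses, as used by a CPU-side read-modify-write
    def read(self, addr):
        self.round_trips += 1
        return self.memory.get(addr, 0)

    def write(self, addr, value):
        self.round_trips += 1
        self.memory[addr] = value

    # an atomic performed by logic next to the memory ("far" atomic)
    def fetch_add_at_memory(self, addr, delta):
        self.round_trips += 1              # one request carries opcode + operand
        old = self.memory.get(addr, 0)
        self.memory[addr] = (old + delta) % 2**64
        return old


def fetch_add_at_cpu(bus, addr, delta):
    # CPU-side RMW: fetch the value, modify it locally, write it back.
    # Exclusive ownership between the read and the write is assumed, not modelled.
    old = bus.read(addr)
    bus.write(addr, (old + delta) % 2**64)
    return old
```

Ten increments cost 20 round trips via `fetch_add_at_cpu` and 10 via `fetch_add_at_memory` in this model; real numbers depend entirely on the interconnect.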

> and set up a special

truncated sentence?

> > > saying "just" implement lq etc is basically about FIVE months of work.
> > >
> >
> > i think that may be overestimating quite a bit... it should be much easier
> > once we have a working cache-coherency protocol -- which we need anyway
> > for multi-core.
>
> 2-core SMP is almost done in microwatt due to the addition of
> cache "snoop" capability (external cache-line invalidation).
> problem is, it's single-cycle, hence the need for stalling
> (global hardware spin-lock) to prevent one CPU performing
> QTY 2 of 64-bit writes to its cache line(s) over 2+ cycles
> whilst another CPU writes to *its* copy of the same cache line,
> hitting one of the same 64-bit words.

yeah, it kinda works, but it's kinda a kludge to get around the missing
cache coherency protocol.
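A deterministic toy model of the stall-based workaround described above (not microwatt's actual logic): two CPUs each do a 128-bit store as two 64-bit word writes, and a `schedule` string says which CPU gets each cycle. With the schedule "ABBABB" and no global lock the result is torn; with the lock, the second CPU spins until the first has finished both halves, at the cost of stall cycles. `run`, the schedule format and the word values are all hypothetical.

```python
def run(schedule, use_global_lock):
    """Simulate two CPUs ("A" and "B") each storing a 128-bit value as two
    64-bit word writes; returns (final memory words, stall cycles)."""
    memory = [0, 0]                        # two 64-bit words = one 128-bit location
    pending = {"A": [(0, 0xAAAA), (1, 0xAAAA)],   # toy word values
               "B": [(0, 0xBBBB), (1, 0xBBBB)]}
    lock_owner = None
    stalls = 0
    for cpu in schedule:
        if not pending[cpu]:
            continue                       # this CPU has already finished its store
        if use_global_lock:
            if lock_owner is None:
                lock_owner = cpu           # grab the global spin-lock
            elif lock_owner != cpu:
                stalls += 1                # spin: another CPU holds the lock
                continue
        index, value = pending[cpu].pop(0)
        memory[index] = value
        if use_global_lock and not pending[cpu]:
            lock_owner = None              # release after the second 64-bit write
    return memory, stalls
```

`run("ABBABB", False)` leaves word 0 from B and word 1 from A, while `run("ABBABB", True)` ends with both words from B after two stall cycles.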

> microwatt has a write-through cache, hence why i said we'd need
> to do the 64-bit writes down through the Wishbone Bus.

well...as long as no other cpu can read/write to the same location during
the atomic op (done using cache coherency or bus locking), the wishbone bus
could be any size you like, even 8-bit, and it all still works -- so, no,
you don't need a 128-bit wishbone bus.
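A small Python sketch of why the bus width stops mattering once exclusivity is handled elsewhere: the 128-bit value is moved as however many beats the (hypothetical) bus width needs, and because every access to the block happens while holding `ownership` (a `threading.Lock` standing in for coherency ownership or a bus lock), no other CPU can observe a half-written quadword. `write_128_over_bus` and `read_128_over_bus` are invented for illustration; this is not a Wishbone API.

```python
import threading


def write_128_over_bus(memory, value, bus_width_bits, ownership):
    """Write a 128-bit value into `memory` (a 16-byte bytearray) as a series
    of bus beats, holding `ownership` for the whole transfer. Returns beats used."""
    assert bus_width_bits in (8, 16, 32, 64, 128)
    beat_bytes = bus_width_bits // 8
    data = value.to_bytes(16, "little")
    beats = 0
    with ownership:                        # nobody else may touch the block now
        for offset in range(0, 16, beat_bytes):
            memory[offset:offset + beat_bytes] = data[offset:offset + beat_bytes]
            beats += 1
    return beats


def read_128_over_bus(memory, bus_width_bits, ownership):
    """Read the 16-byte block back beat by beat under the same ownership."""
    beat_bytes = bus_width_bits // 8
    data = bytearray()
    with ownership:
        for offset in range(0, 16, beat_bytes):
            data += memory[offset:offset + beat_bytes]
    return int.from_bytes(data, "little")
```

An 8-bit bus needs 16 beats and a 64-bit bus needs 2, but either way the quadword is only visible to others as a whole.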

