[Libre-soc-dev] SV whitepaper

Luke Kenneth Casson Leighton lkcl at lkcl.net
Mon May 9 10:29:53 BST 2022



On Mon, May 9, 2022 at 4:57 AM Jacob Lifshay <programmerjake at gmail.com> wrote:
>
> On Sat, May 7, 2022, 12:58 lkcl <luke.leighton at gmail.com> wrote:
> > by having Extra-V-like coordinated pushing of the Texture Instructions
> > *down to the memory* there would be no need for a L1 Texture Cache *at
> > all*.
>
>
> that's not actually true, many 3d algorithms read/write to a section of an
> image (too big for cpu regs, but fits in a cache and/or tile buffer) many
> times before finally sending the results to ram. caches are still needed
> for the algorithms that in-ram computation without caches is poorly
> suited to.

hm, ok, well, the idea is that the PEs would be running at around the
direct speed of RAM (around 150 MHz) or, if faster, they would have their
own L1 cache, and yes i assumed that *all* texturisation algorithms of
Vulkan 3D would be parallelisable in a fashion that ultimately meant that
the main CPU would itself not need a separate L1 Texture Cache.

> an example of an algorithm better suited to having a cache is gaussian blur
> (or other large 2d image convolutions) -- each output pixel reads
> hundreds/thousands of texels, so each texel is re-read that many times...
> having reads run at the 150 MHz rate of ram cells rather than the >>500 MHz
> rate of a cache makes it run much slower when there isn't a cache even if
> you have zillions of cores in ram.

(a) if the PEs are running at 150 MHz and there is no latency on reading
     localised DRAMs, an L1 cache at the PE is pointless.

(b) if the algorithms' memory access can be subdivided regularly into tiles
     that fit into localised memory *already*... you see where i'm going with
     that?
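
to make (b) concrete, here is a minimal sketch in python of splitting a
large 2d convolution into output tiles, each with the halo of input texels
it needs, and checking whether that footprint fits in a PE's local memory.
every name and size here (tile size, kernel size, 64k of local memory,
4 bytes per texel) is an assumption for illustration, not part of any
actual design. it only counts accesses: it says nothing about whether
local DRAM reads are fast enough, which is the open question above.

```python
# hypothetical sketch: names and sizes are illustrative only

def tiles(width, height, tile):
    """Yield (x0, y0, x1, y1) output tiles covering a width x height image."""
    for y0 in range(0, height, tile):
        for x0 in range(0, width, tile):
            yield x0, y0, min(x0 + tile, width), min(y0 + tile, height)

def footprint(x0, y0, x1, y1, kernel, width, height):
    """Input region (clamped to the image) read by one output tile,
    for a kernel x kernel convolution with an odd kernel size."""
    r = kernel // 2
    return max(x0 - r, 0), max(y0 - r, 0), min(x1 + r, width), min(y1 + r, height)

def fits_local(tile, kernel, bytes_per_texel, local_bytes):
    """Worst-case (interior tile) footprint against the PE's local memory."""
    side = tile + kernel - 1
    return side * side * bytes_per_texel <= local_bytes

def read_counts(width, height, tile, kernel):
    """Return (texel reads done by the convolution, texels fetched into tiles).
    The first figure ignores edge clamping, so it is a slight over-estimate."""
    conv_reads = width * height * kernel * kernel
    fetched = 0
    for x0, y0, x1, y1 in tiles(width, height, tile):
        fx0, fy0, fx1, fy1 = footprint(x0, y0, x1, y1, kernel, width, height)
        fetched += (fx1 - fx0) * (fy1 - fy0)
    return conv_reads, fetched
```

for a 256x256 image, 64x64 tiles and a 31x31 kernel, read_counts() shows
the convolution doing several hundred times more texel reads than the
number of texels that have to be fetched into the tiles. that ratio is
the re-use which either a cache or tile-local memory has to serve.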

honestly though i am guessing, here.  if there are use-cases for an L1
Texture Cache at the CPU, there has to be an L1 Texture Cache at the CPU.

> > ZOLC Deterministic Schedules would send coordinated OpenCAPI requests down
> > to the actual Memory ICs, and give them a fragment of code to execute that
> > would *also* include OpenCAPI requests for data that the *Memory* IC needed.
> >
> > thoughts appreciated because this is an absolutely mental concept that
> > starts to go waaay down the rabbithole.
> >
>
> it's a really neat idea, just don't create a design that is only good at
> in-memory computation with main cpus that are too slow since you focused
> all your effort on the in-memory cpus...

the fascinating thing is that, if that happens, there's the potential
to simply not use the PEs and have the exact same algorithm run
on the main CPU.

one method: the CPU uses OpenCAPI requests *to itself*.
(or, the OpenCAPI Memory Requests are triggered transparently
 only if needed)
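
a rough sketch of that fallback, in python: the same code fragment is
handed either to a PE sitting next to the data or, if no PE owns that
region, to the main CPU. MemoryPE, MainCPU and dispatch are hypothetical
stand-ins invented for this example. they do not model real OpenCAPI
transactions or any existing API, only the "same fragment, two places
to run it" idea.

```python
# hypothetical sketch: none of these names are a real OpenCAPI or Libre-SOC API

def blur_row(row):
    """The 'fragment of code': a 3-tap box filter over one row, edges clamped."""
    n = len(row)
    return [(row[max(i - 1, 0)] + row[i] + row[min(i + 1, n - 1)]) / 3
            for i in range(n)]

class MemoryPE:
    """Stand-in for a processing element next to one memory region."""
    def __init__(self, region):
        self.region = region          # data this PE can reach locally

    def execute(self, fragment):
        return fragment(self.region)

class MainCPU:
    """Stand-in for the main CPU servicing the same request 'to itself'."""
    def __init__(self, memory):
        self.memory = memory          # region name -> data

    def execute(self, fragment, name):
        return fragment(self.memory[name])

def dispatch(fragment, name, memory, pes):
    """Run fragment near the data if a PE owns that region, else on the CPU."""
    pe = pes.get(name)
    if pe is not None:
        return "pe", pe.execute(fragment)
    return "cpu", MainCPU(memory).execute(fragment, name)
```

the point of the sketch is only that blur_row is unchanged between the two
paths: dispatch() picks where it runs, and the result is the same.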

i'm still thinking all of this through, hence the request for review.

l.


