[Libre-soc-dev] [RFC] SVP64 Vertical-First Mode loops

lkcl luke.leighton at gmail.com
Thu Aug 19 14:22:35 BST 2021

On August 19, 2021 2:00:42 AM UTC, Richard Wilbur <richard.wilbur at gmail.com> wrote:

>by my count that adds to
>  2N + (N ln N) memory accesses


>Hence the excess is around (N ln N) for both registers and memory
>accesses.  Still non-trivial overhead.


when strip-mining occurs it is as if you had no L1 cache at all: LD/ST runs 3-5 times slower.  go even bigger and you strip-mine L2 as well, and that ends up 8-10x slower.

i had an idea here which might help: an L1 cache hint which swaps over to using the next 6 LSBs for cache line lookup:

cache_row_num = MUX(Hint,

however you'd need to do a full cache flush to swap modes, so you'd better be damn sure you want to do that.
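
a rough sketch of the idea in python, to make the trade-off concrete. the MUX expression above is cut off in the archive, so this is not a reconstruction of it: the 64-byte line size, the 64-row direct-mapped layout, and the names LINE_BITS, ROW_BITS, count_misses and strided are all assumptions made up for illustration. only the "next 6 LSBs" selection comes from the text.

```python
LINE_BITS = 6   # assumption: 64-byte cache lines
ROW_BITS = 6    # assumption: 64 rows, direct-mapped (one line per row)


def cache_row_num(addr, hint):
    """Pick the row index bits: the usual 6 bits just above the line
    offset when hint is 0, or the next 6 bits above those when hint is 1."""
    shift = LINE_BITS + (ROW_BITS if hint else 0)
    return (addr >> shift) & ((1 << ROW_BITS) - 1)


def count_misses(addrs, hint):
    """Count misses in a toy direct-mapped cache for a list of byte addresses."""
    rows = {}       # row number -> line address currently held in that row
    misses = 0
    for addr in addrs:
        row = cache_row_num(addr, hint)
        line = addr >> LINE_BITS
        if rows.get(row) != line:
            misses += 1
            rows[row] = line    # evict whatever was there
    return misses


def strided(stride, count, passes):
    """Addresses for `passes` repeated sweeps over `count` elements."""
    return [i * stride for _ in range(passes) for i in range(count)]


if __name__ == "__main__":
    big_stride = strided(4096, 16, 4)   # power-of-two stride, 4 sweeps
    sequential = strided(64, 16, 4)     # consecutive lines, 4 sweeps
    for hint in (0, 1):
        print("hint", hint,
              "big-stride misses", count_misses(big_stride, hint),
              "sequential misses", count_misses(sequential, hint))
```

in this toy model the 4096-byte stride lands every access on the same row with hint 0 and spreads across rows with hint 1, while plain sequential access behaves the opposite way. it also shows why swapping modes would need a flush: a line cached under one row selection sits in a row that the other selection would never look in.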

