[Libre-soc-dev] Triangular REMAP schedule

Jacob Lifshay programmerjake at gmail.com
Tue May 31 05:27:08 BST 2022

```On Sat, May 28, 2022, 08:25 lkcl via Libre-soc-dev <
libre-soc-dev at lists.libre-soc.org> wrote:

> without going overboard into ZOLC territory, after jacob and i worked on
> bigint last month i have been thinking how to do the outer loops of Knuth
> algorithms D and M, at least in Vertical-First Mode.
>

just a pure triangle schedule isn't sufficient for Knuth's Algorithm D,
since it has the conditionally-executed fix-up loop.

Knuth's Algorithm M imho isn't a good justification for a triangle schedule
since once you get past around 5 words you should be using Karatsuba
multiplication (or more complex algorithms) instead...it's faster. 5 words
is short enough that just having the outer loop be fully unrolled is
sufficient...it'd be like 10-15 instructions for the whole outer loop.

>
> it would be a fascinating application of SVP64 in that for the first time
> some of SVSTATE would need to be context-switched, i.e. the computation of
> a partial intermediary result within an outer loop would roll SVSTATE
> forward, and would need rolling back before calculating the next stage.
>

imho that just too complex....

>
> for multiply there is a register offset that is in effect (x+y)
> when assuming x is the element from src1 and y from src2
>
> ....0000
> ...1111.
> ..2222..
> .3333...
>
> this should be dead easy to calculate these kinds of offsets: i would like
> to make it more general, and include simple triangular mapping so as not to
> have to add conflictd.
>

conflictd is still necessary...triangular scheduling isn't a replacement.
e.g. if you're trying to vectorize:
for i in 0..4096:
a[i] = a[i + b[i]] # a is in memory, so needs load/store

you need to check for conflicts where some later i + b[i] could match an
earlier i. there's no getting around that, even with cray-style vectors or
SVP64 afaict.

Jacob
```