[Libre-soc-dev] [RFC] horizontal SVP64 vectors

Thu Jul 8 13:22:32 BST 2021

Hi Luke,

On 2021-07-08 13:18, Luke Kenneth Casson Leighton wrote:
> On 7/8/21, Richard Wilbur <richard.wilbur at gmail.com> wrote:
> 
>> So by “Horizontal Vectorisation” are you referring to running a list
>> of instructions on particular vector elements (the inside of the
>> innermost loop in Cooley-Tukey for example) then moving to the next
>> vector elements (possibly determined by a REMAP and some SHAPE
>> registers) and repeating?
> 
> yes, exactly.
> 
> more later,
> 
> first, jacob, i thought overnight about what you said, and basically
> for elwidth overrides the backend gets hit by a stack of 8 bit element
> 0 operations then a batch of el1 then el2 and yes, to sort that out
> buffering is needed.
> 
> however that's an implementor's problem not an API problem, that
> allows different companies to compete on performance.

It's funny because it is another way of looking at

https://hackaday.io/project/8774-f-cpu/log/187267-f-cpu-as-a-decent-vector-processor

In my case, I simply "stick" the instructions of each of the separate
pipelines in place during a hardware loop, so they keep executing the
same opcode, 4 of them separately and in parallel, BUT by writing to
destination registers in another pipeline, I let them communicate and
stream data from one pipeline to another.
No weird exception handling, no crazy scheduling involved.
I just need to define a prefix instruction that will manage the loop
count and pointer auto-updates. The other cool thing is that the vectors
are mapped to registers through the register-mapped memory: the vectors
can be ANY length and reside in cache, instead of requiring crazy
numbers of registers...

Of course the limitation is that the "vector operations" are limited to
4 arithmetic operations in series/parallel/at once, but the memory
aspect would be pretty efficient.
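
To make the idea concrete, here is a rough Python sketch of the scheme
described above. It is only a toy model under stated assumptions, not
F-CPU or SVP64 code: the names StuckOp, run_hardware_loop and demo, the
register names, and the choice of four operations are all made up for
illustration. It also runs the four pipelines one after another inside
each iteration, where real hardware would overlap them.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class StuckOp:
    """One pipeline's instruction, held in place for the whole loop (hypothetical)."""
    fn: Callable[[int, int], int]
    src_a: str  # register read as first operand
    src_b: str  # register read as second operand
    dst: str    # register written; it can be the input of another pipeline


def run_hardware_loop(ops: List[StuckOp], regs: Dict[str, int],
                      memory: List[int], in_ptr: int, out_ptr: int,
                      count: int) -> List[int]:
    """Model of the loop prefix: it owns the loop count and the pointer updates."""
    for _ in range(count):
        # Register-mapped memory (assumed model): reading "in" fetches from in_ptr.
        regs["in"] = memory[in_ptr]
        # The same four instructions run again; each one writes a register
        # that the next pipeline reads, so data streams from one to the other.
        for op in ops:
            regs[op.dst] = op.fn(regs[op.src_a], regs[op.src_b])
        # Writing "out" stores through out_ptr.
        memory[out_ptr] = regs["out"]
        # Pointer auto-update.
        in_ptr += 1
        out_ptr += 1
    return memory


def demo() -> List[int]:
    # Computes out[i] = (((in[i] + 1) * 3) - 2) ^ 1 over a 4-element vector.
    regs = {"k1": 1, "k2": 2, "k3": 3}  # loop-invariant constants
    ops = [
        StuckOp(operator.add, "in", "k1", "p1"),
        StuckOp(operator.mul, "p1", "k3", "p2"),
        StuckOp(operator.sub, "p2", "k2", "p3"),
        StuckOp(operator.xor, "p3", "k1", "out"),
    ]
    memory = [1, 2, 3, 4, 0, 0, 0, 0]  # input at 0..3, output at 4..7
    return run_hardware_loop(ops, regs, memory, in_ptr=0, out_ptr=4, count=4)


if __name__ == "__main__":
    print(demo())
```

The vector length here is just the loop count, so it is bounded by
memory and not by the number of registers, which is the point made
above about vectors residing in cache.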

Remember : what would Seymour do ? :-D

In fact I suspect that's close to how he did vector bypass in the Cray-1.

"just stick the instructions in place in the buffer" is a pretty simple
method.

> l.
yg