[Libre-soc-dev] [RFC] horizontal SVP64 vectors
whygee at f-cpu.org
Thu Jul 8 13:22:32 BST 2021
On 2021-07-08 13:18, Luke Kenneth Casson Leighton wrote:
> On 7/8/21, Richard Wilbur <richard.wilbur at gmail.com> wrote:
>> So by “Horizontal Vectorisation” are you referring to running a list of
>> instructions on particular vector elements (the inside of the
>> innermost loop
>> in Cooley-Tukey for example) then moving to the next vector elements
>> (possibly determined by a REMAP and some SHAPE registers) and
> yes, exactly.
> more later,
> first, jacob, i thought overnight about what you said, and basically
> for elwidth overrides the backend gets hit by a stack of 8-bit element
> 0 operations, then a batch of el1, then el2, and yes, to sort that out
> buffering is needed.
> however that's an implementor's problem not an API problem, that
> allows different companies to compete on performance.
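To make the issue-order difference concrete, here is a toy Python model. It is not Libre-SOC code, and the opcode names, vector length and lane count are illustrative assumptions. `vertical_order` issues each instruction over every element before moving on. `horizontal_order` runs the whole loop body on element 0, then element 1, and so on. `pack_lanes` is one hypothetical way a back-end buffer could regroup that element-by-element stream into wider same-opcode operations, as when small elwidth-overridden elements are packed into a SIMD ALU.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical 3-instruction loop body applied to a short vector.
PROGRAM = ["mul", "add", "sub"]
VL = 4  # vector length, assumed for illustration


def vertical_order(program: List[str], vl: int) -> List[Tuple[str, int]]:
    """Conventional vector issue: each instruction covers all elements first."""
    return [(op, el) for op in program for el in range(vl)]


def horizontal_order(program: List[str], vl: int) -> List[Tuple[str, int]]:
    """Horizontal issue: the whole loop body runs on element 0, then 1, ..."""
    return [(op, el) for el in range(vl) for op in program]


def pack_lanes(stream: List[Tuple[str, int]], lanes: int) -> List[Tuple[str, Tuple[int, ...]]]:
    """Toy back-end buffer: collect same-opcode element ops until `lanes`
    of them can be issued together as one SIMD-style operation."""
    pending: Dict[str, List[int]] = defaultdict(list)
    issued: List[Tuple[str, Tuple[int, ...]]] = []
    for op, el in stream:
        pending[op].append(el)
        if len(pending[op]) == lanes:
            issued.append((op, tuple(pending.pop(op))))
    # Flush partially filled groups at the end of the loop.
    for op, els in pending.items():
        issued.append((op, tuple(els)))
    return issued
```

How much buffering to provide, and when to flush partial groups, is the kind of implementor's choice described above; this sketch simply flushes whatever remains at the end of the loop.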
It's funny because it is another way of looking at
In my case, I simply "stick" the instructions in each of the separate
pipelines for the duration of a hardware loop,
so the 4 pipelines keep executing the same opcode, separately and in parallel.
By writing to destination registers in another pipeline, I let them
stream data from one pipeline to another.
No weird exception handling, no crazy scheduling involved.
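A rough cycle-level model of this "stuck instruction" chaining, under stated assumptions: four pipelines, each holding one fixed operation for the whole hardware loop, with one result latch per pipeline that feeds the next pipeline's operand. The stage functions are arbitrary placeholders, not opcodes from this thread, and this is a sketch of the idea rather than the actual hardware.

```python
from typing import Callable, List, Optional, Sequence

Op = Callable[[int], int]


def run_chained(stages: Sequence[Op], inputs: Sequence[int]) -> List[int]:
    """Toy model: stage i keeps the same operation for the whole loop and
    reads its operand from the latch stage i-1 wrote on the previous cycle."""
    n = len(stages)
    latches: List[Optional[int]] = [None] * n
    outputs: List[int] = []
    for cycle in range(len(inputs) + n - 1):  # pipeline fill + drain
        src = inputs[cycle] if cycle < len(inputs) else None
        new: List[Optional[int]] = [None] * n
        for i, stage in enumerate(stages):
            operand = src if i == 0 else latches[i - 1]
            if operand is not None:
                new[i] = stage(operand)  # every stage "fires" in the same cycle
        if new[-1] is not None:
            outputs.append(new[-1])
        latches = new
    return outputs


# Hypothetical fixed operations for the 4 pipelines.
STAGES = [
    lambda x: x + 1,  # pipeline 0
    lambda x: x * 2,  # pipeline 1
    lambda x: x - 3,  # pipeline 2
    lambda x: x * x,  # pipeline 3
]
```

Once the chain is full, `run_chained(STAGES, [1, 2, 3])` produces one result per cycle with no per-element scheduling decisions, which is the simplicity argued for above.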
I just need to define a prefix instruction that manages the loop
and the pointer auto-updates. The other cool thing is that the vectors are
mapped to registers through the register-mapped memory: the vectors can be ANY
and reside in cache, instead of requiring crazy numbers of registers...
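A minimal sketch of the loop-prefix idea, assuming the prefix carries a repeat count and that vector operands are reached through registers mapped onto memory, whose pointers auto-increment after each access. `PtrReg`, `loop_prefix` and the `stride` parameter are hypothetical names for illustration, not an existing ISA definition. In this model the vector data stays in (cached) memory, so its length is set by the repeat count rather than by the size of the register file.

```python
import operator
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PtrReg:
    """Hypothetical register-mapped memory port: each access touches
    mem[addr], then the pointer auto-updates by `stride`."""
    mem: List[int]
    addr: int
    stride: int = 1

    def read(self) -> int:
        value = self.mem[self.addr]
        self.addr += self.stride
        return value

    def write(self, value: int) -> None:
        self.mem[self.addr] = value
        self.addr += self.stride


def loop_prefix(count: int, op: Callable[..., int], dst: PtrReg, *srcs: PtrReg) -> None:
    """Hypothetical prefix: repeat one scalar op `count` times; operands are
    read from, and results written to, memory via auto-updating pointers."""
    for _ in range(count):
        dst.write(op(*(s.read() for s in srcs)))


# Example: element-wise add of two 4-element vectors held in memory.
mem = [1, 2, 3, 4, 10, 20, 30, 40, 0, 0, 0, 0]
loop_prefix(4, operator.add, PtrReg(mem, 8), PtrReg(mem, 0), PtrReg(mem, 4))
```

After the call, `mem[8:12]` holds the element-wise sums; a non-unit `stride` would let the same scalar opcode walk strided data without any extra registers.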
Of course, the limitation is that the "vector operations" are restricted to
operations in series/parallel/at once, but the memory aspect would be
Remember: what would Seymour do? :-D
In fact I suspect that's close to how he did vector bypass in the
"just stick the instructions in place in the buffer" is a pretty simple