[Libre-soc-dev] [RFC] horizontal SVP64 vectors

Luke Kenneth Casson Leighton lkcl at lkcl.net
Wed Jul 7 22:24:52 BST 2021

On 7/7/21, Jacob Lifshay <programmerjake at gmail.com> wrote:
> On Wed, Jul 7, 2021, 10:12 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
> wrote:
>> Horizontal Vectorisation is a lot more generic and has far more
>> applications.
> Incrementing PC will be orders of magnitude more complex, since you might
> have to modify the rest of the fetch pipeline to handle it, and you have to
> correctly handle incrementing over 32/64-bit instructions, and dealing with
> the branch predictor, etc., etc., etc.

all of which has to be done anyway.

> I'd suggest limiting it to 2 (or maybe 4 if we're extra crazy) contiguous
> instructions which can be decoded and stored in a temporary buffer,
> trapping before executing any if they all can't be fetched/decoded,

given the simplicity of allowing PC to increment normally and
srcstep/dststep to remain frozen this seems excessively complex and

> then
> replayed out of that buffer instead of re-fetching them. The buffer can
> just be re-loaded if interrupted by something. All branches, isync, and
> other special instructions can just be made to trap.

you may be imagining this to be more complex than it really is.  if it
wasn't for the extra bit needed (and where to put it) i'd have this
done in a couple of hours.

> After the hardware loop finishes, execution must resume after the last
> instruction in the loop, don't finish the loop then start executing
> somewhere in the middle of the loop.

the insight was down to the fact that the PowerDecoder2 is already
including the srcstep calculation in selecting the register src (and
dststep already in dest regs).

therefore, err... just don't increment srcstep/dststep and err...
don't loop, just move on to the next instruction and .. err it's

there are only two niggles:

1) an explicit instruction is needed which increments srcstep/dststep

2) a "state" bit is needed somewhere which sets this mode (SVSTATE may
have to be 33 bits, sigh, or one is borrowed from MSR).
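
to make the above concrete, here is a minimal toy model in python. it is a sketch under assumptions, not the actual simulator or PowerDecoder2 code: the `State` class, the `horizontal` flag (standing in for the mode bit of niggle 2), the `step` op (standing in for the explicit instruction of niggle 1) and the `loop` branch op are all hypothetical names invented for illustration. the only thing taken from the description is that register selection already adds srcstep/dststep, so horizontal mode just leaves them frozen and lets PC move on.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    vl: int                 # vector length
    horizontal: bool        # hypothetical mode bit (niggle 2)
    pc: int = 0
    srcstep: int = 0
    dststep: int = 0
    regs: list = field(default_factory=lambda: [0] * 64)


def run(program, st):
    """Run toy instructions; return the (pc, element) issue order."""
    trace = []
    while st.pc < len(program):
        op, *args = program[st.pc]
        if op == "add":
            rt, ra, rb = args
            # register selection folds the step in, as described above
            a = st.regs[ra + st.srcstep]
            b = st.regs[rb + st.srcstep]
            st.regs[rt + st.dststep] = a + b
            trace.append((st.pc, st.srcstep))
            if st.horizontal:
                st.pc += 1              # steps stay frozen, just move on
            else:
                st.srcstep += 1         # normal mode: loop on this insn
                st.dststep += 1
                if st.srcstep == st.vl:
                    st.srcstep = st.dststep = 0
                    st.pc += 1
        elif op == "step":
            # hypothetical explicit step instruction (niggle 1)
            st.srcstep += 1
            st.dststep += 1
            st.pc += 1
        elif op == "loop":
            # hypothetical branch: go back while elements remain
            (target,) = args
            if st.srcstep < st.vl:
                st.pc = target
            else:
                st.srcstep = st.dststep = 0
                st.pc += 1
        else:
            raise ValueError(f"unknown op {op!r}")
    return trace


body = [("add", 16, 0, 8), ("add", 24, 16, 8)]
for horizontal in (False, True):
    st = State(vl=4, horizontal=horizontal)
    st.regs[0:4] = [1, 2, 3, 4]
    st.regs[8:12] = [10, 20, 30, 40]
    prog = body + [("step",), ("loop", 0)] if horizontal else body
    print(horizontal, run(prog, st), st.regs[24:28])
```

in normal mode the trace runs each instruction over all four elements before PC advances. in the toy horizontal mode it runs both instructions on element 0, then both on element 1, and so on. the final register contents are the same for this example, only the issue order differs.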

> This honestly sounds like excessive feature creep, but fast FFTs! oh,
> well...

tell me about it...

sigh, yeah, i wasn't anticipating this kind of extensive and intrusive
augmentation of the original concept, but the first "real" algorithm
(MP3) is where this all stemmed from, and the whole point of the
exercise is to make Video and 3D simpler and more efficient.

so we'll see where this leads.

