[Libre-soc-dev] [RFC] horizontal SVP64 vectors

Richard Wilbur richard.wilbur at gmail.com
Thu Jul 8 04:27:47 BST 2021


On Jul 7, 2021, at 11:12, Luke Kenneth Casson Leighton <lkcl at lkcl.net> wrote:
[…]
> what i am suggesting is:
> 
> PC += 4
> PC += 4
> ...
> srcstep += 1
> 
> this would allow the same REMAP schedule to be applied to *multiple
> instructions*, then an explicit "svstep" instruction would be called
> at the end of a loop to increment srcstep / dststep, and a branch used
> to jump back to run the instructions for the next src/dststep.
> 
> in this way we can do FFT complex numbers without needing to add them
> as first class types (which i was a bit reticent about).
> 
> Horizontal Vectorisation is a lot more generic and has far more applications.
> 
> thoughts appreciated.

So by “Horizontal Vectorisation”, do you mean running a list of instructions on one particular set of vector elements (the body of the innermost loop in Cooley-Tukey, for example), then moving on to the next set of elements (possibly determined by a REMAP and some SHAPE registers) and repeating?
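
To make that reading concrete, here is a rough Python sketch (not SVP64 assembly) of the loop as I understand it.  Everything in it is an assumption for illustration: `run_horizontal`, the `schedule` list standing in for a REMAP, and the dictionary of register vectors are all hypothetical names, not anything from the spec.  The body is a complex multiply with real and imaginary parts kept in separate vectors, since that is the FFT case mentioned above.

```python
# Toy model only: none of these names come from the SVP64 specification.

def run_horizontal(body, regs, schedule):
    """Run every instruction in `body` at one element step, then advance the step."""
    trace = []
    srcstep = 0
    while srcstep < len(schedule):         # branch back to the top of the loop
        idx = schedule[srcstep]            # stand-in for a REMAP lookup
        for pc, instr in enumerate(body):  # PC advances, srcstep stays put
            instr(regs, idx)
            trace.append((pc, srcstep, idx))
        srcstep += 1                       # the explicit "svstep"
    return trace


# Complex multiply c = a * b, real and imaginary parts in separate vectors.
def mul_rr(r, i):
    r["cr"][i] = r["ar"][i] * r["br"][i]


def sub_ii(r, i):
    r["cr"][i] -= r["ai"][i] * r["bi"][i]


def mul_ri(r, i):
    r["ci"][i] = r["ar"][i] * r["bi"][i]


def add_ir(r, i):
    r["ci"][i] += r["ai"][i] * r["br"][i]


regs = {
    "ar": [1, 3, 0], "ai": [2, -1, 1],   # a = 1+2j, 3-1j, 0+1j
    "br": [2, 1, 0], "bi": [0, 1, -1],   # b = 2+0j, 1+1j, 0-1j
    "cr": [0, 0, 0], "ci": [0, 0, 0],
}
schedule = [2, 0, 1]                      # an arbitrary permuted element order
trace = run_horizontal([mul_rr, sub_ii, mul_ri, add_ir], regs, schedule)
print(regs["cr"], regs["ci"])
```

The point of the sketch is only that all four "instructions" see the same element index before the step advances, so the one schedule serves the whole loop body.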

Is this version of “REMAP” what you are proposing to use to implement ZOLC?  Sounds like a cool idea.  It looks readily applicable to quite a number of algorithms without introducing a lot of new instructions.

This reminds me of a concept I proposed to some classmates back in my undergraduate days:  a massively serial processor (as opposed to a massively parallel one).  The basic idea is to have a number of processing elements which you configure as you decode a sequence of instructions, connecting them up according to their register accesses and dependencies.  The advantages could include some amount of parallel instruction decode, and starting each instruction as soon as its operands are available.  Loops become very efficient once decoded:  if the loop body fits in the processor, no further decoder activity is needed.  On the other hand, procedure calls in the loop body would be a reason to add a second context (and maybe more), so that the call does not destroy the already-decoded loop body.  Possibly even better would be to have enough space to in-line procedures (dropping the jump and return instructions).
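
The "start as soon as operands are available" part can be illustrated with a very small Python sketch.  It is a toy under loud assumptions, not a design:  `dataflow_schedule` is a hypothetical name, every instruction gets its own processing element, every operation takes one cycle, each register is written only once, and registers nobody has written yet are treated as ready at cycle 0.

```python
# Toy dataflow timing model; names and latencies are illustrative assumptions.

def dataflow_schedule(instrs, latency=1):
    """Return the start cycle of each (dest, sources) instruction."""
    ready = {}    # register name -> cycle at which its value is available
    starts = []
    for dest, srcs in instrs:
        # An instruction may begin once its latest-arriving operand is ready.
        start = max((ready.get(s, 0) for s in srcs), default=0)
        starts.append(start)
        ready[dest] = start + latency
    return starts


loop_body = [
    ("t0", ["a", "b"]),    # independent of the next instruction
    ("t1", ["c", "d"]),    # can start in the same cycle as t0
    ("t2", ["t0", "t1"]),  # waits for both results
    ("t3", ["t2", "a"]),   # waits for t2
]
starts = dataflow_schedule(loop_body)
total_cycles = max(starts) + 1
print(starts, total_cycles)
```

With these four instructions the first two overlap, so the body completes in three cycles rather than the four a strictly one-at-a-time issue would need.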

