[Libre-soc-isa] [Bug 552] single-predication has "splat" capability, needs review

Wed Dec 23 21:34:01 GMT 2020

https://bugs.libre-soc.org/show_bug.cgi?id=552

--- Comment #4 from Jacob Lifshay <programmerjake at gmail.com> ---
(In reply to Luke Kenneth Casson Leighton from comment #3)
> (In reply to Jacob Lifshay from comment #2)
> > Note that splat will be very common in graphics code (I'd randomly guess
> > 10-20% of instructions, though a lot of those can be done by having a scalar
> > source on a vector op), so we will probably want to take the approach where
> > we have the one scalar ALU and just write to multiple destination registers.
> 
> this was the bit which was the "pain".  effectively that's a micro-coded op,
> separating out the actual scalar operation from the "copy-to-multiple".
> 
> which starts to get us into CISC territory as far as implementation is
> concerned.
> 
> let me think it through...
> 
> * result is produced
> * then written to first dest (including CR)
> * then a micro op "copy" splats it out (predicated).  including CR, here
> (arrrg)
> 
> if that is interrupted, it can be resumed at the copy phase as long as you
> can determine that the result was written.
> 
> that's going to be a pig, but it's doable.

Wouldn't it work to have the scalar op simply hold a whole pile of dest regs in
the dependency matrix, with the data path enabling all 4 reg-file write buses
simultaneously, allowing 4 writes per clock cycle? It doesn't matter if we push
the scalar op through the scalar ALU for as many clock cycles as the writes
need; the scalar ALU doesn't have to be used just once.

All I wanted to avoid is the scalar ALU having 1 op per element, taking 4x more
cycles than needed.
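
To make the cycle-count argument concrete, here is a rough sketch in Python. It
is only a toy model, not a description of the actual hardware: the function
names, the register numbers and the bit-per-element predicate mask are all
made up for illustration, and the only figure taken from the discussion is the
4 reg-file write buses.

```python
WRITE_PORTS = 4  # assumption: 4 reg-file write buses, as discussed above


def splat_schedule(dest_regs, pred_mask, write_ports=WRITE_PORTS):
    """Group the predicated destination registers into per-cycle write batches.

    dest_regs: hypothetical list of destination register numbers.
    pred_mask: bit i set means element i is written (assumed encoding).
    """
    active = [r for i, r in enumerate(dest_regs) if (pred_mask >> i) & 1]
    return [active[i:i + write_ports] for i in range(0, len(active), write_ports)]


def per_element_cycles(dest_regs, pred_mask):
    """Baseline to avoid: the scalar ALU is issued once per active element."""
    return sum((pred_mask >> i) & 1 for i in range(len(dest_regs)))


dests = list(range(8, 16))       # hypothetical registers r8..r15
schedule = splat_schedule(dests, 0b11111111)
print(len(schedule), "write cycles vs", per_element_cycles(dests, 0b11111111))
```

With 8 unmasked elements the batched version needs 2 write cycles where the
1-op-per-element version needs 8, which is the 4x difference mentioned above.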
