[Libre-soc-dev] VSX micro-coding

Luke Kenneth Casson Leighton lkcl at lkcl.net
Tue Nov 24 06:14:18 GMT 2020


paul et al,

got an idea based on how the original POWER core worked (micro-coding
of LD/ST to use rldicl).

the algorithm is as follows, and assumes that VSX sits on top of the
standard FP regfile:

* add 24 hidden int / fp regs to the regfile
* identify VSX instructions and begin a microcoding loop
* first phase issues rldicl shift-and-mask "extraction" ops that
source from the relevant regfile and target src1/src2 of the "hidden"
regs
* src targets are always placed in the LSBs
* second phase issues sign-extending ops on targets (or other alterations)
* third phase issues the actual vector ops, using the hidden
(converted?) src1/2 and outputting to ANOTHER hidden dest batch of
regs.
* fourth and final phase issues a series of "merging" rldicls,
sourcing the hidden results and targeting the relevant portions of the
final dest regfile.
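
a rough python model of the four phases, for a 4x 32-bit integer add
over a 128-bit value held as two 64-bit regs. this is only a sketch:
the function names, the lane numbering (upper word first) and the
choice of "add" are all assumptions made for illustration, and the
merge step is modelled as a plain shift-and-or rather than as any
particular micro-op. rldicl itself follows the ISA definition (rotate
left, then clear the top MB bits).

```python
MASK64 = (1 << 64) - 1

def rotl64(x, sh):
    """Rotate a 64-bit value left by sh bits."""
    sh %= 64
    return ((x << sh) | (x >> (64 - sh))) & MASK64

def rldicl(rs, sh, mb):
    """Rotate Left Doubleword Immediate then Clear Left.
    Clears the top mb bits (MSB0 bits 0..mb-1) after rotating."""
    return rotl64(rs, sh) & (MASK64 >> mb)

def extract_lanes(reg_pair):
    """Phase 1: pull four 32-bit lanes out of two 64-bit regs.
    Each lane lands in the LSBs of its own hidden reg."""
    lanes = []
    for reg in reg_pair:
        lanes.append(rldicl(reg, 32, 32))  # upper word -> LSBs
        lanes.append(rldicl(reg, 0, 32))   # lower word -> LSBs
    return lanes

def sign_extend32(x):
    """Phase 2: sign-extend a 32-bit value to 64 bits."""
    if x & 0x80000000:
        return (x | ~0xFFFFFFFF) & MASK64
    return x

def merge_lanes(results):
    """Phase 4: pack four hidden results back into two 64-bit regs.
    (assumption: modelled as shift-and-or on the low 32 bits)"""
    out = []
    for i in range(0, 4, 2):
        hi = results[i] & 0xFFFFFFFF
        lo = results[i + 1] & 0xFFFFFFFF
        out.append((hi << 32) | lo)
    return out

def vsx_add_words(a_pair, b_pair):
    """All four phases for a hypothetical 4x 32-bit add."""
    hidden_a = [sign_extend32(x) for x in extract_lanes(a_pair)]
    hidden_b = [sign_extend32(x) for x in extract_lanes(b_pair)]
    # Phase 3: ordinary 64-bit scalar adds on the hidden regs
    hidden_d = [(x + y) & MASK64 for x, y in zip(hidden_a, hidden_b)]
    return merge_lanes(hidden_d)
```

the point of the sketch is that phase 3 only ever sees ordinary
scalar operands sitting in the LSBs, so the existing ALU needs no
changes.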

the fascinating bit is that, with good scheduling and a bit of care at
the final phase, all of those could be pipelined and keep the existing
microwatt engine 100% busy.

for example the final phase, instead of doing 4 sequential 32bit rldicl
shift-masks, could issue them in the order 0 2 1 3.  0 and 2
would target the lower halves of 2 different 64 bit destination regs
(without Write Hazards), then in the next 2 cycles 1 and 3 would target
the upper halves.
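
a small python check of that ordering. the hazard model here is an
assumption made purely for illustration: two back-to-back micro-ops
writing to the same 64-bit destination reg are counted as a conflict.
lanes 0 and 1 are taken to be the lower and upper halves of the first
dest reg, lanes 2 and 3 those of the second, as described above. the
function names are hypothetical.

```python
def dest_reg(lane):
    """Lanes 0,1 go to dest reg 0; lanes 2,3 go to dest reg 1."""
    return lane // 2

def back_to_back_conflicts(order):
    """Count adjacent micro-op pairs that write the same dest reg."""
    return sum(
        1
        for prev, cur in zip(order, order[1:])
        if dest_reg(prev) == dest_reg(cur)
    )

sequential = [0, 1, 2, 3]
interleaved = [0, 2, 1, 3]

print("sequential :", back_to_back_conflicts(sequential))
print("interleaved:", back_to_back_conflicts(interleaved))
```

under that model the sequential order has two adjacent writes to the
same reg and the interleaved order has none.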

none of this involves *any* additional MUX paths in the slightest: it
is entirely done by microcoding and a bit of extra BRAM.

the only tricky bit will be at the final phase: ensuring that all
interrupts are disabled once that last batch of micro-ops starts.

up until that point (prior phases) all intermediate results could be
discarded: the hidden regs thrown out if an interrupt occurs during
the microcoding, and the computations entirely restarted from scratch
on return.
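
a python sketch of that restart-or-commit rule. the phase names and
micro-op counts (8 extracts, 8 extends, 4 computes, 4 merges) are
hypothetical, picked for a 4-lane, 2-source op, and "deferring" the
interrupt during the merge phase is just one way of modelling
"interrupts disabled".

```python
# (phase name, number of micro-ops): hypothetical counts
PHASES = [("extract", 8), ("extend", 8), ("compute", 4), ("merge", 4)]

def run_microcode(interrupt_at=None):
    """Walk the micro-op sequence, one micro-op per cycle.
    An interrupt before the merge phase discards the hidden regs and
    restarts from scratch; one during the merge phase is held off
    until the last merge micro-op has completed."""
    events = []
    cycle = 0
    deferred = False
    finished = False
    while not finished:
        finished = True
        for name, count in PHASES:
            for _ in range(count):
                if cycle == interrupt_at:
                    interrupt_at = None  # interrupt fires only once
                    if name == "merge":
                        deferred = True  # commit phase: masked
                    else:
                        events.append("discard+restart")
                        finished = False
                        break
                events.append(name)
                cycle += 1
            if not finished:
                break
    if deferred:
        events.append("take deferred interrupt")
    return events
```

for example run_microcode(10) throws away ten micro-ops of work and
redoes everything, whereas run_microcode(21) lets the four merges
finish before the interrupt is taken.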

it's horribly inefficient: even for 4x 32 bit operations it would be
amazing if a 0.04 IPC were achieved. but so what? it would be spec
compliant and would get microwatt over the barrier currently presented
by VSX and by the erroneous assumption among ABI implementors that
"v3.0B === POWER9".

l.


