[libre-riscv-dev] GPU design
Luke Kenneth Casson Leighton
lkcl at lkcl.net
Tue Dec 4 02:50:57 GMT 2018
the other interesting possible augmentation: lane-swapping using
let's say that it's known that there are 4 16-bit operands needed for
a SIMD operation, yet the data is distributed across register banks.
the number of reservation stations can be increased from the usual "2"
(src1, src2) to 8 (src1-0:3, src2-0:3).
also, it is not unreasonable to have the result split out into target
registers as well.
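
as a rough illustration of the idea above (not a description of any actual
hardware), here is a minimal python sketch of gathering four 16-bit lanes
per operand from different registers, doing a 4x16-bit SIMD add, and
splitting the result back out into target registers. the function names,
the 64-bit register width and the (register, lane) addressing scheme are
all assumptions made for the example. the 8 (register, lane) pairs stand in
for the 8 reservation station operand slots (src1-0:3, src2-0:3).

```python
# hypothetical model: 64-bit registers, each holding four 16-bit lanes
LANE_BITS = 16
LANE_MASK = (1 << LANE_BITS) - 1
REG_MASK = (1 << 64) - 1


def gather_lanes(regfile, sources):
    """Pack one 16-bit lane per (reg, lane) pair into a single operand.

    Each pair models one operand slot: slot i of the packed operand is
    filled from lane `lane` of register `reg`.
    """
    packed = 0
    for slot, (reg, lane) in enumerate(sources):
        value = (regfile[reg] >> (lane * LANE_BITS)) & LANE_MASK
        packed |= value << (slot * LANE_BITS)
    return packed


def scatter_lanes(regfile, packed, targets):
    """Write lane i of `packed` into the (reg, lane) named by targets[i]."""
    for slot, (reg, lane) in enumerate(targets):
        value = (packed >> (slot * LANE_BITS)) & LANE_MASK
        shift = lane * LANE_BITS
        cleared = regfile[reg] & ~(LANE_MASK << shift) & REG_MASK
        regfile[reg] = cleared | (value << shift)


def simd_add16(a, b, lanes=4):
    """Lane-wise 16-bit add with wrap-around (no carry between lanes)."""
    out = 0
    for i in range(lanes):
        shift = i * LANE_BITS
        total = ((a >> shift) & LANE_MASK) + ((b >> shift) & LANE_MASK)
        out |= (total & LANE_MASK) << shift
    return out


# usage: operands scattered across four registers (banks)
regs = [0x0000000000000001, 0x0000000000020000,
        0x0000000300000000, 0x0004000000000000]
src1 = gather_lanes(regs, [(0, 0), (1, 1), (2, 2), (3, 3)])  # src1-0:3
src2 = gather_lanes(regs, [(0, 0), (0, 0), (0, 0), (0, 0)])  # src2-0:3
result = simd_add16(src1, src2)
scatter_lanes(regs, result, [(0, 3), (1, 2), (2, 1), (3, 0)])
```

the point of the sketch is only that each lane of each operand can be
named independently, so the ALU itself never needs to know where the data
came from or where the result lanes end up.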
in fact, i believe it would be possible to use micro-coding of
xBitManip ALU operations to do the lane-swapping, both on inputs *and*
outputs.
previously i had not suggested this idea (of using xBitManip to do
byte and word shuffling) because of the difficulty associated with the
pipeline lengths: varying the length of the pipeline phases is a
really, really bad idea.
however if the length of the ALU operation (overall) is no longer a
factor (because of the reordering), then there is no longer a problem.
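
for reference, the kind of byte and word shuffling discussed above can be
sketched in python as a generic lane permutation. this is an assumption
made for illustration: it is not any specific xBitManip instruction, and
the name swap_lanes and its parameters are hypothetical.

```python
def swap_lanes(word, order, lane_bits=16):
    """Permute fixed-width lanes of `word`.

    order[dst] gives the source lane copied into destination lane dst.
    lane_bits=16 shuffles 16-bit words, lane_bits=8 shuffles bytes.
    """
    mask = (1 << lane_bits) - 1
    out = 0
    for dst, src in enumerate(order):
        out |= ((word >> (src * lane_bits)) & mask) << (dst * lane_bits)
    return out


# reverse four 16-bit lanes of a 64-bit value
reversed_words = swap_lanes(0x0004000300020001, [3, 2, 1, 0])

# swap adjacent bytes of a 32-bit value
swapped_bytes = swap_lanes(0x04030201, [1, 0, 3, 2], lane_bits=8)
```

in a micro-coded scheme, one such permutation would run before the SIMD
operation (on inputs) and another after it (on the result), which is why
the total operation length varies with the shuffle required.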