[libre-riscv-dev] GPU design

Luke Kenneth Casson Leighton lkcl at lkcl.net
Tue Dec 4 02:50:57 GMT 2018

the other interesting possible augmentation: lane-swapping using
reservation stations.

let's say that it's known that there are 4 16-bit operands needed for
a SIMD operation, yet the data is distributed across register banks.
the number of reservation stations can be increased from the usual "2"
(src1, src2) to 8 (src1-0:3, src2-0:3).
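
to make the operand-gathering concrete, here is a minimal behavioural sketch in python. it is not a hardware description: the class name SimdReservationStation, its methods and the choice of an "add" operation are all made up for illustration. it only shows the idea of 8 operand slots filling independently, with the operation issuing once all of them have arrived.

```python
from dataclasses import dataclass, field
from typing import List, Optional

LANES = 4  # four 16-bit operands per source, as in the example above


@dataclass
class SimdReservationStation:
    """hypothetical station with 8 operand slots: src1-0:3 and src2-0:3."""
    src1: List[Optional[int]] = field(default_factory=lambda: [None] * LANES)
    src2: List[Optional[int]] = field(default_factory=lambda: [None] * LANES)

    def capture(self, src: int, lane: int, value: int) -> None:
        """latch one 16-bit operand whenever its register bank delivers it."""
        slots = self.src1 if src == 1 else self.src2
        slots[lane] = value & 0xFFFF

    def ready(self) -> bool:
        """true once all 8 slots hold a value."""
        return all(v is not None for v in self.src1 + self.src2)

    def issue_add(self) -> List[int]:
        """perform a lane-wise 16-bit add (an arbitrary example operation)."""
        if not self.ready():
            raise RuntimeError("operands still outstanding")
        return [(a + b) & 0xFFFF for a, b in zip(self.src1, self.src2)]


def demo_gather() -> List[int]:
    rs = SimdReservationStation()
    # (src, lane, value): operands arrive in arbitrary order from different banks
    arrivals = [(2, 3, 40), (1, 0, 1), (1, 2, 3), (2, 0, 10),
                (1, 3, 4), (2, 1, 20), (1, 1, 2), (2, 2, 30)]
    for src, lane, value in arrivals:
        rs.capture(src, lane, value)
    return rs.issue_add()
```

the point of the sketch is that the lane a value ends up in is decided by which slot captures it, not by where it sat in the register file.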

also, it is not unreasonable to have the result split out into target
registers as well.

in fact, i believe it would be possible to use micro-coding of
xBitManip ALU operations to do the lane-swapping, both on inputs *and*
outputs.
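
a rough sketch of what such a micro-coded sequence could look like, in python. the generic permute_lanes function is only a stand-in for whatever xBitManip operation would actually be used (that mapping is an assumption, not something taken from the xBitManip proposal), and all of the function names are hypothetical.

```python
LANE_BITS = 16
LANE_MASK = (1 << LANE_BITS) - 1


def permute_lanes(word: int, order) -> int:
    """output lane i takes input lane order[i] (stand-in for a bit-manip op)."""
    out = 0
    for i, src in enumerate(order):
        lane = (word >> (src * LANE_BITS)) & LANE_MASK
        out |= lane << (i * LANE_BITS)
    return out


def simd_add16(a: int, b: int, lanes: int = 4) -> int:
    """lane-wise 16-bit add with wrap-around, no carry between lanes."""
    out = 0
    for i in range(lanes):
        shift = i * LANE_BITS
        total = ((a >> shift) + (b >> shift)) & LANE_MASK
        out |= total << shift
    return out


def micro_coded_add(a: int, b: int, a_order, b_order, out_order) -> int:
    """one SIMD add expanded into swap-in, ALU and swap-out micro-ops."""
    a_swapped = permute_lanes(a, a_order)    # micro-op 1: swap input a
    b_swapped = permute_lanes(b, b_order)    # micro-op 2: swap input b
    result = simd_add16(a_swapped, b_swapped)  # micro-op 3: the ALU op itself
    return permute_lanes(result, out_order)  # micro-op 4: swap the result
```

with the identity order [0, 1, 2, 3] on every argument this degenerates to a plain simd_add16, so the swap micro-ops only cost anything when they are actually needed.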

previously i had not suggested this idea (of using xBitManip to do
byte and word shuffling) because of the difficulty associated with the
pipeline lengths: varying the length of the pipeline phases is a
really, really bad idea.

however if the length of the ALU operation (overall) is no longer a
factor (because of the reordering), then there is no longer a problem.
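
a toy model of why the latency stops mattering, in python. it assumes "the reordering" means results may complete out of order and are then committed in program order; the function name, the op names and the latencies are invented for the example.

```python
def run_out_of_order(ops):
    """ops: list of (name, latency) in program order, all issued at cycle 0.

    returns (completion_order, commit_order) as lists of op names.
    """
    # ops finish in order of latency; ties resolve in program order
    finish = sorted(range(len(ops)), key=lambda i: (ops[i][1], i))
    completion = [ops[i][0] for i in finish]

    done = set()
    commit = []
    head = 0  # oldest op that has not yet been committed
    for i in finish:
        done.add(i)
        # commit strictly in program order, as far as finished results allow
        while head < len(ops) and head in done:
            commit.append(ops[head][0])
            head += 1
    return completion, commit


def demo_reorder():
    # a lane-swapped add takes longer than a plain add, which is harmless here
    ops = [("swap+add", 3), ("add", 1), ("mul", 2)]
    return run_out_of_order(ops)
```

the longer "swap+add" finishes last but still commits first, so nothing downstream needs to know how many cycles it took.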


