[libre-riscv-dev] GPU design
Luke Kenneth Casson Leighton
lkcl at lkcl.net
Tue Dec 4 01:37:15 GMT 2018
so attached is an illustration of an augmented reorder buffer for a
the idea is, after SV register-redirection, there is a phase which
checks to see if 8-bit, 16-bit or 32-bit elements are still
contiguous. if so, then instead of dropping multiple entries into the
reorder buffer, *one* entry is dropped into the reorder buffer, with a
note indicating that it is a "SIMD" operation.
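
a minimal sketch of that contiguity check, in python, purely as illustration. the names here (RobEntry, pack_elements) and the representation of an element as a (register, byte offset) pair are assumptions made for the example, not part of the actual design. it assumes 64-bit registers and that each element fits entirely within one register.

```python
from dataclasses import dataclass

@dataclass
class RobEntry:          # hypothetical reorder-buffer entry
    reg: int             # destination register number
    byte_mask: int       # one bit per byte of the 64-bit register
    simd: bool = False   # set when several elements share this entry

def pack_elements(elements, width_bytes):
    """elements: (reg, byte_offset) pairs, in element order, as produced
    by register redirection. width_bytes is 1, 2 or 4 (8/16/32-bit)."""
    entries = []
    for reg, offset in elements:
        mask = ((1 << width_bytes) - 1) << offset
        last = entries[-1] if entries else None
        # contiguous means: same register, and this element starts at the
        # byte immediately after the highest byte already covered.
        if (last is not None and last.reg == reg
                and last.byte_mask.bit_length() == offset):
            last.byte_mask |= mask
            last.simd = True
        else:
            entries.append(RobEntry(reg, mask))
    return entries

# four contiguous 16-bit elements in x1 collapse into one SIMD entry
packed = pack_elements([(1, 0), (1, 2), (1, 4), (1, 6)], width_bytes=2)
# two non-contiguous 16-bit elements stay as separate entries
split = pack_elements([(1, 0), (1, 4)], width_bytes=2)
```

in this sketch the first call gives a single entry with byte mask 0b11111111, and the second gives two entries that are not marked as SIMD.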
remember, also, that with an instruction queue, predicated operations
may be *completely dropped* - or, if zeroing is requested, the
predicated-out arithmetic operations may be replaced with a "set
element to zero" instruction. so this is *not* quite the same thing.
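
to make the predication behaviour concrete, here is a rough sketch of what the instruction queue might do per element. the function name, the tuple format and the "zero" opcode string are all hypothetical, chosen only for the example.

```python
def expand_predicated(op, num_elements, predicate, zeroing):
    """Return the per-element operations that actually enter the queue.

    predicate: integer bitmask, bit i set means element i is enabled.
    zeroing:   if True, masked-out elements become "zero" operations;
               if False, they are dropped entirely.
    """
    queue = []
    for i in range(num_elements):
        if (predicate >> i) & 1:
            queue.append((op, i))       # enabled: issue the real operation
        elif zeroing:
            queue.append(("zero", i))   # masked out: set element to zero
        # masked out, non-zeroing: nothing is queued at all
    return queue

dropped = expand_predicated("add", 4, 0b0101, zeroing=False)
zeroed = expand_predicated("add", 4, 0b0101, zeroing=True)
```

with non-zeroing predication only elements 0 and 2 are queued; with zeroing, elements 1 and 3 are queued as "zero" operations instead of the add.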
so, what the byte-mask does is effectively subdivide the registers
down further into smaller sections (down to the byte level). a
"standard" 64-bit operation will set a byte mask of 0b11111111 (see
all three entries ROB1-ROB3 in the attached file are acceptable,
even though they still use x1, because *none of the byte masks
this will result in these operations going to special SIMD-capable
ALUs. the byte mask is still carried around, even after the operation
is completed. when it comes to actually storing the result in the
register file, the byte mask is very simply passed in to the register
file, leaving masked-out bytes alone.
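
as a sketch of that final write step, assuming 64-bit registers and one mask bit per byte (bit 0 covering the least significant byte): the helper names and the list-based register file below are illustrative only.

```python
MASK64 = (1 << 64) - 1

def expand_byte_mask(byte_mask):
    """Turn an 8-bit byte mask into a 64-bit bit mask (0xFF per enabled byte)."""
    bits = 0
    for i in range(8):
        if (byte_mask >> i) & 1:
            bits |= 0xFF << (8 * i)
    return bits

def regfile_write(regfile, reg, value, byte_mask):
    """Write only the bytes selected by byte_mask; leave the others alone."""
    bits = expand_byte_mask(byte_mask)
    regfile[reg] = (regfile[reg] & ~bits & MASK64) | (value & bits)

regfile = [0] * 32
regfile[1] = 0x1122334455667788
# a SIMD result covering only the low four bytes of x1
regfile_write(regfile, 1, 0xAAAAAAAAAAAAAAAA, 0b00001111)
low_half_written = regfile[1]
# a "standard" 64-bit operation uses the full mask and replaces everything
regfile_write(regfile, 1, 0x0123456789ABCDEF, 0b11111111)
full_written = regfile[1]
```

the first write leaves the upper four bytes of x1 untouched; the second, with all eight mask bits set, behaves like an ordinary 64-bit register write.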
that's it. that's all there is to it.
now, i *believe* that the same scheme could actually be used to reduce
the complexity of the CAM lookup. as in: i *believe* it is possible
to double the byte-mask to, say, 16 bits, and have it cover *pairs*
of registers. i am however not sure of the benefits of doing so: it's
just a thought.
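
one way the paired-register idea could look, purely as a thought experiment: the pairing of register 2n with 2n+1, and the choice of putting the even register in the low 8 bits of the mask, are assumptions made for this sketch and are not something the scheme above specifies.

```python
def to_pair_mask(reg, byte_mask):
    """Map (register, 8-bit byte mask) to (pair index, 16-bit byte mask).

    Assumed layout: registers 2n and 2n+1 form pair n; bits 0-7 of the
    16-bit mask cover the even register, bits 8-15 cover the odd one.
    """
    pair = reg >> 1
    shift = 8 * (reg & 1)
    return pair, (byte_mask & 0xFF) << shift

even_case = to_pair_mask(2, 0b11111111)   # all of x2
odd_case = to_pair_mask(3, 0b00001111)    # low four bytes of x3
```

under this layout x2 and x3 share pair index 1, so a lookup would match on one fewer bit of register number, at the cost of a mask that is twice as wide.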
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 24475 bytes
Desc: not available