[libre-riscv-dev] GPU design
lkcl at libre-riscv.org
Fri Dec 7 12:48:39 GMT 2018
On Fri, Dec 7, 2018 at 12:33 PM Jacob Lifshay <programmerjake at gmail.com> wrote:
> On Fri, Dec 7, 2018, 03:37 lkcl <lkcl at libre-riscv.org> wrote:
> I think sharing between pairs of cores will still work since with a
> pipelined divider, you can do 1 divide per clock. As some perspective, a
> quad-core haswell using avx instructions can do 2.29 (4 cores * 8 lanes /
> 14 cycles) fp32 divisions per clock and our quad-core GPU with a pipelined
> divider per pair of cores can do 2 divisions per clock.
haswell avx isn't targeted at GPU workloads (though it does pretty well
at video decoding). appreciate the insight.
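for reference, the arithmetic behind the two quoted figures can be checked with a few lines of python. this is only a sketch: the function and parameter names are made up for illustration, and the 14-cycle figure is taken as quoted rather than verified against intel's documentation.

```python
from fractions import Fraction

def divides_per_clock(units, lanes_per_unit, cycles_per_issue):
    """Peak divisions per clock: units * lanes / cycles between issues.

    All three parameter names are hypothetical, chosen for this sketch.
    """
    return Fraction(units * lanes_per_unit, cycles_per_issue)

# quad-core haswell, 8 fp32 lanes per AVX divide, one divide every 14 cycles
# (figures as quoted in the thread)
haswell = divides_per_clock(units=4, lanes_per_unit=8, cycles_per_issue=14)

# quad-core GPU, one pipelined divider per pair of cores, 1 divide per clock each
gpu = divides_per_clock(units=2, lanes_per_unit=1, cycles_per_issue=1)

print(f"haswell: {float(haswell):.2f} div/clock")  # haswell: 2.29 div/clock
print(f"gpu:     {float(gpu):.2f} div/clock")      # gpu:     2.00 div/clock
```

these are peak rates only: they assume every divider is kept busy each cycle and say nothing about latency.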
> Note that having the rv base integer and fp registers be part of the same
> register file like I had suggested before allows us to save 2 clock cycles
> with the fast sqrt algorithm since you can use the SV rename table to have
> an integer register and a fp register renamed to the same underlying
> register removing the need to move between int and fp registers.
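the thread doesn't say which "fast sqrt algorithm" is meant. assuming it is the well-known bit-trick family (reinterpret the fp32 bits as an integer, adjust, reinterpret back, then refine), a rough python sketch shows where the two int/fp moves occur. the magic constant is the commonly cited one for the reciprocal square root, and the helper names are invented for this example.

```python
import struct

def f32_to_bits(x):
    """Reinterpret an fp32 value as a 32-bit unsigned integer (no conversion)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_to_f32(i):
    """Reinterpret a 32-bit unsigned integer as an fp32 value (no conversion)."""
    return struct.unpack("<f", struct.pack("<I", i))[0]

def fast_rsqrt(x):
    """Approximate 1/sqrt(x) for positive, normal fp32 x (assumed algorithm)."""
    i = f32_to_bits(x)                  # move 1: fp register -> int register
    i = 0x5F3759DF - (i >> 1)           # integer-side initial guess
    y = bits_to_f32(i)                  # move 2: int register -> fp register
    return y * (1.5 - 0.5 * x * y * y)  # one Newton-Raphson refinement step

def fast_sqrt(x):
    """sqrt(x) recovered as x * (1/sqrt(x))."""
    return x * fast_rsqrt(x)
```

both moves leave the bit pattern untouched, which is why aliasing an int and an fp register onto the same underlying register would make them unnecessary.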
i think, with ROB#s, MV could hypothetically be implemented as
just... changing the dest target register number (and type, from
int/float). maybe. will need to be thought through properly.
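as a toy illustration of the idea (not the SV or ROB design, which hasn't been worked out here), a rename table keyed on (type, number) can implement MV by repointing the destination at the source's physical register. every name below is hypothetical, and the model ignores freeing registers, in-flight instructions and fp32 NaN-boxing.

```python
class RenameTable:
    """Toy model: architectural (type, number) -> physical register index."""

    def __init__(self):
        self.mapping = {}   # e.g. ("f", 1) -> index into self.phys
        self.phys = []      # physical register contents (raw bit patterns)

    def write(self, reg, bits):
        # Every write allocates a fresh physical register.
        self.phys.append(bits)
        self.mapping[reg] = len(self.phys) - 1

    def read(self, reg):
        return self.phys[self.mapping[reg]]

    def mv(self, dst, src):
        # No data is copied: dst simply aliases src's physical register,
        # even when dst and src are of different types (int vs fp).
        self.mapping[dst] = self.mapping[src]


rt = RenameTable()
rt.write(("f", 1), 0x40800000)   # f1 holds the bit pattern of 4.0f
rt.mv(("x", 5), ("f", 1))        # "move" f1 to x5 by renaming only
aliased = rt.read(("x", 5))      # same bits, no second physical register used
```

a later write to f1 allocates a new physical register, so x5 keeps seeing the old value, which is the behaviour a real MV would need.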