[Libre-soc-isa] [Bug 1071] add parallel prefix sum remap mode

bugzilla-daemon at libre-soc.org bugzilla-daemon at libre-soc.org
Mon Jan 1 03:11:38 GMT 2024


https://bugs.libre-soc.org/show_bug.cgi?id=1071

--- Comment #28 from Jacob Lifshay <programmerjake at gmail.com> ---
(In reply to Luke Kenneth Casson Leighton from comment #26)
> the consequences are too damaging on the hardware, making it impractical.

I have read what you wrote and I disagree.
>
> > you are forcing parallel reduction to be much less powerful than RVV (RVV
> > supports dynamic reduction sizes),
> 
> i don't care what RVV does, here. they do not have precise interrupt
> guarantees as a hard requirement for a start.

if we choose to have a dynamic svshape instruction, it shouldn't affect precise
interrupt guarantees, since either the svshape instruction has completed, or it
hasn't and its outputs haven't been written yet.

the hard part is the SV loop afterwards, which will have precise interrupts
regardless of whether the SVSHAPE registers are set from an svshape
instruction's immediates, from a dynamic svshape instruction, or by mtspr.

> 
> > since computing the right values of VL from the
> > length is complex enough that it really shouldn't be done by software (it
> > would be maybe 10-20 instructions), 
> 
> then you should explain and illustrate that instead of saying
> "it MUST be dynamic" with no justification or explanation, given that
> you have no idea of how to properly think through the consequences.

the current algorithm:
https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isa/simplev.mdwn;h=33a02e6612065f290d840e15a596dfc2177de5e5;hb=fa603a1e9f2259d86acf4e9451937a000d099289#l309

step <- 0b0000001
i <- 0b0000000
do while step <u itercount
    newstep <- step[1:6] || 0b0
    j[0:6] <- 0b0000000
    do while (j+step <u itercount)
        j <- j + newstep
        i <- i + 1
    step <- newstep
# VL in Parallel-Reduce is the number of operations
vlen[0:6] <- i
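A minimal Python transliteration of the loop above, handy for checking each
refactoring step that follows. It is a sketch, not code from the openpower-isa
simulator: the function name is hypothetical, `<u` is modelled as an ordinary
comparison of non-negative Python ints, and the 7-bit widths of step and j are
not modelled (only the result is masked to 7 bits).

```python
def parallel_reduce_vl_reference(itercount: int) -> int:
    """Hypothetical helper: count the operations issued by the current
    Parallel-Reduce schedule by walking its nested loops directly."""
    if not 0 <= itercount <= 127:
        raise ValueError("itercount must fit in 7 bits")
    step = 1
    i = 0
    while step < itercount:          # one pass per tree level
        newstep = step << 1          # step[1:6] || 0b0 doubles step
        j = 0
        while j + step < itercount:  # one operation per inner iteration
            j += newstep
            i += 1
        step = newstep
    return i & 0x7F                  # vlen[0:6] is 7 bits wide


print(parallel_reduce_vl_reference(5))  # 4
```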

now, obviously that's not very suited to hardware in that form, so we refactor
it:
replace the inner loop with a closed-form expression (/ is integer division):
i <- i + (itercount + step - 1) / newstep
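The step behind that replacement, writing n for itercount, s for step and 2s
for newstep, and using the fact that the outer loop only runs while s < n: the
inner loop visits j = 2sk for k = 0, 1, ... as long as 2sk + s < n, so its trip
count is

```latex
\#\{\,k \ge 0 : 2sk + s < n\,\}
  = \left\lceil \frac{n - s}{2s} \right\rceil
  = \left\lfloor \frac{(n - s) + (2s - 1)}{2s} \right\rfloor
  = \left\lfloor \frac{n + s - 1}{2s} \right\rfloor
```

using the identity ceil(a / b) = floor((a + b - 1) / b) for integers a >= 0,
b > 0.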

replace step and newstep with shifts, since they're always powers of 2
(step = 1 << shift, newstep = 1 << (shift + 1)):
shift <- 0
i <- 0b0000000
do while (1 << shift) <u itercount
    i <- i + ((itercount + (1 << shift) - 1) >> (shift + 1))
    shift <- shift + 1
# VL in Parallel-Reduce is the number of operations
vlen[0:6] <- i
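The same shift-based form in Python, as a sketch with a hypothetical function
name. Since `+` binds more tightly than `>>` in Python, the shifted term must
stay a separate operand; writing `i += ...` takes care of that.

```python
def parallel_reduce_vl_shift(itercount: int) -> int:
    """Hypothetical helper: shift-based form, with step == 1 << shift and
    newstep == 1 << (shift + 1), so each tree level needs only adds and one shift."""
    if not 0 <= itercount <= 127:
        raise ValueError("itercount must fit in 7 bits")
    shift = 0
    i = 0
    while (1 << shift) < itercount:
        # closed-form count of the inner-loop iterations at this level
        i += (itercount + (1 << shift) - 1) >> (shift + 1)
        shift += 1
    return i


print(parallel_reduce_vl_shift(7))  # 6
```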

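assume itercount is in [0,127] and change the loop to a fixed number of
iterations. all extra iterations add 0, so they are no-ops: once
(1 << shift) >= itercount, itercount + (1 << shift) - 1 is below
1 << (shift + 1), so shifting it right by shift + 1 gives 0: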
i <- 0b0000000
do shift = 0 to 6
    i <- i + ((itercount + (1 << shift) - 1) >> (shift + 1))
# VL in Parallel-Reduce is the number of operations
vlen[0:6] <- i
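The claim that the extra iterations are no-ops can be checked exhaustively over
the whole 7-bit range. A minimal sketch, with a hypothetical helper name:

```python
def extra_levels_are_noops(max_itercount: int = 127, levels: int = 7) -> bool:
    """Return True if every level with (1 << shift) >= itercount contributes 0,
    i.e. always running `levels` iterations gives the same sum as stopping early."""
    for itercount in range(max_itercount + 1):
        for shift in range(levels):
            term = (itercount + (1 << shift) - 1) >> (shift + 1)
            if (1 << shift) >= itercount and term != 0:
                return False
    return True


print(extra_levels_are_noops())  # True
```

Seven levels (shift = 0 to 6) are enough because 1 << 7 = 128 already exceeds
the largest 7-bit itercount.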

unroll:
i <- itercount >> 1
i <- i + ((itercount + 1) >> 2)
i <- i + ((itercount + 3) >> 3)
i <- i + ((itercount + 7) >> 4)
i <- i + ((itercount + 15) >> 5)
i <- i + ((itercount + 31) >> 6)
i <- i + ((itercount + 63) >> 7)
# VL in Parallel-Reduce is the number of operations
vlen[0:6] <- i
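The unrolled form as straight-line Python, a sketch with a hypothetical name.
It uses exactly the 7 shifts and 12 adds of the unrolled sequence (6 adds of a
constant, 6 accumulating adds).

```python
def parallel_reduce_vl_unrolled(itercount: int) -> int:
    """Hypothetical helper: straight-line evaluation, valid for 0 <= itercount <= 127."""
    n = itercount
    i = n >> 1          # shift = 0
    i += (n + 1) >> 2   # shift = 1
    i += (n + 3) >> 3   # shift = 2
    i += (n + 7) >> 4   # shift = 3
    i += (n + 15) >> 5  # shift = 4
    i += (n + 31) >> 6  # shift = 5
    i += (n + 63) >> 7  # shift = 6
    return i & 0x7F     # vlen[0:6] is 7 bits wide


print([parallel_reduce_vl_unrolled(n) for n in (0, 1, 2, 8, 127)])  # [0, 0, 1, 7, 126]
```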

now, we finally have something suitable for HW!

in software, this would be 12 adds (6 adding the constants, 6 accumulating into
i) and 7 shifts, so 19 instructions. in HW, using carry-save adders for the
final sum, this could likely be done in 1 cycle, 2 at the very most.
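To illustrate the carry-save idea, a behavioural Python sketch (not a
gate-level design; all names are hypothetical) that sums the seven shifted
terms with 3:2 carry-save adders, which need no carry propagation, followed by
a single carry-propagate add. Each term (itercount + 2**k - 1) >> (k + 1) is a
small constant add followed by a fixed shift, which in hardware is just wiring.

```python
from typing import List, Tuple


def csa(a: int, b: int, c: int) -> Tuple[int, int]:
    """3:2 carry-save adder: returns (sum, carry) with sum + carry == a + b + c,
    computed bitwise with no carry propagation between bit positions."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def parallel_reduce_vl_csa(itercount: int) -> int:
    n = itercount
    # the seven terms of the unrolled form (shift = 0..6)
    operands: List[int] = [(n + (1 << k) - 1) >> (k + 1) for k in range(7)]
    # compress 7 operands down to 2; consuming from the front of the list
    # lets independent CSAs sit at the same tree level
    while len(operands) > 2:
        s, carry = csa(*operands[:3])
        operands = operands[3:] + [s, carry]
    # one carry-propagate add at the end; the result fits in vlen[0:6]
    return (operands[0] + operands[1]) & 0x7F


print(parallel_reduce_vl_csa(100))  # 99
```

In this arrangement the seven operands pass through four levels of 3:2 adders
before the final add; the actual critical path depends on the implementation.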

> svshape has FIVE register hazards as outputs already. the absolute
> last thing we need is to make that SIX, one of which is a GPR as input.
> the consequences would be catastrophic on the Hazard Management.

3 points:

1. for prefix-sum/reduction, it is sufficient to read from VL, which makes
hazard-tracking much simpler, since the input is one fixed register rather than
any of the 128 GPRs.

2. I see no reason why all the SVSTATE registers can't be hazard-tracked as one
big register (since they're almost always used together anyway), which reduces
the number of register hazards for dynamic svshape to 2-in (VL and MAXVL) and
3-out (VL, MAXVL, and the SVSTATE* register).

3. as you mentioned while I was writing this, SVSTATE, VL, and MAXVL are
similar to PC, so, assuming we're talking about a big OoO core, they don't need
a huge hazard-tracking matrix. instead, writing them acts basically like a
branch: all following speculative instructions are cancelled and then
re-issued with the correct state.
