[Libre-soc-bugs] [Bug 755] add grev instruction (OP_GREV)

bugzilla-daemon at libre-soc.org bugzilla-daemon at libre-soc.org
Tue May 17 01:03:43 BST 2022


https://bugs.libre-soc.org/show_bug.cgi?id=755

--- Comment #51 from Jacob Lifshay <programmerjake at gmail.com> ---
(In reply to Luke Kenneth Casson Leighton from comment #50)
> (In reply to Jacob Lifshay from comment #49)
> 
> > latency of full network:
> > 6 muxes through data input and 12 muxes through select input plus wire delay
> > -- will almost certainly need 2 clock cycles.
> 
> perfectly fine. i've made almost everything min 3 stage ALUs because
> of combinatorial paths in MultiCompALU.

imho making everything a 3-stage ALU is fine for a simple processor, but it's
terrible if you actually want high performance, which is what we're aiming for.

The latency (assuming each mux is an and-or-invert gate with 1.5 gate delays
from data to output, plus an inverter on select adding 1 gate delay, so 2.5
gate delays from select to output) is 6 * 1.5 + 12 * 2.5 = 39 gate delays!!!
this is unacceptably slow for a grev imho.
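
As a rough check of that estimate, the arithmetic can be written out as a
small Python sketch. The function name and constants are illustrative only;
the delay figures are the assumptions stated above (1.5 gate delays per
and-or-invert mux, plus 1 more for the inverter on the select path).

```python
# Assumed per-cell delays, in units of "gate delays" (from the estimate above).
AOI_MUX_DELAY = 1.5          # and-or-invert mux, data input to output
SELECT_INVERTER_DELAY = 1.0  # extra inverter in front of the select input


def network_latency(data_muxes, select_muxes):
    """Total gate delays for a path crossing `data_muxes` muxes via their data
    inputs and `select_muxes` muxes via their select inputs (wire delay ignored)."""
    select_path_delay = AOI_MUX_DELAY + SELECT_INVERTER_DELAY  # 2.5
    return data_muxes * AOI_MUX_DELAY + select_muxes * select_path_delay


# Full grevlut network: 6 muxes through data, 12 muxes through select.
print(network_latency(6, 12))  # 39.0
```

Wire delay is left out, so the real figure can only be worse than this.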


> 
> > this is 3x a grev/gorc
> > combined network. also 3x a rotator network.
> > 
> > gate count of full network:
> > 4 * 6 * 64 = 1536 muxes
> 
> call it 5 gates per mux? 8000 gates. less than a multiplier by 30%.
> for 256 instructions (which is what 2x 4-entry luts gives)
> that's pretty damn good.

gate count isn't an issue. latency is.
> 
> > a grev/gorc combined network can be made by taking a grev network and
> > writing it in terms of the 4-input and-or-invert cells that the muxes would
> > likely be anyway, converting every other layer to or-and-invert, and then
> > enabling both and-gate inputs simultaneously instead of having them be S and
> > ~S.
> 
> which covers only 2 out of the possible 256 instructions of grevlut.

it actually covers more than that: what i described is just how you can make a
faster circuit (probably what the riscv designers were thinking of when they
added gorc) that implements grev, gorc, and other instructions too. imho
we should have an instruction based on that circuit instead of grevlut with its
3x latency -- it would still derive its control signals from the
immediate like grevlut, but would be single-cycle even on a 5 GHz cpu, unlike
grevlut.
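
To make the idea concrete, here is a minimal behavioural sketch in Python of a
combined grev/gorc butterfly network, assuming 64-bit operands and six layers.
Each layer has two enables standing in for the two and-gate inputs of the mux
cell: driving them with ~S and S gives grev, while holding the pass enable
high gives gorc. The function names, the pass_en/swap_en signals, and the
one-control-bit-per-layer decoding are assumptions for illustration; the
sketch models logic only, not the and-or-invert/or-and-invert gate structure
or its timing.

```python
MASK64 = (1 << 64) - 1

# Masks selecting the low half of every 2-, 4-, 8-, 16-, 32- and 64-bit group.
LOW_HALF_MASKS = [
    0x5555555555555555,
    0x3333333333333333,
    0x0F0F0F0F0F0F0F0F,
    0x00FF00FF00FF00FF,
    0x0000FFFF0000FFFF,
    0x00000000FFFFFFFF,
]


def swap_layer(x, layer):
    """Swap the two halves of every 2**(layer+1)-bit group in x."""
    lo = LOW_HALF_MASKS[layer]
    hi = lo ^ MASK64
    shift = 1 << layer
    return ((x & lo) << shift) | ((x & hi) >> shift)


def mux_layer(x, layer, pass_en, swap_en):
    """One layer of and-or cells: OR together whichever inputs are enabled."""
    out = 0
    if pass_en:
        out |= x
    if swap_en:
        out |= swap_layer(x, layer)
    return out


def grev(x, control):
    """Ordinary mux behaviour: enables are ~S and S."""
    x &= MASK64
    for layer in range(6):
        s = bool((control >> layer) & 1)
        x = mux_layer(x, layer, pass_en=not s, swap_en=s)
    return x


def gorc(x, control):
    """Pass enable held high, so an active layer ORs both inputs together."""
    x &= MASK64
    for layer in range(6):
        s = bool((control >> layer) & 1)
        x = mux_layer(x, layer, pass_en=True, swap_en=s)
    return x


print(hex(grev(0x0123456789ABCDEF, 0b111000)))  # 0xefcdab8967452301 (byte swap)
print(hex(gorc(0x01, 0b000111)))                # 0xff (OR-combine within a byte)
```

The point of the sketch is that grev and gorc differ only in how the two
enables are driven per layer, so both share one six-layer network.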

> actually 2 out of 512 because i added an invert-input bit as well.

a layer of xor gates is trivially easy to add to the grev/gorc circuit
and doesn't add much latency -- imho having invert or not shouldn't be a reason
to choose grevlut over the grev/gorc/etc. circuit.
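
For illustration, an invert-input stage of this kind amounts to one xor per
bit with a shared control signal. The sketch below assumes 64-bit operands;
the function and parameter names are hypothetical, not taken from any
existing code.

```python
MASK64 = (1 << 64) - 1


def invert_input_layer(x, invert):
    """One xor gate per bit; `invert` is fanned out to all 64 of them."""
    return (x ^ (MASK64 if invert else 0)) & MASK64


print(hex(invert_input_layer(0x00FF, True)))   # 0xffffffffffffff00
print(hex(invert_input_layer(0x00FF, False)))  # 0xff
```

The output of this stage would then feed whichever permutation network
follows it.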


