[Libre-soc-bugs] [Bug 782] add galois field bitmanip instructions

bugzilla-daemon at libre-soc.org
Tue Mar 8 06:53:38 GMT 2022


https://bugs.libre-soc.org/show_bug.cgi?id=782

--- Comment #42 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Jacob Lifshay from comment #41)
> though, do we really need gfdiv? 

i'm sure i've seen modulo (remainder) used somewhere.
if producing that, might as well have div.

> it's a lot of extra complexity 

how? https://libre-soc.org/openpower/sv/gf2.py/
that's what... 7 lines?
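
for reference, a minimal sketch of div and mod coming out of the same loop, assuming "gfdiv" here means plain polynomial division over GF(2) with polynomials packed into integers (bit i = coefficient of x^i). the name gf2_divmod is hypothetical and this is not copied from gf2.py on the wiki:

```python
def gf2_divmod(dividend, divisor):
    """Divide GF(2) polynomials packed as ints; returns (quotient, remainder).

    Hypothetical helper for illustration, not the wiki's gf2.py code.
    """
    if divisor == 0:
        raise ZeroDivisionError("polynomial division by zero")
    quotient = 0
    remainder = dividend
    divisor_deg = divisor.bit_length() - 1
    # Cancel the leading term of the remainder until its degree drops
    # below the divisor's degree. Subtraction in GF(2) is XOR.
    while remainder.bit_length() - 1 >= divisor_deg:
        shift = (remainder.bit_length() - 1) - divisor_deg
        quotient |= 1 << shift
        remainder ^= divisor << shift
    return quotient, remainder


# (x^3 + x + 1) / (x + 1) gives quotient x^2 + x, remainder 1
q, r = gf2_divmod(0b1011, 0b11)
```

the remainder is simply what is left when the loop ends, which is the point made above: once mod is produced, div costs almost nothing extra.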

> In any case, here's the list of binary gf ops I think we should implement:
> 
> * clmul, clmulh (for gf operations that are > 64-bit, and other stuff)

see wiki, clmulr is missing.
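
a rough sketch of how the three carry-less multiplies relate, assuming the RISC-V bitmanip-style definitions (clmul = low XLEN bits, clmulh = bits above XLEN, clmulr = product shifted right by XLEN-1) and XLEN=64. whether the wiki uses exactly these definitions is an assumption, and clmul_full is a hypothetical helper:

```python
XLEN = 64                     # assumed register width
MASK = (1 << XLEN) - 1


def clmul_full(a, b):
    """Full-width carry-less product (up to 2*XLEN - 1 bits)."""
    result = 0
    shift = 0
    while b:
        if b & 1:
            result ^= a << shift   # XOR instead of add: no carries
        b >>= 1
        shift += 1
    return result


def clmul(a, b):
    """Low XLEN bits of the carry-less product."""
    return clmul_full(a & MASK, b & MASK) & MASK


def clmulh(a, b):
    """High half: product bits [2*XLEN-1 : XLEN]."""
    return (clmul_full(a & MASK, b & MASK) >> XLEN) & MASK


def clmulr(a, b):
    """'Reversed' variant: product bits [2*XLEN-2 : XLEN-1]."""
    return (clmul_full(a & MASK, b & MASK) >> (XLEN - 1)) & MASK
```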

> * gfbmul RT, RA, RB

ack. is it worth doing a gfbmulh? looking at
multGF2 it looks like it could return (res >> deg) & mask2
just as easily to give the top half.
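
to make the question concrete, here is a sketch of a shift-and-xor GF(2^deg) multiply, plus one possible reading of a "top half". everything here is an assumption for illustration: the names gfb_mul and gfb_mul_high_unreduced are hypothetical, red_poly is assumed to include the x^deg term, and the "top half" is taken from the unreduced product, which may not be what the wiki's multGF2 would actually give:

```python
def gfb_mul(a, b, deg, red_poly):
    """Multiply a*b in GF(2^deg), reducing by red_poly (includes x^deg term)."""
    mask = (1 << deg) - 1
    a &= mask
    b &= mask
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> deg:            # degree reached deg: reduce
            a ^= red_poly
    return result & mask


def gfb_mul_high_unreduced(a, b, deg):
    """Hypothetical 'top half': bits [2*deg-1 : deg] of the unreduced product."""
    mask = (1 << deg) - 1
    a &= mask
    b &= mask
    product = 0
    shift = 0
    while b:
        if b & 1:
            product ^= a << shift
        b >>= 1
        shift += 1
    return (product >> deg) & mask


# AES field: deg 8, reducing polynomial x^8 + x^4 + x^3 + x + 1 (0x11B)
low = gfb_mul(0x57, 0x83, 8, 0x11B)
```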

> * gfbmuli RT, RA, imm # discard if we don't have enough encoding space

no chance. snowball, hell etc. typically an imm
takes an entire major opcode: mulli and addi each
have a dedicated major opcode. it makes no sense
except for less than degree 16, with gf jumping
all over the place.

if constructing consts then addi and addis can be used.
if, on presenting to the OPF ISA WG, they really want a gfbmuli
then i see no reason why not, but it does take an entire
major op.

> * gfbmadd RT, RA, RB, RC

ack. remember it has to match madd and fmadd for REMAP
purposes.  this is IMPORTANT.

> * gfbinv rt, ra
>     input 0 gives result 0 even though that is division by zero,
>     that's what's needed for AES.

ok great.
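
a sketch of the 0 -> 0 behaviour, assuming the inverse is computed as a^(2^deg - 2) (Fermat-style) with a shift-and-xor multiply. gfb_mul and gfb_inv are hypothetical names, and a real implementation may well use extended Euclid instead:

```python
def gfb_mul(a, b, deg, red_poly):
    """Multiply a*b in GF(2^deg); red_poly includes the x^deg term."""
    mask = (1 << deg) - 1
    a &= mask
    b &= mask
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> deg:
            a ^= red_poly
    return result & mask


def gfb_inv(a, deg, red_poly):
    """Inverse in GF(2^deg); input 0 is defined to give 0 (as AES needs)."""
    if a == 0:
        return 0                      # explicit: no trap on "division by zero"
    result = 1
    base = a
    exponent = (1 << deg) - 2         # a^(2^deg - 2) == a^-1 for a != 0
    while exponent:
        if exponent & 1:
            result = gfb_mul(result, base, deg, red_poly)
        base = gfb_mul(base, base, deg, red_poly)
        exponent >>= 1
    return result


# AES field example: the inverse of 0x53 under 0x11B
inv = gfb_inv(0x53, 8, 0x11B)
```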

> gfbadd is covered by the existing xor instructions.

ack.

> gfbmaddsubr is not needed, since sub is identical to add, this is just two
> gfbmadd instructions.

ack.

gftwinmuladd is missing.

btw see wiki page. please do read and
refer to it, i get the impression you're not reading it
regularly.

> If that all sounds good, i can create a new tracking bug for binary gf
> instructions and start working on implementing them.

the basics (lowlevel modules) and bugreport, yes.

btw do not consider pipeline designs to be "the peak epitome
of absolute performance", it is misleading.

multiple simpler FSMs with multiple RSes can be more efficient
and take advantage of early-out and jump-ahead opportunities
that pipelines cannot.  this makes pipelines *less* efficient rather
than more, particularly for long-running ops.
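
a toy behavioural model of the early-out point, assuming a one-bit-per-cycle shift-and-xor GF(2^deg) multiplier. this is plain python, not HDL, the name fsm_gfb_mul is hypothetical, and the cycle counts are illustrative only:

```python
def fsm_gfb_mul(a, b, deg, red_poly):
    """Iterative GF(2^deg) multiply; returns (result, cycles_used).

    Models an FSM that finishes as soon as no multiplier bits remain,
    whereas a fixed pipeline would always spend `deg` stages.
    """
    mask = (1 << deg) - 1
    a &= mask
    b &= mask
    result = 0
    cycles = 0
    while b:                      # early-out: stop when b is exhausted
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> deg:
            a ^= red_poly
        cycles += 1
    return result & mask, cycles


# small multiplier: done in 1 step; full-width multiplier: all 8 steps
short = fsm_gfb_mul(0x57, 0x01, 8, 0x11B)
long_ = fsm_gfb_mul(0x57, 0x83, 8, 0x11B)
```

in this model the step count tracks the position of the top set bit of the multiplier, so small operands finish early, which a fixed-depth pipeline cannot exploit.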

[yes a FSM can be done which processes multiple chained combinatorial
stages]

RSes need to fan-in and fan-out to pipelines (which have
to be BIG muxes), where FSMs can wire directly. each RS has
its own FSM, each FSM stores its result in the RS, no
massive fan-in/out.

bottom line you need to think of the context in which the
pipeline is used.
