[Libre-soc-bugs] [Bug 782] add galois field bitmanip instructions

Tue Mar 8 10:47:03 GMT 2022

https://bugs.libre-soc.org/show_bug.cgi?id=782

--- Comment #44 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Jacob Lifshay from comment #43)
> (In reply to Luke Kenneth Casson Leighton from comment #42)
> > (In reply to Jacob Lifshay from comment #41)
> > > though, do we really need gfdiv? 
> > 
> > i'm sure i've seen modulo (remainder) used somewhere.
> 
> yes, in defining gf(2^n) in terms of gf(2) polynomials, the polynomial
> remainder is used. that is *not* a gfbrem.

bear in mind, i don't understand the difference, so i am trusting you to
ask and answer this question: why then have several people published
divmodGF2 algorithms in their GF2 libraries?

they must be used for something.
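For reference, a divmodGF2-style routine divides one GF(2) polynomial by another. It uses the same shift-and-XOR long division that reduces a product modulo the field polynomial when GF(2^n) is defined. Below is a minimal sketch. It assumes the common convention that bit i of an integer holds the coefficient of x^i. The name gf2_divmod is hypothetical and is not taken from any particular library or from the Libre-SOC spec:

```python
def gf2_divmod(a, b):
    """Long division of polynomial a by polynomial b over GF(2).

    Bit i of each int is the coefficient of x**i. Subtraction over
    GF(2) is XOR, so there are no borrows. Returns (quotient, remainder)
    with deg(remainder) < deg(b).
    """
    if b == 0:
        raise ZeroDivisionError("division by the zero polynomial")
    q = 0
    db = b.bit_length()
    while a.bit_length() >= db:
        shift = a.bit_length() - db
        q |= 1 << shift   # record this term of the quotient
        a ^= b << shift   # subtract (XOR) the aligned divisor
    return q, a


# x^8 reduced modulo the AES polynomial x^8 + x^4 + x^3 + x + 1 (0x11B)
print(gf2_divmod(0x100, 0x11B))   # prints (1, 27): remainder 0x1B = x^4 + x^3 + x + 1
```

The field-reduction step needs only the remainder. A general-purpose GF(2) polynomial library also returns the quotient alongside it.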

> well, if we're using a pipeline for gfbinv (which we totally want to if we
> want AES to be usably fast without needing 7 zillion gfbinv FSMs),

you've fundamentally misunderstood something about OoO design here which
i will cover in a mailing list post.
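For context on what gfbinv computes: it returns the multiplicative inverse of a non-zero element of GF(2^n). AES uses this in its S-box, with n = 8 and reducing polynomial 0x11B. Below is a minimal sketch. It assumes inversion by Fermat's little theorem, computing a^(2^n - 2), and is not necessarily the algorithm the Libre-SOC hardware uses; the extended Euclidean algorithm is a common alternative. gf_mul and gf_inv are hypothetical names:

```python
def gf_mul(a, b, poly):
    """Multiply a and b (both already reduced) in GF(2^n) defined by the
    degree-n polynomial poly, reducing as we go."""
    n = poly.bit_length() - 1
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> n:          # degree reached n: reduce by XORing in poly
            a ^= poly
    return result


def gf_inv(a, poly):
    """Inverse via a^(2^n - 2); maps 0 to 0, matching the AES S-box convention."""
    n = poly.bit_length() - 1
    exponent = (1 << n) - 2
    result, base = 1, a
    while exponent:         # square-and-multiply over a fixed exponent
        if exponent & 1:
            result = gf_mul(result, base, poly)
        base = gf_mul(base, base, poly)
        exponent >>= 1
    return result


AES_POLY = 0x11B            # x^8 + x^4 + x^3 + x + 1
print(hex(gf_inv(0x53, AES_POLY)))   # 0xca
```

The exponent is a constant for a given n, so the sequence of squarings and multiplies is fixed. That is one reason this formulation maps naturally onto a fixed-latency pipeline.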

> there is no top half. gfbmul returns the entire answer. if we tried to make
> a gfbmulh, it would always return 0 unless the reducing polynomial had a
> degree of more than 64 (64 is the max in the current design).

by that logic clmulh should not exist.  there is something missing
here which needs to be explained and clarified.
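One way to see the distinction being discussed is to model both operations. clmul produces an unreduced carry-less product of two 64-bit values, which can be up to 127 bits wide, so a high half (clmulh) genuinely exists. A GF(2^n) multiply reduces that product modulo the field polynomial. The result therefore has degree below n, and with n at most 64 it always fits in one 64-bit register. Below is a minimal Python model. It assumes the reducing polynomial is passed explicitly as an argument; how the real instructions obtain it is not specified here. The function names are illustrative only:

```python
MASK64 = (1 << 64) - 1


def clmul_full(a, b):
    """Unreduced carry-less product: shift-and-XOR, no carries."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def clmul(a, b):
    """Low 64 bits of the carry-less product."""
    return clmul_full(a, b) & MASK64


def clmulh(a, b):
    """High 64 bits of the carry-less product."""
    return clmul_full(a, b) >> 64


def gf2_mod(a, poly):
    """Remainder of a modulo poly, both GF(2) polynomials packed as ints."""
    dp = poly.bit_length()
    while a.bit_length() >= dp:
        a ^= poly << (a.bit_length() - dp)
    return a


def gfbmul(a, b, poly):
    """Multiply in GF(2^n), where poly is the degree-n reducing polynomial."""
    return gf2_mod(clmul_full(a, b), poly)


# clmulh is non-zero whenever the unreduced product exceeds 64 bits
print(hex(clmulh(1 << 63, 1 << 63)))    # 0x4000000000000000
# FIPS-197 worked example in the AES field: {57} * {83} = {c1}
print(hex(gfbmul(0x57, 0x83, 0x11B)))   # 0xc1
```

gf2_mod always leaves a remainder of degree below deg(poly). A "high half" of gfbmul would therefore be identically zero for any reducing polynomial of degree 64 or less. clmulh, by contrast, returns the part of an unreduced product that clmul discards.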

> yeah, because i have no easy way of figuring out if it has changed
> (automatic notifications would be nice!), and i don't like rereading stuff
> i've already read 5 times in the hope that you might have changed it
> meanwhile... git log is nearly useless since nearly everything has no useful
> commit message.

i re-read the same page sometimes 20 to 30 times a day. this is just good
practice because each re-read results in different insights.

i never consider it "beneath" me to re-read something that many times.

>  if it were instead
> several totally separate FSMs for the same throughput, each FSM would need
> the logic for all stages of the algorithm, making the data path much more
> complex and possibly taking more time due to the complexity, reducing max
> clock speed.

it is perfectly possible for an FSM design to itself farm-in and fan-out
to shared pipelines.

> > 
> > [yes a FSM can be done which processes multiple chained combinatorial
> > stages]
> > 
> > RSes need to farm-in and fan-out to pipelines (which have
> > to be BIG muxes), where FSMs can wire directly. each RS has
> > its own FSM, each FSM stores its result in the RS, no
> > massive fan-in/out.
> 
> that's kinda deceptive, since the muxes with the massive fan-out/in are now
> in the scheduler logic and the busses for carrying intermediate register
> values around (i temporarily forgot the name) rather than in the pipeline
> fan-in fan-out.

i'll answer this on-list. the absolute inviolate rule is that you absolutely
cannot have an untracked in-flight computation.  every register involved in
every computation has to be tracked, and every RS with an outstanding
computation has to be frozen until the result hits a regfile.
