[libre-riscv-dev] div/mod algorithm written in python

Sun Jul 21 12:06:33 BST 2019

On Sun, Jul 21, 2019, 03:54 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
wrote:

> On Sun, Jul 21, 2019 at 11:02 AM Jacob Lifshay <programmerjake at gmail.com>
> wrote:
> >
> > On Sat, Jul 20, 2019 at 12:55 AM Luke Kenneth Casson Leighton
> > <lkcl at lkcl.net> wrote:
> > > yehyeh.  well, the basic routines are all there, already done: there's
> > > pipeline stages already that will shift the mantissa up so that the
> > > MSB is always 1 (and adjust the exponent accordingly as well), and
> > > likewise on the way out.
> > >
> > > so as long as the integer "thing" works, fitting it in place is
> > > actually pretty trivial.
> > >
> > > once the result is generated, the post-normalisation pipeline stages
> > > take care of re-normalisation, so even if the mantissa (int-result)
> > > doesn't have a MSB which is 1, that's *precisely* what
> > > re-normalisation takes care of: shifting the MSB and adjusting the
> > > exponent as well.
> > >
> > > so the exponent will need to be carried through the int-div pipeline
> > > stages *untouched*, ok?  generated/modified by the de-normalisation,
> > > carried through the int-div pipe, handed to the post/re-normalisation,
> > > and dealt with there.
> > One thing we will need to consider is that sqrt/rsqrt actually
> > requires the mantissa to be shifted such that the exponent is even,
>
> yep.  that's easily done.  the class FPMSBHigh can be adapted to
> ensure that happens:
>
> https://git.libre-riscv.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/fpcommon/msbhigh.py;h=3a29935a725f8caf6262cb4a536cad7a712aa683;hb=6352a29b6e1e73ff42a2172f60bfe825d33c3fac
>
> example usage:
>
> https://git.libre-riscv.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/fpmul/align.py;h=2f578e9ed55f05afc4e1a685b622b07e4a2764be;hb=6352a29b6e1e73ff42a2172f60bfe825d33c3fac
>
>
> https://git.libre-riscv.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/fpcommon/postnormalise.py;h=25dca7adff7c2ab5473d6d40c6d29e13dad11a62;hb=6352a29b6e1e73ff42a2172f60bfe825d33c3fac#l107
>
>
> https://git.libre-riscv.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/fcvt/pipeline.py;h=413577995ef9222676a5bae0c4fb4d95d2151697;hb=6352a29b6e1e73ff42a2172f60bfe825d33c3fac#l179
>
>
>
> > otherwise we lose the factor of sqrt(2). I had been thinking that,
> > since all normal/denormal numbers produce normal outputs for
> > sqrt/rsqrt (exponent divides by 2), it would be better to have the
> > exponent handling happen during the same stages that the mantissa is
> > being calculated by DivCore, since that way, we don't need an extra
> > stage just to handle that and it will pipeline better.
>
>  remember: with the pipeline API, the concept of "stages" does *not*
> automatically mean "a clock delay".  what it means is: "a convenient
> way to conceptually separate code into tasks".
>
>  we then have the choice:
>
>  * do we chain the "tasks" together into a single-clock-cycle "thing"?
>  if so, use StageChain
>
>  * are the "tasks" too complex (too high a gate latency)?  if so, use
> something that's derived from ControlBase to create a
> clock-controllable "actual pipeline stage".
>
>  if the code is *not* separated out, we do not have that choice.  it
> would require a big redesign - a lot more coding effort - should we
> discover, much further down the line, that the gate latency is far too
> high in any one "stage".
>
> so it's basically much more preferable to have modules that do "tasks"
> - one of those tasks would be (just like the align.py code above),
> "make the exponent an even number", and, if you look here:
>
>
> https://git.libre-riscv.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/fpdiv/pipeline.py;h=6fd5a45c3a02ef0d88cceb0920f7d2400bc64f56;hb=6352a29b6e1e73ff42a2172f60bfe825d33c3fac
>
> you'll see that it's *already* assumed, in the "stack", that that's
> exactly what's going to be done (matching all of the other FP code,
> which follows the exact same pattern).
>
> if that pattern is *not* going to be followed, there needs to be a
> really, _really_ good reason, as it will be both confusing and also
> require understanding of two totally disparate codebases that
> effectively do the same job.
>
> remember also that we have quite a lot of "code-morphing" to do
> (replace all use of SimpleHandshake with a "no delay" base class that
> respects "cancellation"), and having different codebases (different
> methods of doing pipelines) will make that task a lot harder to
> complete.
>
k.
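
For reference, the "make the exponent an even number" task could be sketched in plain Python integers roughly as below. This is only an illustration of the arithmetic, not the nmigen code: the names make_exponent_even and sqrt_parts are made up here, the exponent is assumed to be already unbiased, and math.isqrt (Python 3.8+) stands in for whatever DivCore ends up computing.

```python
import math


def make_exponent_even(mantissa, exponent):
    """Return (mantissa, exponent) with an even exponent.

    The represented value mantissa * 2**exponent is unchanged:
    an odd exponent is decremented and the mantissa doubled.
    """
    if exponent % 2 != 0:
        mantissa <<= 1
        exponent -= 1
    return mantissa, exponent


def sqrt_parts(mantissa, exponent):
    """Hypothetical helper: integer sqrt of the mantissa, halved exponent.

    Because the exponent is even after adjustment, halving it is exact
    and no factor of sqrt(2) is lost.
    """
    mantissa, exponent = make_exponent_even(mantissa, exponent)
    return math.isqrt(mantissa), exponent // 2
```

With an odd exponent, e.g. 8 * 2**5 = 256, the adjustment gives 16 * 2**4, and the square root comes out as 4 * 2**2 = 16.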

> > the exponent operations would be (assuming inputs and outputs are
> > biased and bias is positive):
> > fdiv: nexponent - dexponent + bias (needs overflow/underflow handling)

> from what i can gather, there's certain ranges that the mantissa has
> to be placed into, and the result will come out "in the correct
> range".
>
I mean the bias that converts between the mathematical exponent and the
unsigned integer stored in the exponent field: 15 for f16, 127 for f32,
1023 for f64, and 16383 for f128.
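
As a quick sanity check of those numbers, here is a small Python sketch. The function names are mine, purely for illustration; the exponent field widths (5, 8, 11, 15 bits) are the standard IEEE 754 ones, and overflow/underflow handling is deliberately left out.

```python
# IEEE 754 exponent field widths in bits
EXPONENT_WIDTHS = {"f16": 5, "f32": 8, "f64": 11, "f128": 15}


def exponent_bias(exponent_width):
    """Bias for an exponent field of the given width: 2**(w-1) - 1."""
    return (1 << (exponent_width - 1)) - 1


def fdiv_biased_exponent(n_exponent, d_exponent, bias):
    """Biased exponent of n / d, before normalisation and range checks.

    (n - bias) - (d - bias) is the mathematical exponent of the
    quotient; adding the bias back gives n - d + bias.
    """
    return n_exponent - d_exponent + bias


for name, width in EXPONENT_WIDTHS.items():
    print(name, exponent_bias(width))
```

For example, dividing 2**3 by 2**1 in f32 means biased exponents 130 and 128, and 130 - 128 + 127 = 129, which is the biased form of 2.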

I'm assuming you didn't mean that we needed a 2048-bit wide mantissa (for
f64) :)

>
>  what i've seen is, for example, in the multiply, extra bit(s) are
> added to the product (1 extra bit per input mantissa).  then it no
> longer becomes necessary to worry about *exponent* biasing, because
> the mantissa has the extra accuracy required.
>
>  that extra accuracy then results in the remainder having a few more
> bits.  do the normalisation, put those extra bits into
> guard/round/sticky, and the job's done:
>
> # p is product (52 - or more! - bits long)
>
>                 mw = self.o.z.m_width
>                 self.o.z.m.eq(p[mw+2:]),
>                 self.o.of.m0.eq(p[mw+2]),
>                 self.o.of.guard.eq(p[mw+1]),
>                 self.o.of.round_bit.eq(p[mw]),
>                 # sticky is all the remaining bits
>                 self.o.of.sticky.eq(p[0:mw].bool())
>
> jon dawson's divider code, which passes lots of IEEE754 tests, doesn't
> have any kind of exponent bias.
>
it does, it's in the base fp number class.
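
On the guard/round/sticky split quoted above: the same slicing can be mirrored with plain Python integers, which may be handy for checking a software model against the hardware. This is a sketch only. split_rounding_bits is a made-up name, and it assumes p is a non-negative integer and mw plays the same role as m_width in the quoted code.

```python
def split_rounding_bits(p, mw):
    """Split p into (mantissa, m0, guard, round_bit, sticky).

    Mirrors the quoted slices: p[mw+2:], p[mw+2], p[mw+1], p[mw],
    and the OR of p[0:mw].
    """
    mantissa = p >> (mw + 2)            # bits mw+2 and upwards
    m0 = (p >> (mw + 2)) & 1            # lowest bit of the mantissa
    guard = (p >> (mw + 1)) & 1
    round_bit = (p >> mw) & 1
    sticky = int((p & ((1 << mw) - 1)) != 0)  # any of the low mw bits set
    return mantissa, m0, guard, round_bit, sticky
```

For instance, with mw = 2 and p = 0b110110, the mantissa is 0b11, guard is 0, round is 1, and sticky is 1 because the low two bits (0b10) are not all zero.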