[libre-riscv-dev] IEEE754 FPU

Sat Mar 2 04:12:57 GMT 2019


On Sat, Mar 2, 2019 at 1:14 AM Jacob Lifshay <programmerjake at gmail.com> wrote:
>
> On Fri, Mar 1, 2019, 17:04 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
> wrote:
>
> > Ok so I am making a complete pig's ear of the adder, slowly and painfully
> > transforming it to connect inputs to outputs instead of taking global
> > variables and extracting and dropping the stage data in and out of globals.
> >
> > Each stage's logic I have split out into combinatorial module blocks with
> > input and output, no clock sync at all. This is deliberate so that it
> > becomes possible later to combine phases into a single cycle, reducing the
> > number of "states" and also when doing pipelines reducing the number of
> > pipeline stages.
> >
> > The chain has a bypass on the add and normalisation, due to special cases
> > for zero, NaN and Inf, however the use of global variables made it
> > complicated to split the chain. I succeeded by creating a new state that
> > puts the intermediate z from the add stages into the output, separate from
> > a different z used to store the special cases. These two different stages
> > go into the same output.
> >
> > This separation allowed me to remove at least one global variable.
> >
> > The normalisation phase is currently the biggest hurdle as it is a
> > multi-cycle phase. My next efforts will be to work out how to get the
> > normalisation phase to cycle on an internally protected local temporary
> > variable, only outputting it when the result is ready.
> >
> you could change normalization to a count-leading-zero operation combined
> with a left shift. That would make it take 2 operations instead of a
> variable number.

 this is what i did in the alignment module, however the normalisation
is complicated by the overflow being shifted in/out at the same time.
the ordering is:

 * align mantissas (and shift exponents accordingly)
 * do the add
 * take the lower bits of the add and create the overflow
 * *then* do the normalisation, which includes shifting the *overflow*
bits up/down *as well* (one at a time).

so the left (or right) shift would need to special-case merging the
overflow bits.
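
as a rough sketch of the count-leading-zeros plus shift idea, in plain
python rather than nmigen and not the actual module code: the names
count_leading_zeros and normalise_left, and the min_exponent clamp, are
assumptions made up for illustration.  it treats the extra low bits as
part of the same word, so they move together with the mantissa, and it
only covers the left-shift direction.

```python
def count_leading_zeros(value, width):
    """Number of zero bits above the highest set bit of a width-bit word."""
    assert 0 <= value < (1 << width)
    return width - value.bit_length()


def normalise_left(mantissa, exponent, width, min_exponent):
    """Left-normalise in one step instead of one bit per cycle.

    mantissa is assumed to already contain the extra low-order bits,
    so they are shifted along with the rest of the word.
    """
    if mantissa == 0:
        # zero is assumed to be dealt with by the special-case path
        return mantissa, exponent
    shift = count_leading_zeros(mantissa, width)
    # do not take the exponent below the minimum (result stays denormal)
    shift = max(0, min(shift, exponent - min_exponent))
    return mantissa << shift, exponent - shift
```

in hardware the two steps would be a priority encoder followed by a
barrel shifter, which is where the cost of the very wide multiply
words shows up.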

i would really like to do the normalisation on the full range of the
bits (3 extra) first, followed by the shift.  however when it comes to
32-bit multiply, there are a whopping 50 bits involved, and for 64-bit
multiply it's *108* bits to be shifted.

for a 32-bit mul, the multiply 2nd phase actually takes the bottom
bits and if any of them is "1" sets the "sticky" bit to 1.
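
a minimal sketch of that sticky-bit step, again in plain python and not
taken from the real code: split_product and its default widths (a raw
48-bit product of two 24-bit mantissas) are assumptions for
illustration, and the actual module may lay the bits out differently.

```python
def split_product(product, product_width=48, mantissa_width=24):
    """Split a mantissa product into (mantissa, guard, round, sticky)."""
    assert 0 <= product < (1 << product_width)
    low_width = product_width - mantissa_width  # bits below the kept mantissa
    assert low_width >= 2
    mantissa = product >> low_width
    guard = (product >> (low_width - 1)) & 1
    round_bit = (product >> (low_width - 2)) & 1
    # sticky is the OR of every bit below the round bit
    sticky_mask = (1 << (low_width - 2)) - 1
    sticky = int((product & sticky_mask) != 0)
    return mantissa, guard, round_bit, sticky
```

for example, a product whose only low bit set is bit 0 still ends up
with sticky = 1, which is all the rounding step needs to know about
the discarded bits.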

so it's a leeetle complicated, and i want to get through this first
hurdle (which worked, this morning).  actually, it's a lot
complicated, i'm constantly having to throw away and back out of
modifications that keep failing.  now that i've got normalisation
working on a multi-cycle basis, i can continue removing dependence on
global variables.

in the meantime, yes, the overflow complication needs to be thought through.

l.