[libre-riscv-dev] [Bug 132] SIMD-like nmigen signal for partitioning

bugzilla-daemon at libre-riscv.org bugzilla-daemon at libre-riscv.org
Wed Aug 14 23:40:33 BST 2019


http://bugs.libre-riscv.org/show_bug.cgi?id=132

--- Comment #19 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Jacob Lifshay from comment #18)
> (In reply to Luke Kenneth Casson Leighton from comment #17)
> > (In reply to Jacob Lifshay from comment #14)
> > > (In reply to Luke Kenneth Casson Leighton from comment #12)
> > > > (In reply to Jacob Lifshay from comment #11)
> > > > 
> > > > > > 
> > > > > > If the code is not doing single cycle results we cannot use it.
> > > > > 
> > > > > yes we can, we just need to tell the pipeline API "this takes 3 stages
> > > > > instead of one, so insert extra registers on the control signals"
> > > > 
> > > > Which still does not take care of cancellation.
> > > 
> > > it's a simple data pipe, if a particular element is canceled, that pipeline
> > > slot will just be empty, just like divpipecore. the control pipeline can
> > > keep track of which elements have valid data and which have been canceled.
> > > 
> > > > 
> > > > The multiplier code will now need to implement cancellation, which is a
> > > > global mask (not a register-propagated signal).
> > > the surrounding control hardware will just set the associated control
> > > signals such that the canceled/unused data elements are ignored.
> > > 
> > 
> > 
> > Which has the knock on ramifications of underutilised hardware (stages that
> > run empty) which either decreases the IPC count or requires more RSs to
> > compensate.
> 
> it decreases IPC, which is what happens anytime an instruction is canceled,
> the partially completed instruction used (before it was known that it was to
> be canceled) hardware that could have been used to run other instructions
> had it known. 

That is correct. However, leaving the slots empty adds *yet more* penalty.

The reason is that the stop mask is a unary representation of the binary
Reservation Station Index.
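
As a rough illustration of the unary encoding, here is a minimal plain-Python
sketch (not nmigen, and not the project's actual code). It assumes "unary"
means one mask bit per Reservation Station, i.e. a one-hot encoding of the
binary index. The function names are hypothetical.

```python
def rs_index_to_mask(index, num_rs):
    """Convert a binary Reservation Station index to a one-bit-per-RS mask.

    Assumption: "unary" here means one-hot, bit N set for station N.
    """
    if not 0 <= index < num_rs:
        raise ValueError("index out of range")
    return 1 << index


def stop_mask(indices, num_rs):
    """OR together the one-hot masks of every station being cancelled."""
    mask = 0
    for index in indices:
        mask |= rs_index_to_mask(index, num_rs)
    return mask


def is_cancelled(index, mask):
    """Check whether the mask bit for a given station index is set."""
    return bool((mask >> index) & 1)
```

Because every station owns its own bit, a single mask can cancel any
combination of stations at once, but a bit that stays set keeps its station
out of use for that whole time.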

If the index cannot be cleared because it takes three extra clock cycles to
drain out of the end of the pipeline, that is three Reservation Stations that
cannot be used.

When there are no Reservation Stations available the ENTIRE instruction issue
engine has to freeze, solid.
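
The effect can be sketched with a toy occupancy model in plain Python. This
is an illustrative assumption, not the real hardware: it supposes one
cancellation per clock, a fixed drain delay, and hypothetical function names.

```python
def blocked_stations(cancel_cycles, drain_delay, cycle):
    """Count stations that were cancelled but are not yet reusable.

    Toy assumption: a station cancelled at cycle c stays unusable for
    drain_delay cycles, i.e. while c <= cycle < c + drain_delay.
    """
    return sum(1 for c in cancel_cycles if c <= cycle < c + drain_delay)


def issue_stalled(num_rs, busy, cancel_cycles, drain_delay, cycle):
    """Issue must freeze when no Reservation Station is free."""
    blocked = blocked_stations(cancel_cycles, drain_delay, cycle)
    return busy + blocked >= num_rs


# Four stations, one holding a live operation, cancellations at cycles 0, 1, 2.
# With a three-cycle drain delay, three stations are tied up at cycle 2.
tied_up = blocked_stations([0, 1, 2], drain_delay=3, cycle=2)
stalled_with_delay = issue_stalled(4, 1, [0, 1, 2], drain_delay=3, cycle=2)
# If the index is cleared immediately, nothing is tied up and issue continues.
stalled_without_delay = issue_stalled(4, 1, [0, 1, 2], drain_delay=0, cycle=2)
```

In this model the three-cycle delay alone is enough to exhaust a four-station
pool that would otherwise still have room to issue.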

So the consequences are really severe.

And it does not conform to a consistent API.

Or it hugely complicates the API (for no good reason).

The multiply code needs the same code structure as div_core.
