[Libre-soc-dev] [RFC] svp64 "source zeroing" makes no sense

Sun Mar 21 23:44:26 GMT 2021

On Sunday, March 21, 2021, Richard Wilbur <richard.wilbur at gmail.com> wrote:

>
>
> The optimization is very simple.

it's not going to be as simple as a single bit test inside a loop.  getting
that simple version working is the absolute top priority right now, because
we have people waiting on the critical path for working hardware and
simulators.

>
> How sparse do you expect these
> predication masks to be?

literally anything and everything.  completely arbitrary: all bits set, one
bit not set, right the way down to a single bit set, at any position.

anything and everything, from 0/1 when VL=1 right the way up to all 2^64
possible combinations when VL=64.

trying to optimise for one particular workload of predicate masks is
guaranteed to backfire, basically.
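
A minimal sketch of the "single bit test inside a loop" baseline, to show that it treats every mask pattern the same way. The function names `active_elements` and `count_masks` are hypothetical, chosen for illustration; they are not taken from the simulator.

```python
def active_elements(mask, vl):
    """Return the element indices (0..vl-1) whose predicate bit is set.

    One bit test per element: no assumption is made about how sparse
    or dense the mask is.
    """
    return [i for i in range(vl) if (mask >> i) & 1]


def count_masks(vl):
    """Number of distinct predicate masks for a given vector length."""
    return 1 << vl


if __name__ == "__main__":
    print(active_elements(0b1010, 4))  # elements 1 and 3 are active
    print(count_masks(64))             # every one of these is legal
```

The loop costs the same for a sparse mask as for a dense one, which is why it makes a safe baseline before any workload-specific optimisation is considered.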

>
> > this will be possible as a choice for individual implementors where it
> > makes sense based on gate count, performance and power consumption for
> > their needs.
> >
> > it will be helpful to record such optimisations for when there is time to
> > implement them.
>
> I'm happy to do that.  It's just so simple that I was thinking it
> sounded like an easy win if we expect predication masks to be fairly
> sparse as you save the cycles every time you perform an instruction

any and all time spent on optimisations holds up the people who are
waiting.

this is a critical path right now and we cannot afford the luxury at this
time.

please do record it so that, as i already said, when there is time it may
be examined.  at that point further time will be saved because we will have
a procedure.

bear in mind that i have been planning this for a long while.  the
predicate masks when element width overrides are implemented will go
directly into the PartitionedSignal as well as into the byte-level
write-enable lines on the register file.
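
A rough sketch of how one predicate bit per element could fan out into byte-level write-enable bits once element width overrides are in play. The function name `byte_write_enables`, the 64-bit register width, and the layout (element i occupying bytes i*N to i*N+N-1) are all assumptions made for illustration; the actual register file wiring may differ.

```python
def byte_write_enables(predmask, elwidth_bits, regwidth_bits=64):
    """Expand per-element predicate bits into per-byte write enables.

    Assumed layout: element i occupies bytes_per_el consecutive bytes
    starting at byte i * bytes_per_el of a single register.
    """
    if elwidth_bits not in (8, 16, 32, 64):
        raise ValueError("unsupported element width")
    bytes_per_el = elwidth_bits // 8
    n_els = regwidth_bits // elwidth_bits
    wen = 0
    for i in range(n_els):
        if (predmask >> i) & 1:
            # all bytes of an active element are enabled together
            wen |= ((1 << bytes_per_el) - 1) << (i * bytes_per_el)
    return wen


if __name__ == "__main__":
    # two 32-bit elements in a 64-bit register, only element 1 active
    print(hex(byte_write_enables(0b10, 32)))
```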

> > right now the priority question is "does ORing the src and dest zeroing to
> > put zeros in the output make sense"
>
> With the code, I would suggest:
>
> if dest_zeroing and src_zeroing:
>     dstmask &= srcmask

that's effectively what's happening, yes.

or, destmask = srcmask = (destmask&srcmask)
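
A small sketch of that mask combination, assuming both source and destination zeroing are enabled and that source and destination elements line up one-to-one. The name `combine_masks` is hypothetical. An element is computed only where both predicate bits are set; everywhere else (source bit clear OR destination bit clear) a zero goes into the output.

```python
def combine_masks(srcmask, dstmask, vl):
    """Return (active, zeroed) bitmasks over vl elements.

    active: bits where both src and dest predicates are set
    zeroed: bits where either predicate is clear (zero written to output)
    """
    full = (1 << vl) - 1
    active = srcmask & dstmask & full
    zeroed = ~active & full
    return active, zeroed


if __name__ == "__main__":
    active, zeroed = combine_masks(0b1101, 0b0111, 4)
    print(bin(active), bin(zeroed))
```

By De Morgan, `zeroed` is the same as `(~srcmask | ~dstmask)` limited to `vl` bits, which is where the "ORing" in the question comes from.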

>
> I guess that's because I don't understand the intent.  To me, source
> zeroing just passes 0's into whatever you were going to do.

yes, that was the old behaviour, which is "nice and logical".  the problem
is, it makes no sense for e.g. LD or ST to try to LD from or ST to address 0
when the input parameters have zero-predication, does it? in fact it would
be dangerous to try, because it will throw exceptions or, worse, produce
garbage.

and for divide operations a zeroed input will cause overflow or garbage,
and in FP it will cause spurious exceptions.

etc etc.

the task of going through "what does it mean for inputs to be zero" on each
and every single operation is a very large one.

worse than that it is necessary to define a procedure for people in the
future.

worse than that it interferes with the logic for reading operands, in ways
that i am not looking forward to implementing.

by contrast skipping the pipeline and inserting a zero into the outputs is
relatively straightforward.
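
A minimal sketch of the "skip the pipeline, put a zero in the output" behaviour, using integer divide to show why it avoids the problem above. The function `predicated_op` and its arguments are hypothetical and only model the idea on Python lists; this is not the simulator code.

```python
from operator import floordiv


def predicated_op(op, src_a, src_b, dest, mask, vl, zeroing):
    """Apply op element-wise under a predicate mask.

    Masked-out elements never reach op at all: with zeroing a 0 is
    written to the destination, without zeroing it is left untouched.
    """
    for i in range(vl):
        if (mask >> i) & 1:
            dest[i] = op(src_a[i], src_b[i])
        elif zeroing:
            dest[i] = 0  # pipeline skipped, zero placed in the output
    return dest


if __name__ == "__main__":
    a = [8, 6, 4, 2]
    b = [2, 0, 2, 0]  # elements 1 and 3 would divide by zero
    print(predicated_op(floordiv, a, b, [9, 9, 9, 9], 0b0101, 4, True))
```

Because the masked-out divisors are never fed to the operation, no divide-by-zero can occur, whereas forcing zeros into the inputs and running the operation anyway would need a per-operation definition of what the result should be.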

l.

-- 
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68