[Libre-soc-dev] [RFC] svp64 "source zeroing" makes no sense

Wed Mar 24 02:09:04 GMT 2021

On Tue, Mar 23, 2021 at 1:49 PM Luke Kenneth Casson Leighton
<lkcl at lkcl.net> wrote:
>
> On Tuesday, March 23, 2021, Richard Wilbur <richard.wilbur at gmail.com> wrote:
> >
> > What "predicate mask bits" will the "Dynamic Partitioned ALUs" be
> > receiving?
>
>
> one option is: absolutely none whatsoever.  you may not have realised the
> significance of "bypass the ALU entirely".

I think I have a pretty good handle on "bypass the ALU entirely" as
being similar to "just store 0 in the destination".  I asked the above
question because of something you said in an E-mail message in this
thread which I received on 21 Mar (you sent on 22 Mar),
"bear in mind two things.

1) Dynamic Partitioned ALUs will require receiving MULTIPLE predicate mask
bits i.e. cover multiple src/dest steps.

2) Multi-issue will require multiple src/dest steps per clock."

I attempted to directly ask about thing #1.  In an intervening message
I had asked if the Dynamic Partitioned ALUs would be receiving source
and destination predication masks, to which you replied, "neither."
This was a follow-up question to try and clarify things.

> conceptually however for 8 16 and 32 bit the bytelevel write-enable lines
> are "expanded predicate mask bits only not really".  anything passed in to
> the ALUs in positions which are not going to be written we simply don't
> give a damn.
>
> when clock gating becomes possible this is an entirely different matter.
> each dynamic lane *at the byte level* will need gating.

So, it sounds like the source and destination predication masks are
important to the issuer in determining which parts of the source
vector to read and process and which parts of the destination vector
to write.  The byte-level write-enable lines look like they have more
to do with how the SIMD ALUs are partitioned and store their results.
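
To check my reading of the "expanded predicate mask bits" remark, here
is a rough Python sketch.  It assumes an 8-byte (64-bit) register and
one predicate bit per element.  The name expand_predicate and its
parameters are my own invention for illustration, not anything taken
from the spec or the HDL.

```python
def expand_predicate(pred, elwidth_bits, reg_bytes=8):
    """Expand one predicate bit per element into one write-enable bit per byte.

    Hypothetical helper: assumes elements are packed from byte 0 upwards.
    """
    bytes_per_el = elwidth_bits // 8
    n_elements = reg_bytes // bytes_per_el
    el_mask = (1 << bytes_per_el) - 1          # e.g. 0b11 for 16-bit elements
    wen = 0
    for i in range(n_elements):
        if (pred >> i) & 1:                    # element i is enabled
            wen |= el_mask << (i * bytes_per_el)
    return wen
```

With 16-bit elements, a predicate of 0b0101 enables bytes 0-1 and 4-5.
Whatever the ALU computes in the other byte lanes is never written.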

> > > a masked equivalent would be handy.
> >
> > What do you mean?
>
> start from a position other than the start.  basically shift the value
> down, trash N bits, then count.

Latest revision has that as well.  That is what is required to start
where we left off after returning from an interrupt.
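
For reference, this is how I understand "shift the value down, trash N
bits, then count", as a small Python sketch.  I am assuming the count
is a count of trailing zeros, used to find the next enabled element.
The name next_enabled and the convention of returning vl when nothing
is left are my assumptions.

```python
def next_enabled(mask, start, vl):
    """Index of the first set bit of mask at or after start, or vl if none.

    Hypothetical helper: start is the step we were at when interrupted.
    """
    mask &= (1 << vl) - 1          # ignore bits beyond the vector length
    shifted = mask >> start        # shift down, trashing the low `start` bits
    if shifted == 0:
        return vl                  # no enabled elements remain
    # Isolate the lowest set bit, then count the zeros below it.
    trailing_zeros = (shifted & -shifted).bit_length() - 1
    return start + trailing_zeros
```

With mask 0b101001 and vl=6, starting from step 1 lands on element 3,
and starting from step 4 lands on element 5.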

> > Where in the loop is the valid exit point if an interrupt occurs?
>
> at any time.  it's a Sub-Program-Counter and should be treated as such.

I don't see the "Sub-Program-Counter" in the SVSTATE documentation.  I
see the srcstep, dststep, and svstep.  Do we always finish an issue in
progress?  In other words, after we update srcstep, if we get an
interrupt (hardware) before we update dststep, do we jump out of the
loop before we update dststep?  If this is how it works, this could be
difficult to restart at that particular spot.  Whereas, if we jump out
of the loop at the bottom, after issuing the instruction and
incrementing both srcstep and dststep, restarting should be
straightforward.
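
To make the question concrete, here is a Python sketch of the loop as
I picture it, with the interrupt only honoured at the bottom.  This is
my mental model, not the spec: the function names, the use of a plain
element copy as the "issued instruction", and the budget parameter
that simulates an interrupt are all assumptions.

```python
def skip_masked(mask, step, vl):
    """Advance step to the next element whose predicate bit is set."""
    while step < vl and not (mask >> step) & 1:
        step += 1
    return step

def run(src, dst, srcmask, dstmask, vl, state=(0, 0), budget=None):
    """Twin-predicated copy.  Returns ((srcstep, dststep), finished).

    budget is a hypothetical stand-in for an interrupt: after that many
    issues the loop exits, but only at the bottom, with both steps updated.
    """
    srcstep, dststep = state
    issued = 0
    while True:
        srcstep = skip_masked(srcmask, srcstep, vl)
        dststep = skip_masked(dstmask, dststep, vl)
        if srcstep >= vl or dststep >= vl:
            return (srcstep, dststep), True
        dst[dststep] = src[srcstep]      # "issue" the element operation
        srcstep += 1
        dststep += 1
        issued += 1
        # Bottom of the loop: the only point where we allow an exit.
        if budget is not None and issued >= budget:
            return (srcstep, dststep), False

def demo():
    src = [10, 11, 12, 13]
    straight = [0] * 4
    run(src, straight, 0b1011, 0b1101, 4)
    resumed = [0] * 4
    state, _ = run(src, resumed, 0b1011, 0b1101, 4, budget=1)
    run(src, resumed, 0b1011, 0b1101, 4, state=state)
    return straight, resumed
```

In this model the saved (srcstep, dststep) pair is always consistent,
so a run interrupted after one element and then resumed produces the
same destination as an uninterrupted run.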