[Libre-soc-isa] [Bug 697] SVP64 Reduce Modes

bugzilla-daemon at libre-soc.org
Thu Mar 24 00:21:48 GMT 2022


--- Comment #24 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Jacob Lifshay from comment #23)

> well, your argument for *not* having moves is based off of mostly the same
> things iirc:
> having alus also be able to do moves is *also* only a microarchitectural
> decision.

ALUs never do MVs. ok, very basic ones would, usually as a pseudo-op:
add rt ra 0

> > 
> > second, bear in mind, that the schedule of ops is entirely deterministic
> > based on having read the predicate mask.  ops may be issued (scheduled)
> > and analysed long before actual execution takes place.
> that's true, it applies to the case with moves too, so isn't a good argument
> to choose either moves or no-moves.

when combined with Vertical First mode it will melt people's brains,
and it also complicates ExtraV coherence if the separated logic has to
suddenly decide to do a 2-operand MV rather than a 3-operand or
4-operand op.

too much.

> > no MVs.
> having moves allows a much simpler algorithm imho,

probably, but where's the fun in that? :)
also retrospectively i found that DCT REMAP
turned out to use Gray Coding. that was...
unexpected and beautiful, and i would never
have noticed the simplicity of it if i had
thought "let's take the simple route"

> as well as having a
> consistent place where the reduction result goes (element 0, rather than
> whatever random spot the remap algorithm happens to leave it).

i know. i worked through the caveats: the only case that remained
invalid would be a single active element (all others predicated out).

even 2 elements should / would target the correct result-indexed
destination element, regardless of the positions of the 2 active
predicate bits.

> I can work out the HDL details if you like. Would creating a fsm that
> creates the tree-reduction ops be sufficient for now?

given that this will end up being a type of REMAP,
can i recommend following the path i did there with
MMATRIX and FFT DCT, which was:

* a braindead obvious python demo algorithm
* conversion to a yield generator which purely
  returns indices
* those indices then get blatted on top of a scalar op
* integrate the yield generator into ISACaller
* and then implement the HDL FSM
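
a minimal sketch of the first three steps above (demo algorithm, yield
generator that purely returns indices, indices applied to a scalar op).
this is an assumption-laden illustration, not the algorithm proposed in
this bug: the names reduce_schedule and apply_schedule are hypothetical,
the predicate is modelled as a plain list of 0/1, and the "redirect
instead of MV" trick is just one way to keep the schedule deterministic
from the predicate mask alone.

```python
import operator


def reduce_schedule(vl, pred):
    """Yield (dest, src) element-index pairs for a tree reduction.

    Hypothetical sketch: each pair means vec[dest] = op(vec[dest], vec[src]).
    No moves are issued; a masked-out slot is redirected through `ix`.
    """
    # ix[i] is the element index currently standing in for tree slot i
    ix = list(range(vl))
    step = 1
    while step < vl:
        step *= 2
        for i in range(0, vl, step):
            other = i + step // 2
            if other >= vl:
                continue  # no partner at this level of the tree
            ci, oi = ix[i], ix[other]
            if pred[ci] and pred[oi]:
                yield ci, oi  # both sides live: one real op
            elif pred[oi]:
                ix[i] = oi  # only the partner is live: redirect, no MV


def apply_schedule(vec, pred, op):
    """Blat the generated indices on top of a scalar op (assumed binary)."""
    vec = list(vec)
    for dest, src in reduce_schedule(len(vec), pred):
        vec[dest] = op(vec[dest], vec[src])
    return vec


print(list(reduce_schedule(4, [1, 1, 1, 1])))  # [(0, 1), (2, 3), (0, 2)]
print(list(reduce_schedule(4, [0, 1, 0, 1])))  # [(1, 3)]
print(apply_schedule([1, 2, 3, 4], [0, 1, 0, 1], operator.add))  # [1, 6, 3, 4]
```

in this sketch, with predicate [0, 1, 0, 1] the result lands in element 1
rather than element 0, and a single active element generates no ops at
all. printing the index pairs like this is a cheap way to check whether
a candidate schedule is workable before any ISACaller or HDL work.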

yes it is a lot of work, it is why MM FFT DCT took me
about 8 weeks or more.

ISACaller integration will be essential, so might as well
do it incrementally.

the first priority would therefore be to do the braindead
demo.  i think the first version i did for MATRIX REMAP
didn't even do the mmult itself, just printed out the

also i would like to see what the algorithm generates
(the indices) to see if it is in fact workable.

DCT/FFT REMAP has caveats: power-of-two sizes only.

no point spending time doing HDL FSM if the algorithm turns
out to be borked.  need to find that out early.

will find links later
