[Libre-soc-dev] [RFC] SVP64 Vertical-First Mode loops

lkcl luke.leighton at gmail.com
Sun Aug 22 12:10:32 BST 2021


On August 21, 2021 10:52:01 PM UTC, Richard Wilbur
<richard.wilbur at gmail.com> wrote:
>On Sat, Aug 21, 2021 at 3:42 PM lkcl <luke.leighton at gmail.com> wrote:
>> On August 21, 2021 9:30:21 PM UTC, Richard Wilbur
>> <richard.wilbur at gmail.com> wrote:
>> >(The hard result cache needn’t be tied specifically to REMAP, it could
>> >be used by normal vector or scalar code.)
>>
>> ya know... another name for "fast small hard result cache" is
>> "register file"?
>
>Is it?

yes.

> "fast", yes.  "small", not necessarily.

which would need explaining to the ISA WG: "why are we duplicating the
functionality of a register file, including adding explicit instructions to
transfer between the new type of register file and the standard GPR/FPR?"

also if it is particularly large you run into latency issues.

>
>> everything you described has the identical properties of a register
>> file... :)
>
>That's sort of what we want but don't have space in the instruction
>format for the bits to specify the register numbers, right?

correct.  and we don't want to (a) modify v3.0B or (b) go back and
retrospectively alter the SVP64 RM field.

>  So I see
>this as an opportunity to create an algorithm-specific method of
>addressing the new "registers".

which in turn requires a means and method of actually accessing
those new registers.

> Another advantage of this scheme is
>that it is never in need of saving and restoring with a context
>switch.

this isn't true: i can foresee circumstances where two processes will need
to use different constants, in which case the contents would have to be
saved and restored on a context switch after all.

honestly richard although at first glance it seems like a good idea,
it's really no different from "A Register File".

plus, really, a way is needed for *all* instructions to read from
"The Registers/Cache" not just one or two, because if it's just
one ("move from one register/cache to the GPR/FPR") then
that's one extra instruction inside inner loops

and if it's merged into a "specialist" instruction (DCT coefficient
multiply) we just caused what was previously a potentially
useful generic twin mul-add instruction to become a non-generic
one.
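
a rough back-of-envelope sketch of the inner-loop cost argument above, in
Python.  the function name, its parameters and the numbers are hypothetical
illustrations, not taken from the SVP64 or Power ISA specifications: it only
counts issued instructions, assuming every constant read from the separate
cache needs its own move into a GPR/FPR before it can be used.

```python
def inner_loop_instruction_count(iterations, ops_per_iteration,
                                 constants_per_iteration, needs_move):
    """Count instructions issued by an inner loop (hypothetical model).

    needs_move=True models a separate cache that only one dedicated
    "move from cache to GPR/FPR" instruction can read, so each constant
    used costs one extra instruction per iteration.
    needs_move=False models constants that any instruction can read
    directly, as with an ordinary register file.
    """
    extra = constants_per_iteration if needs_move else 0
    return iterations * (ops_per_iteration + extra)


# Illustrative numbers only: 8 iterations, 3 real operations each,
# 1 constant needed per iteration.
direct = inner_loop_instruction_count(8, 3, 1, needs_move=False)
with_move = inner_loop_instruction_count(8, 3, 1, needs_move=True)
print(direct, with_move)  # prints "24 32"
```

under those assumed numbers the move-only scheme issues a third more
instructions in the loop, which is the overhead being objected to.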

unfortunately, when it comes to ISA design, all these things need to be
thought through - in full.  then, when you've spent several days/weeks
outlining the entire lot, you have to spend several more days/weeks
making a comparative analysis against *existing* schemes.

part of that analysis involves
* "what's the cost of implementing this" as well as
* "what's the cost to CHANGE an EXISTING implementation" and
* "how much work is it to create a Conformance Validation Test Suite" and
* "what will the ISA WG think about this proposal, what will they ask"

l.


