[Libre-soc-bugs] [Bug 1116] evaluate, spec, and implement Vector-Immediates in SVP64 Normal

Sat Jun 10 02:45:00 BST 2023

https://bugs.libre-soc.org/show_bug.cgi?id=1116

--- Comment #1 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Luke Kenneth Casson Leighton from
https://bugs.libre-soc.org/show_bug.cgi?id=1092#c18  )

> https://libre-soc.org/openpower/sv/normal/
> 
> | 0-1    |  2  |  3   4  |  description                     |
> | ------ | --- |---------|----------------------------------|
> | 0   0  |   0 |  dz  sz | simple mode                      |
> | 0   0  |   1 |  RG  0  | scalar reduce mode (mapreduce)   |
> | 0   0  |   1 |  /   1  | reserved                         |
> | 1   0  |   N |  dz  sz | sat mode: N=0/1 u/s              |
> | VLi 1  | inv | CR-bit  | Rc=1: ffirst CR sel              |
> | VLi 1  | inv | zz RC1  | Rc=0: ffirst z/nonz              |
> 
> there's room in that (just) for a bit that says
> "immediates are Vectorised".  ok: using mode[4]
> says "immediates are Vectorised".

and given that no immediates are wider than 16 bits, it is
possible to just ignore elwidth overrides here.

> that still leaves mode[3] for some sort of decision.

or another mode in future.  best to have mode[3:4]=0b01
and reserve other combinations.
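
a minimal sketch of how the quoted table might be decoded, with the
proposed encoding slotted in. assumptions (mine, not confirmed in the
thread): the mode field is passed as a 5-bit integer with MSB0 numbering
(mode[0] is the most significant bit), the new mode sits in the currently
reserved row (mode[0:2]=0b001, mode[4]=1), and the function name and
returned labels are purely illustrative.

```python
def classify_normal_mode(mode: int) -> str:
    """Classify a 5-bit SVP64 Normal mode field per the quoted table.

    Bits use MSB0 numbering: mode[0] is the most significant of the 5 bits.
    """
    if not 0 <= mode < 32:
        raise ValueError("mode must be a 5-bit value")
    b = [(mode >> (4 - i)) & 1 for i in range(5)]
    if b[1] == 1:
        return "ffirst"          # rows "VLi 1 | inv | ..."
    if b[0] == 1:
        return "sat"             # row  "1 0 | N | dz sz"
    if b[2] == 0:
        return "simple"          # row  "0 0 | 0 | dz sz"
    if b[4] == 0:
        return "reduce"          # row  "0 0 | 1 | RG 0"
    # Remaining encodings are the reserved row "0 0 | 1 | / 1".
    # ASSUMPTION: the proposal takes mode[3:4]=0b01 from this row.
    if b[3] == 0:
        return "vector-immediate (proposed)"
    return "reserved"
```

under these assumptions 0b00101 would select the proposed mode and 0b00111
would stay reserved.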

> the neat thing about this is that even sv.addi can load
> an array of immediates.  oris as well.

the *entire pattern* of 5 instructions to load 64-bit immediates
can be Vectorised:

    addi rt,0,#nnnn
    addis rt,0,#nnnn
    rldicl rt, 32
    ori rt,0,#nnnn
    oris rt,0,#nnnn

becomes:

    sv.addi/vi rt,0,#nnnn
    ...
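
to illustrate what "an array of immediates" per instruction could look
like, here is a rough sketch that splits each 64-bit constant into four
16-bit chunks and regroups them so that each immediate-carrying
instruction gets one chunk per element. the function names and the
chunk ordering (most significant first) are assumptions for illustration;
the thread does not say which chunk goes with which instruction.

```python
def split_imm64(value: int) -> list[int]:
    """Split a 64-bit constant into four 16-bit chunks, most significant first."""
    if not 0 <= value < (1 << 64):
        raise ValueError("value must fit in 64 bits")
    return [(value >> shift) & 0xFFFF for shift in (48, 32, 16, 0)]


def immediates_per_instruction(values: list[int]) -> list[list[int]]:
    """Regroup per-element chunks into one immediate array per chunk position.

    Result[k][i] is the k-th 16-bit chunk of element i, i.e. the vector of
    immediates that the k-th immediate-carrying instruction would need.
    """
    chunks = [split_imm64(v) for v in values]
    return [[c[k] for c in chunks] for k in range(4)]
```

for two elements, `immediates_per_instruction([0x0123456789ABCDEF,
0x1111222233334444])` gives four arrays of two 16-bit immediates each.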

for sv.fli/vi (see https://bugs.libre-soc.org/show_bug.cgi?id=1092#c19)
it is a simple matter of inlining multiple instructions.

i would strongly suggest though *not* trying to piss about
with binutils syntax, just have ".long 0xnnnnnnnn" after it.
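
a sketch of how a tool might generate those ".long" lines from a list of
16-bit immediates. everything about the layout here is an assumption for
illustration only: that the first immediate stays in the instruction (as
noted further down in this comment), that two immediates are packed per
32-bit word with the earlier one in the upper half, and that an odd
remainder is zero-padded. `trailing_longs` is a hypothetical helper, not
part of any existing tool.

```python
def trailing_longs(immediates: list[int]) -> list[str]:
    """Emit .long directives for every immediate after the first.

    ASSUMPTIONS: the first immediate is encoded in the instruction itself;
    the rest are packed two per 32-bit word, earlier one in the upper
    16 bits; an odd count is padded with a zero halfword.
    """
    for imm in immediates:
        if not 0 <= imm <= 0xFFFF:
            raise ValueError("each immediate must fit in 16 bits")
    trailing = list(immediates[1:])
    if len(trailing) % 2:
        trailing.append(0)  # pad up to a whole 32-bit word
    return [
        f".long 0x{(hi << 16) | lo:08x}"
        for hi, lo in zip(trailing[0::2], trailing[1::2])
    ]
```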

> as we discussed yesterday it requires an "Unconditional
> Branch" effect, and i'd recommend it be on MAXVL not VL.
> also to round-up to the nearest 4-bytes.

using MAXVL allows dynamic code to *change the number of immediates loaded*
(by changing VL), which is extremely important given that the block of
immediates is compile-time static.

> if RM."immediate-mode":
> 
>     NIA = CIA + CEIL(MAXVL * sizeof(immediate), 4)

forgot that of course the 1st immediate is already in the instruction,
and the immediate size is hardcoded to 16-bit

  if RM.normal."vector-immediate-mode":
     NIA = CIA + CEIL((MAXVL-1) * 16/8, 4)
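
a small worked sketch of the skip calculation, reading the hardcoded 16
as bits (2 bytes per immediate) and CEIL(x, 4) as "round x up to a
multiple of 4 bytes". both readings are my assumptions. the function only
computes the size of the trailing immediate block; whether the length of
the instruction itself is also added to CIA is not stated in the thread.

```python
IMM_BYTES = 2  # ASSUMPTION: each immediate is 16 bits, i.e. 2 bytes


def round_up(n: int, multiple: int) -> int:
    """Round n up to the nearest multiple of `multiple`."""
    return -(-n // multiple) * multiple


def immediate_skip_bytes(maxvl: int) -> int:
    """Bytes of trailing immediates to skip, rounded up to 4 bytes.

    The first immediate lives in the instruction, so only MAXVL-1
    immediates follow it in the instruction stream.
    """
    if maxvl < 1:
        raise ValueError("MAXVL must be at least 1")
    return round_up((maxvl - 1) * IMM_BYTES, 4)
```

with these assumptions MAXVL=1 skips nothing, MAXVL=2 and MAXVL=3 both
skip 4 bytes, and MAXVL=4 skips 8 bytes.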

i think not having to read elwidth here will be *really* important,
otherwise the Decoder has a hell of a job.

it is going to be tough enough to identify that this is
"Unconditional Branch": not only does the suffix need identifying
(to find out if it is RM.normal) but the "vector-immediate-mode"
itself needs decoding...

... oh and *then* the new PC can be calculated.

to that end this is DEFINITELY something that goes into the
"Upper" Compliancy Levels.

> jacob you mentioned during the meeting that this would
> be "slow" i.e. dependent on Architectural State (SVSTATE),
> if someone modified SVSTATE with mtspr then things get
> slow: this is *already* in the spec.

it's that some implementations will have caches of where SVSTATE was,
but others will not.
