[Libre-soc-bugs] [Bug 1116] evaluate, spec, and implement Vector-Immediates in SVP64 Normal
bugzilla-daemon at libre-soc.org
Sat Jun 10 02:45:00 BST 2023
https://bugs.libre-soc.org/show_bug.cgi?id=1116
--- Comment #1 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Luke Kenneth Casson Leighton from
https://bugs.libre-soc.org/show_bug.cgi?id=1092#c18 )
> https://libre-soc.org/openpower/sv/normal/
>
> | 0-1 | 2 | 3 4 | description |
> | ------ | --- |---------|----------------------------------|
> | 0 0 | 0 | dz sz | simple mode |
> | 0 0 | 1 | RG 0 | scalar reduce mode (mapreduce) |
> | 0 0 | 1 | / 1 | reserved |
> | 1 0 | N | dz sz | sat mode: N=0/1 u/s |
> | VLi 1 | inv | CR-bit | Rc=1: ffirst CR sel |
> | VLi 1 | inv | zz RC1 | Rc=0: ffirst z/nonz |
>
> there's room in that (just) for a bit that says
> "immediates are Vectorised". ok: using mode[4]
> says "immediates are Vectorised".
and given that no immediates are greater than 16-bit, it is
possible to just ignore elwidth overrides here
> that still leaves mode[3] for some sort of decision.
or another mode in future. best to have mode[3:4]=0b01
and reserve other combinations.
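a rough sketch of how the table above might decode, with the proposed vector-immediate encoding added. this is only an illustration: the function name and returned field names are made up here, bits are numbered MSB0 as in the table, and it ASSUMES the new encoding takes the currently reserved row (mode[0:2]=0b001 with mode[3:4]=0b01), which the comment above suggests but does not state outright.

```python
def decode_normal_mode(mode, rc=False):
    """Decode a 5-bit SVP64 Normal mode field (MSB0: bit 0 is the top bit).

    Hypothetical helper: follows the quoted table, plus the ASSUMPTION that
    vector-immediate mode occupies the reserved row with mode[3:4] = 0b01.
    """
    b0 = (mode >> 4) & 1
    b1 = (mode >> 3) & 1
    b2 = (mode >> 2) & 1
    b3 = (mode >> 1) & 1
    b4 = mode & 1

    if b1 == 1:
        # "VLi 1" rows: fail-first, meaning of bits 3-4 depends on Rc
        if rc:
            return "ffirst", {"VLi": b0, "inv": b2, "CR-bit": (b3 << 1) | b4}
        return "ffirst", {"VLi": b0, "inv": b2, "zz": b3, "RC1": b4}
    if b0 == 1:
        # "1 0" row: saturation, N selects unsigned/signed
        return "sat", {"N": b2, "dz": b3, "sz": b4}
    if b2 == 0:
        return "simple", {"dz": b3, "sz": b4}
    if b4 == 0:
        return "reduce", {"RG": b3}
    # formerly "reserved" row (mode[4] = 1)
    if b3 == 0:
        return "vector-immediate", {}   # assumed encoding: mode[3:4] = 0b01
    return "reserved", {}               # mode[3:4] = 0b11 stays reserved
```

with this assumption `decode_normal_mode(0b00101)` gives vector-immediate mode and `0b00111` stays reserved for future use.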
> the neat thing about this is that even sv.addi can load
> an array of immediates. oris as well.
the *entire pattern* of 5 instructions to load 64-bit immediates
can be Vectorized:
addi rt,0,#nnnn
addis rt,0,#nnnn
rldicl rt, 32
ori rt,0,#nnnn
oris rt,0,#nnnn
becomes:
sv.addi/vi rt,0,#nnnn
...
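to illustrate the data layout this implies, here is a minimal sketch that splits an array of 64-bit constants into one array of 16-bit immediates per instruction in the pattern. the function names are hypothetical, and it deliberately ignores the sign-extension that addi/addis perform (a real assembler would have to adjust the chunks for that); it only models the chunking and checks it with plain shift/OR.

```python
MASK16 = 0xFFFF
MASK64 = (1 << 64) - 1


def split_imm64(value):
    """Split one 64-bit constant into four 16-bit chunks, most significant first."""
    value &= MASK64
    return [(value >> shift) & MASK16 for shift in (48, 32, 16, 0)]


def immediate_vectors(constants):
    """Return four arrays of 16-bit immediates, one per immediate-carrying
    instruction, each holding one chunk of every constant (element order kept)."""
    per_element = [split_imm64(c) for c in constants]
    return [list(column) for column in zip(*per_element)]


def reassemble(vectors):
    """Rebuild the 64-bit constants with shift/OR only (no sign-extension)."""
    return [
        (hh << 48) | (hl << 32) | (lh << 16) | ll
        for hh, hl, lh, ll in zip(*vectors)
    ]


constants = [0x0123456789ABCDEF, 0xFFFF0000AAAA5555]
vectors = immediate_vectors(constants)
assert reassemble(vectors) == constants
```

each of the four arrays would be the immediate data following one sv.xxx/vi instruction.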
for sv.fli/vi (see https://bugs.libre-soc.org/show_bug.cgi?id=1092#c19)
it is a simple matter of inlining multiple instructions.
i would strongly suggest though *not* trying to piss about
with binutils syntax, just have ".long 0xnnnnnnnn" after it.
> as we discussed yesterday it requires an "Unconditional
> Branch" effect, and i'd recommend it be on MAXVL not VL.
> also to round-up to the nearest 4-bytes.
basing the skip on MAXVL rather than VL allows dynamic code to
*change the number of immediates loaded* (by changing VL), which is
extremely important given that the block of immediates is compile-time static.
> if RM."immediate-mode":
>
> NIA = CIA + CEIL(MAXVL * sizeof(immediate), 4)
forgot that of course the 1st immediate is already in the instruction,
and the immediate size is hardcoded to 16 bits:
if RM.normal."vector-immediate-mode":
NIA = CIA + CEIL((MAXVL-1) * 16, 4)
i think not having to read elwidth here will be *really* important,
otherwise the Decoder has a hell of a job.
it is going to be tough enough to identify that this is
"Unconditional Branch": not only does the suffix need identifying
(to find out if it is RM.normal) but the "vector-immediate-mode"
itself needs decoding...
... oh and *then* the new PC can be calculated.
to that end this is DEFINITELY something that goes into the
"Upper" Compliancy Levels.
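a minimal sketch of the NIA calculation, under stated assumptions: the "16" in the pseudocode is read as 16 bits (2 bytes) per immediate, CEIL(x, 4) is read as "round x up to a multiple of 4 bytes", and the skip is taken to be on top of the normal 8-byte SVP64 instruction length (32-bit prefix plus 32-bit suffix), which the pseudocode leaves implicit. the function and constant names are hypothetical.

```python
SVP64_INSN_BYTES = 8   # 32-bit prefix + 32-bit suffix
IMM_BITS = 16          # assumption: hardcoded immediate size, in bits


def round_up(n, multiple):
    """Round n up to the next multiple of `multiple`."""
    return -(-n // multiple) * multiple


def next_insn_addr(cia, maxvl, vector_immediate_mode):
    """Next Instruction Address for an SVP64 Normal instruction.

    The first immediate lives in the instruction itself, so only
    (maxvl - 1) extra immediates follow it in the instruction stream.
    Uses MAXVL, not VL, so the skip does not depend on the runtime VL.
    """
    if not vector_immediate_mode:
        return cia + SVP64_INSN_BYTES
    extra_bytes = (maxvl - 1) * IMM_BITS // 8
    return cia + SVP64_INSN_BYTES + round_up(extra_bytes, 4)
```

note that the Decoder needs both the suffix (to know it is RM.normal) and the mode bits before it can evaluate this, which is the cost discussed above.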
> jacob you mentioned during the meeting that this would
> be "slow" i.e. dependent on Architectural State (SVSTATE),
> if someone modified SVSTATE with mtspr then things get
> slow: this is *already* in the spec.
it's that some implementations will have caches of where SVSTATE was,
but others will not.