[Libre-soc-isa] [Bug 1080] allowing LD/ST-Update to select individual registers needed

Mon May 8 21:51:05 BST 2023

https://bugs.libre-soc.org/show_bug.cgi?id=1080

--- Comment #2 from Jacob Lifshay <programmerjake at gmail.com> ---
(In reply to Luke Kenneth Casson Leighton from comment #1)
> (In reply to Luke Kenneth Casson Leighton from comment #0)
> 
> > lwzux,LDST_IDX,,2P,EXTRA2,EN,d:RT,d:RA,s:RB,0,RA_OR_ZERO,RB,0,RT,0,0,RA
> > stdux,LDST_IDX,,2P,EXTRA2,EN,d:RA,s:RS;s:RA,s:RB,0,RA_OR_ZERO,RB,RS,0,0,0,RA
> > 
> > these become harder as the encoding space is only 6 bits (and there
> > are 3 regs, RT/RS RA RB) due to Twin-Predication taking up 3 bits
> > of EXTRA
> 
> this cannot be lost as it destroys VSPLAT VINDEX VGATHER VSCATTER

please define VINDEX -- it is non-standard terminology -- do you mean
load/store with index remap? that's basically gather/scatter but done using a
different mechanism.

actually, assuming the above definition of VINDEX, none of splat/gather/scatter
(VINDEX included, since that's basically gather/scatter) need more than one
predicate. They work just fine on other ISAs with at most one predicate (e.g.
RVV and AVX2/AVX512 between them provide separate
splat/scatter/gather/compress/expand instructions that take only 1 predicate).
The only load/store ops that need more than one predicate are compress/expand
load/store: in SVP64 those are only expressible through twin-predication, since
there are no dedicated compress/expand instructions or SVP64 MODEs. They can
easily be done using ld/std (and maybe the *u or *x versions, but not both)
instead of ldux/stdux.

iirc the plan was originally to have twin-predication only on 1-in/1-out
operations, which ldux/stdux clearly are not.
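
to illustrate why compress/expand is the case that needs two predicates, here
is a rough Python model of a twin-predicated 1-in/1-out element copy. it is
only a sketch, not the SVP64 pseudocode: the function name, the list-based
"registers" and using len(src) in place of VL are all my assumptions.

```python
def twin_pred_copy(src, src_mask, dest, dest_mask):
    """Hypothetical model of a twin-predicated 1-in/1-out copy.

    src_mask skips source elements and dest_mask skips destination
    elements; the two element indices advance independently.
    """
    out = list(dest)
    n = len(src)  # stands in for VL (assumes len(dest) == len(src))
    i = j = 0
    while True:
        # skip masked-out source elements
        while i < n and not src_mask[i]:
            i += 1
        # skip masked-out destination elements
        while j < n and not dest_mask[j]:
            j += 1
        if i >= n or j >= n:
            break
        out[j] = src[i]
        i += 1
        j += 1
    return out


src = [10, 11, 12, 13]
all_ones = [1, 1, 1, 1]
mask = [0, 1, 0, 1]
zeros = [0, 0, 0, 0]

# compress: sparse source mask, dense destination mask
compressed = twin_pred_copy(src, mask, zeros, all_ones)
# expand: dense source mask, sparse destination mask
expanded = twin_pred_copy(src, all_ones, zeros, mask)
print(compressed, expanded)
```

with a single predicate applied to both sides, i and j would always move in
lock-step, which is all that splat/gather/scatter need.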

> 
> > MASK_SRC	16:18	Execution Mask for Source
> 
> so has to stay. that leaves just 6 bits to cover 3 registers.
> 
> here's the bits of RM:
> 
> Field Name	Field bits	Description
> MASKMODE	0	Execution (predication) Mask Kind
> MASK	1:3	Execution Mask
> SUBVL	8:9	Sub-vector length
> ELWIDTH	4:5	Element Width
> ELWIDTH_SRC	6:7	Element Width for Source
> EXTRA	10:18	Register Extra encoding
> MODE	19:23	changes Vector behaviour
> 
> can't lose mask. can't lose SUBVL (priority for Pack/Unpack, already
> discussed bug #1077). *could* consider ELWIDTH_SRC, what effect does
> that have?
> 
> * Vector of RB offsets could no longer be compressed
> * SEA becomes pointless
> 
> could ELWIDTH instead be considered, and the operation width
> (ld lw lh lb) be used in its place?
> 
> * yes as long as losing saturation and sign-extending is ok.

simple -- just set ELWIDTH larger than the load op's width and the load op
itself will do the sign/zero extend, so there's no need for SVP64 to add
sign/zero extension on top of that. (the sole exception is signed bytes, since
PowerISA has no sign-extending byte load -- thanks PowerISA for being
non-orthogonal)

saturation can still be done -- saturating from the load's type to the dest
type (ELWIDTH + saturation's unsigned/signed bit).

so this removes any need for ELWIDTH_SRC on any load/store ops afaict.
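
a small Python sketch of the idea, under my own assumptions (the helper names
are made up, and values are kept as plain Python ints rather than modelling
register bit patterns): the load op decides signedness and width of the value,
then saturation clamps that value to the destination ELWIDTH.

```python
def load_extend(raw, load_bits, signed):
    """Interpret the low load_bits of raw the way the load op would
    (signed=True models an algebraic load such as lha, False models lhz)."""
    raw &= (1 << load_bits) - 1
    if signed and raw >> (load_bits - 1):
        raw -= 1 << load_bits
    return raw


def saturate(value, dest_bits, signed):
    """Clamp value to the range of a dest_bits-wide element."""
    if signed:
        lo, hi = -(1 << (dest_bits - 1)), (1 << (dest_bits - 1)) - 1
    else:
        lo, hi = 0, (1 << dest_bits) - 1
    return max(lo, min(hi, value))


# 16-bit load of 0xFFFE: sign-extended vs zero-extended
val_s = load_extend(0xFFFE, 16, signed=True)
val_z = load_extend(0xFFFE, 16, signed=False)

# saturating from the load's type down to an 8-bit destination
sat_u8 = saturate(val_z, 8, signed=False)
sat_s8 = saturate(val_s, 8, signed=True)
sat_u8_from_neg = saturate(val_s, 8, signed=False)
print(val_s, val_z, sat_u8, sat_s8, sat_u8_from_neg)
```

nothing here needs a separate source element width: the load op's own width
plays that role.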
