[Libre-soc-isa] [Bug 213] SimpleV Standard writeup needed

Thu Nov 19 19:04:04 GMT 2020

https://bugs.libre-soc.org/show_bug.cgi?id=213

--- Comment #103 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Jacob Lifshay from comment #101)
> One additional note for swizzling: it's very common to want to put constant
> 0 or 1 in elements, so, if there's space, I think we should try to encode
> that in the swizzles.

right. ok, i remember the discussion: it was around VBLOCK, where there was
more space. for SVP the prefix pressure was too great, so if i recall correctly
we were thinking of separate swizzle and swizzlei instructions, using macro-fusion.

> I would expect there to be circuitry in the instruction decoder to calculate
> which input elements are actually used by the swizzle and skip reading
> registers for the unused input elements, the circuit shouldn't be more than
> a few dozen gates.

that's exactly what scalar immediates already do (see power_decode2.py,
DecodeInA when RA==0), and what you are describing is no different.

put a VL or SUBVL loop around it and it's trivial.
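a rough python sketch of that "which elements are actually used" calculation, for illustration only (it is not the PowerDecoder2 code). the selector encoding is a hypothetical assumption: 0-3 pick source element X/Y/Z/W, 4 and 5 mean constant 0 and 1. constants set no bit in the mask, so their register reads are skipped, the same idea as RA==0 meaning "immediate zero".

```python
# Hypothetical selector encoding (an assumption, not from the SV spec):
# 0..3 pick source element X, Y, Z, W; 4 and 5 are the constants 0 and 1.
SEL_ZERO = 4
SEL_ONE = 5


def used_source_mask(selectors):
    """Return a bitmask of the source elements the swizzle actually reads.

    Constant selectors (0 or 1) set no bit, so the matching register
    read can be skipped entirely.
    """
    mask = 0
    for sel in selectors:
        if sel < SEL_ZERO:  # a real source element, not a constant
            mask |= 1 << sel
    return mask


def register_reads(selectors, vl, subvl=4):
    """List the (vector_index, element) pairs that must be read from the
    register file when the swizzle is wrapped in an outer VL loop."""
    mask = used_source_mask(selectors)
    return [(i, e) for i in range(vl) for e in range(subvl) if mask & (1 << e)]
```

e.g. the swizzle X, X, 0, 1 only needs element X, so with VL=2 only two reads are issued instead of eight.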

> That way, we don't have to use up additional bits on
> something we could trivially calculate.

the opcode space it uses, however, is not so trivial impact-wise...

> Taking both those into account: 6 options for 4 elements gives 6^4 = 1296
> combinations -- 11 bits. I'm sure we could find a relatively simple encoding
> for that.
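jacob's arithmetic can be checked with a small python sketch packing four 6-way selectors (X, Y, Z, W, 0, 1) as a base-6 number. the digit order (element 0 least significant) is an arbitrary assumption. note that decoding needs repeated divide-by-6, which is exactly the kind of extra decoder logic being weighed here.

```python
NUM_OPTIONS = 6    # X, Y, Z, W, constant 0, constant 1
NUM_ELEMENTS = 4
FIELD_BITS = 11    # 6**4 = 1296 <= 2**11 = 2048


def encode_swizzle(selectors):
    """Pack four selectors (each 0..5) into one base-6 number,
    with selectors[0] as the least significant digit."""
    assert len(selectors) == NUM_ELEMENTS
    value = 0
    for sel in reversed(selectors):
        assert 0 <= sel < NUM_OPTIONS
        value = value * NUM_OPTIONS + sel
    return value


def decode_swizzle(value):
    """Unpack with repeated divmod by 6: the extra work the instruction
    decoder would need compared with plain bit slicing."""
    selectors = []
    for _ in range(NUM_ELEMENTS):
        value, sel = divmod(value, NUM_OPTIONS)
        selectors.append(sel)
    return selectors
```

by contrast a fixed 3 bits per selector costs 12 bits instead of 11, but each selector can be pulled out by simple bit slicing.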

well, i would prefer less complexity in the decoder. i don't know if you're
aware, but right now PowerDecoder2, with just integer scalar PowerISA 3.0B, is a
staggering 5,000 gates.

some reverse-engineering analysis of POWER9 determined it has a *2 stage*
instruction decoder!

we do have a way out, though: SV-C64 and possibly even SV-C48 (the compressed
16-bit ISA with a 48-bit or 32-bit prefix, respectively).

what we could do is use 2 more major opcodes that work something like this:

* 5 bits v3.0B Major opcode(s)
* 11 bits SV Vector Context (incl SUBVL)
* 16 bits Compressed Instruction
* 16 or 32 bits "swizzle" and other data
  *including* immediate-or-swap
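to sanity-check that the fields above add up to 48 or 64 bits, here is a python sketch that packs them into one word. the field widths come from the list; the field order (opcode in the most significant bits) and the function name pack_sv_c are hypothetical assumptions, not a defined encoding.

```python
# Field widths from the list above. The order (opcode in the most
# significant bits, swizzle data in the least) is an assumption.
OPCODE_BITS = 5
CONTEXT_BITS = 11
COMPRESSED_BITS = 16


def pack_sv_c(opcode, context, compressed, swizzle, swizzle_bits):
    """Build a hypothetical SV-C48 (swizzle_bits=16) or SV-C64
    (swizzle_bits=32) instruction word; returns (word, total_bits)."""
    assert swizzle_bits in (16, 32)
    fields = [
        (opcode, OPCODE_BITS),
        (context, CONTEXT_BITS),
        (compressed, COMPRESSED_BITS),
        (swizzle, swizzle_bits),
    ]
    word = 0
    for value, width in fields:
        assert 0 <= value < (1 << width), "field overflows its width"
        word = (word << width) | value
    total_bits = OPCODE_BITS + CONTEXT_BITS + COMPRESSED_BITS + swizzle_bits
    return word, total_bits
```

5 + 11 + 16 = 32 bits before the swizzle data, so the 16-bit or 32-bit swizzle field gives the 48-bit and 64-bit forms.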

unfortunately we would have to do one of the following:

* set one of the CBank bits to indicate
  "swizzle mode"
* use two more precious v3.0B Major Opcodes
* start reserving "swizzle" Compressed
  opcodes.

if we use yet more v3.0B Major Opcodes, we may actually have to start dropping
instructions to free them up. candidates include mulli, twi, tdi and lq, with
moving "sc" elsewhere and dropping madd as close second-level priorities.

adding special "swizzle instructions" is the Way Of SIMD Madness.

using CBank settings seems the sanest approach, although it too is costly in
terms of state and space.

none of these options is good!
