[Libre-soc-dev] new svp64 page

Thu Dec 10 20:53:49 GMT 2020

On 12/10/20, Jacob Lifshay <programmerjake at gmail.com> wrote:
> On Thu, Dec 10, 2020, 10:07 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
> wrote:
>
>> On 12/10/20, Luke Kenneth Casson Leighton <lkcl at lkcl.net> wrote:
>> > On 12/10/20, Lauri Kasanen <cand at gmx.com> wrote:
>> >> On Thu, 10 Dec 2020 16:27:33 +0000
>> >> Luke Kenneth Casson Leighton <lkcl at lkcl.net> wrote:
>> >>
>> >>> lauri, jacob, what are your thoughts on using 2 bits for clamping mode?
>> >>> this is *not* the same as elwidth itself, which is the "chop" in VSX
>> >>> ops pseudocode.
>> >>>
>> >>> or: another idea:
>> >>>
>> >>> * extsb, extsh, extsw specify one type of width
>> >>> * twin predication specifies 2 more (src elwidth, dest elwidth)
>> >>> * 1 bit says "operation is to be clamped" (not to which range, that's
>> >>> implicit)
>> >>
>> >> I can't come up with a use case for having different clamping to dst
>> >> elwidth. If you want 8-bit unsigned saturation, there's no reason for
>> >> you to write that to 16-bit elements. So I would take the clamp width
>> >> from the dst elwidth.
>> >
>> > it's not that the elwidth has a reason (or not), it's that add and
>> > other arith ops *don't* have signed/unsigned variants (except for mul
>> > and div), and they don't come in the full range of 8/16/32-bit widths.
>> >
>> > now, if we allow dest elwidth even on 2-src *arithmetic* operations
>> > (something that was left out of SVP originally because of lack of
>> > space), then the one bit "sat" (or 2 bits, one for signed, one for
>> > unsigned) starts to gel.
>> >
>> >> I would simply have two bits to enable clamping, unsigned and signed.
>> >> 16 and 32 bit do need both, not just 8-bit.
>> >
>> > i realised belatedly that add does not have add-signed as separate
>> > from add-unsigned.  nor is there, in Power, an add8 or add16.
>> >
>> > i will see if there's space in the 24 bits for dest elwidth and 2 bits
>> > for sat mode.
>
>
>> there is.
>>
>> does this look like a reasonable general-purpose algorithm, applicable
>> to all operations, whether exts*, mr, or 2/3 arithmetic ops?
>>
>
> I don't think we need a second elwidth except for size conversion ops;
> saturating ops don't need an add (or other 3-argument ALU op) with
> 16-bit inputs and an 8-bit output.
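A minimal sketch of the clamping discussed above, assuming Lauri's suggestion: the clamp width is taken from the dest elwidth, and two mode bits select signed or unsigned saturation. The names SAT_NONE/SAT_SIGNED/SAT_UNSIGNED, clamp_to_elwidth and sat_add are hypothetical, not taken from the SVP64 spec; inputs are assumed to be already sign- or zero-extended Python ints, and the result is the raw elwidth-bit pattern.

```python
# Hypothetical 2-bit saturation mode encoding (illustrative only).
SAT_NONE, SAT_SIGNED, SAT_UNSIGNED = 0, 1, 2


def clamp_to_elwidth(value: int, elwidth: int, sat_mode: int) -> int:
    """Fit a full-precision result into `elwidth` bits.

    SAT_NONE truncates (wraps) like an ordinary integer op; the two
    saturating modes clamp to the signed or unsigned range of elwidth.
    The return value is the raw elwidth-bit pattern (two's complement
    when signed).
    """
    mask = (1 << elwidth) - 1
    if sat_mode == SAT_SIGNED:
        lo, hi = -(1 << (elwidth - 1)), (1 << (elwidth - 1)) - 1
        value = max(lo, min(hi, value))
    elif sat_mode == SAT_UNSIGNED:
        value = max(0, min(mask, value))
    return value & mask


def sat_add(a: int, b: int, elwidth: int, sat_mode: int) -> int:
    # the add happens at full precision (Python ints are unbounded);
    # only the write to the destination element is clamped or truncated
    return clamp_to_elwidth(a + b, elwidth, sat_mode)


print(sat_add(200, 100, 8, SAT_UNSIGNED))  # 255
print(sat_add(200, 100, 8, SAT_NONE))      # 44
```

The same clamp applies unchanged to any op whose full-precision result is available, which is the appeal of one general-purpose post-processing step.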

dest elwidth overrides have secondary purposes, such as providing
multiply widening.
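To illustrate multiply widening via a dest elwidth override, here is a sketch assuming unsigned 8-bit source elements: with dest elwidth 8 the product is truncated, while with dest elwidth 16 the full product survives. mul_elements is a hypothetical helper, not an SVP64 definition.

```python
def mul_elements(srcs_a, srcs_b, src_elwidth: int, dest_elwidth: int):
    """Hypothetical element-wise unsigned multiply with a dest elwidth override."""
    src_mask = (1 << src_elwidth) - 1
    dest_mask = (1 << dest_elwidth) - 1
    out = []
    for a, b in zip(srcs_a, srcs_b):
        # multiply the source-width elements at full precision
        product = (a & src_mask) * (b & src_mask)
        # write back at dest elwidth: truncated if dest is too narrow
        out.append(product & dest_mask)
    return out


print(mul_elements([200, 15], [200, 17], 8, 8))   # [64, 255]
print(mul_elements([200, 15], [200, 17], 8, 16))  # [40000, 255]
```

An 8x8 product always fits in 16 bits, so a dest elwidth of twice the source elwidth gives a lossless widening multiply without a separate opcode.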

one of the "disadvantages" of the RISC approach, where pre/post
processing is controlled uniformly by N bits, is that some combinations
just do not make sense.

for example: saturation bits on logical ops are totally meaningless.
likewise, elwidth overrides on logical ops mean nothing beyond
truncation or zero-extension.
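A quick exhaustive check of why saturation on logical ops is meaningless: and/or/xor of two values already inside an n-bit signed (sign-extended) or unsigned range always stay inside that range, so the clamp never fires, whereas it does for add. The helper names are illustrative only.

```python
import operator
from itertools import product

LOGICAL_OPS = {"and": operator.and_, "or": operator.or_, "xor": operator.xor}


def clamp(value: int, elwidth: int, signed: bool) -> int:
    """Clamp value to the signed or unsigned range of elwidth bits."""
    if signed:
        lo, hi = -(1 << (elwidth - 1)), (1 << (elwidth - 1)) - 1
    else:
        lo, hi = 0, (1 << elwidth) - 1
    return max(lo, min(hi, value))


def saturation_ever_fires(ops: dict, elwidth: int) -> bool:
    """True if clamping ever changes the result of one of `ops` at elwidth."""
    for signed in (False, True):
        if signed:
            values = range(-(1 << (elwidth - 1)), 1 << (elwidth - 1))
        else:
            values = range(1 << elwidth)
        for a, b in product(values, repeat=2):
            for op in ops.values():
                result = op(a, b)
                if clamp(result, elwidth, signed) != result:
                    return True
    return False


# Python's & | ^ on negative ints behave like infinite-width two's
# complement, so sign-extended inputs model signed elements correctly.
print(saturation_ever_fires(LOGICAL_OPS, 8))            # False
print(saturation_ever_fires({"add": operator.add}, 8))  # True
```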

we... well... could go down the CISC route, or the VSX route... and i
would be very unhappy.

> For implementing average,
> we could encode that by repurposing xor (or some other bitwise op) with
> saturation to instead mean averaging add.

ah now in light of the above that makes sense.  reuse opcode space
rather than allocate new ones.
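A sketch of the repurposing idea, assuming "xor with saturation" decodes as an unsigned rounding average (a + b + 1) >> 1, the same rounding used by AltiVec's vavgub. Computing it as (a | b) - ((a ^ b) >> 1) needs no carry-out, so it fits within elwidth. The function names and this decode are hypothetical, not part of any spec.

```python
def avg_unsigned(a: int, b: int, elwidth: int) -> int:
    """Rounding average (a + b + 1) >> 1 using only elwidth-bit operations.

    Since a + b == (a ^ b) + 2 * (a & b), it follows that
    (a + b + 1) >> 1 == (a | b) - ((a ^ b) >> 1): no carry-out needed.
    """
    mask = (1 << elwidth) - 1
    a, b = a & mask, b & mask
    return (a | b) - ((a ^ b) >> 1)


def execute_bitwise(op: str, a: int, b: int, elwidth: int, sat: bool) -> int:
    """Hypothetical bitwise ALU where saturated xor means averaging add."""
    mask = (1 << elwidth) - 1
    if op == "xor" and sat:
        # repurposed encoding: saturation is meaningless on xor
        return avg_unsigned(a, b, elwidth)
    results = {"and": a & b, "or": a | b, "xor": a ^ b}
    # for the remaining bitwise ops the sat bit is simply a no-op
    return results[op] & mask


print(execute_bitwise("xor", 0xF0, 0x0F, 8, False))  # 255
print(execute_bitwise("xor", 0xF0, 0x0F, 8, True))   # 128
```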

except... i was thinking that in this particular case, actually adding
avg *to* OpenPOWER v3.N has a reasonable justification: it reduces the
code size of the SIMD post-cleanup (tail) loops.

it puzzles me that there are all these wonderful, powerful SIMD ops, yet
the scalar ops, absolutely crucial for cleaning up the leftover elements
when the data length is not a multiple of the SIMD width, are left
without corresponding ops!
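To make the cleanup point concrete, here is a sketch of a typical loop: N byte elements are averaged VL at a time by a SIMD op, then the N % VL leftovers are handled one at a time by scalar code. VL = 16 and both helpers are assumptions for illustration, not real OpenPOWER instructions.

```python
VL = 16  # assumed elements per SIMD register (16 bytes in a 128-bit register)


def simd_avg(xs, ys):
    # stands in for a single vector average instruction over VL bytes
    return [(x + y + 1) >> 1 for x, y in zip(xs, ys)]


def scalar_avg_without_avg_op(x, y):
    # with no scalar avg op, each tail element needs a short sequence of
    # scalar ops (add, add 1, shift) instead of one instruction
    return (x + y + 1) >> 1


def average_arrays(a, b):
    n = len(a)
    main = n - n % VL
    out = []
    for i in range(0, main, VL):  # main SIMD loop
        out += simd_avg(a[i:i + VL], b[i:i + VL])
    for i in range(main, n):      # scalar cleanup of the tail
        out.append(scalar_avg_without_avg_op(a[i], b[i]))
    return out
```

With a = list(range(37)) the main loop covers 32 elements and the scalar tail covers the remaining 5; a scalar avg would shrink that tail to one op per element.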

> We will also want saturating mul, saturating sub, and maybe saturating
> lshift.

do take a quick look at the pseudocode, see if you think it covers all
the options there.
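For comparison with that pseudocode, a minimal sketch of saturating sub, mul and lshift under the same approach: compute at full precision, then clamp to the element width. This is not the pseudocode from the svp64 page; the names are hypothetical and results are returned as plain signed or unsigned ints rather than bit patterns.

```python
def clamp(value: int, elwidth: int, signed: bool) -> int:
    """Clamp value to the signed or unsigned range of elwidth bits."""
    if signed:
        lo, hi = -(1 << (elwidth - 1)), (1 << (elwidth - 1)) - 1
    else:
        lo, hi = 0, (1 << elwidth) - 1
    return max(lo, min(hi, value))


def sat_sub(a: int, b: int, elwidth: int, signed: bool) -> int:
    return clamp(a - b, elwidth, signed)


def sat_mul(a: int, b: int, elwidth: int, signed: bool) -> int:
    return clamp(a * b, elwidth, signed)


def sat_shl(a: int, shift: int, elwidth: int, signed: bool) -> int:
    # shifting at full precision means bits pushed past the element width
    # (or into the sign bit) are detected and the result clamped
    return clamp(a << shift, elwidth, signed)


print(sat_sub(10, 20, 8, False))  # 0
print(sat_shl(1, 7, 8, True))     # 127
```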

l.