[Libre-soc-dev] new svp64 page
Luke Kenneth Casson Leighton
lkcl at lkcl.net
Thu Dec 10 20:53:49 GMT 2020
On 12/10/20, Jacob Lifshay <programmerjake at gmail.com> wrote:
> On Thu, Dec 10, 2020, 10:07 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
>> On 12/10/20, Luke Kenneth Casson Leighton <lkcl at lkcl.net> wrote:
>> > On 12/10/20, Lauri Kasanen <cand at gmx.com> wrote:
>> >> On Thu, 10 Dec 2020 16:27:33 +0000
>> >> Luke Kenneth Casson Leighton <lkcl at lkcl.net> wrote:
>> >>> lauri, jacob, what's your thoughts on using 2 bits for clamping mode?
>> >>> this is *not* the same as elwidth itself, which is the "chop" in VSX
>> >>> ops pseudocode.
>> >>> or: another idea:
>> >>> * extsb, extsh, extsw specify one type of width
>> >>> * twin predication specifies 2 more (src elwidth, dest elwidth)
>> >>> * 1 bit says "operation is to be clamped" (not to which range, that's
>> >>> implicit)
>> >> I can't come up with a use case for having different clamping to dst
>> >> elwidth. If you want 8-bit unsigned saturation, there's no reason for
>> >> you to write that to 16-bit elements. So I would take the clamp width
>> >> from the dst elwidth.
>> > it's not that the elwidth has a reason (or not), it's that add and
>> > other arith ops *don't* have sign/uns (except for mul and div) and
>> > they don't have a full range of 8/16/32.
>> > now, if we allow dest elwidth even on 2-src *arithmetic* operations
>> > (something that was left out of SVP originally because of lack of
>> > space), then now the one bit "sat" (or 2 bit, one for signed one for
>> > unsigned) starts to gel.
>> >> I would simply have two bits to enable clamping, unsigned and signed.
>> >> 16 and 32 bit do need both, not just 8-bit.
>> > i realised belatedly that add does not have add-signed as separate
>> > from add-unsigned. nor is there, in Power, an add8 or add16.
>> > i will see if there's space in the 24 bits for dest elwidth and 2 bits
>> > for sat mode.
>> there is.
>> does this look like a reasonable general-purpose algorithm, applicable
>> to all operations, whether exts*, mr, or 2/3 arithmetic ops?
> I don't think we need a second elwidth except for size conversion ops,
> saturating ops don't need 8-bit output 16-bit input add (or other
> 3-argument ALU ops with different output size).
dest elwidth overrides have secondary purposes such as providing

one of the "disadvantages" of the RISC approach, where pre/post
processing is controlled uniformly by N bits, is that some
combinations just do not make sense.
for example: saturation bits on logical ops are totally meaningless.
likewise, elwidth overrides on logical ops are meaningless, except to
truncate or zero-extend.
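
a rough python sketch of the difference between the two modes, assuming
"sat" means "clamp the full-precision result to the range of the dest
elwidth" and a plain elwidth override just keeps the low bits. the
function names are made up for illustration and are not taken from the
svp64 page pseudocode.

```python
def clamp_to_elwidth(value, elwidth, signed):
    """Saturate an arbitrary-precision result to an elwidth-bit range."""
    if signed:
        lo, hi = -(1 << (elwidth - 1)), (1 << (elwidth - 1)) - 1
    else:
        lo, hi = 0, (1 << elwidth) - 1
    return max(lo, min(hi, value))


def truncate_to_elwidth(value, elwidth):
    """Plain elwidth override: keep only the low elwidth bits."""
    return value & ((1 << elwidth) - 1)


def sat_add(a, b, elwidth, signed):
    """Hypothetical saturating add: full-precision add, then clamp."""
    return clamp_to_elwidth(a + b, elwidth, signed)


# arithmetic: saturation and truncation give different answers
print(sat_add(200, 100, 8, signed=False))    # clamps at the 8-bit max
print(truncate_to_elwidth(200 + 100, 8))     # wraps around instead

# logical: the xor of two 8-bit values always fits in 8 bits,
# so clamping never changes the result
x = 0xF0 ^ 0x0F
print(clamp_to_elwidth(x, 8, signed=False) == x)
```

the last line is the point: a logical op on in-range inputs cannot go
out of range, so a sat bit has nothing to do there.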
we... well... could go down the CISC route, or the VSX route.... and i
would be very unhappy.
> For implementing average,
> we could encode that by repurposing xor (or some other bitwise op) with
> saturation to instead mean averaging add.
ah now in light of the above that makes sense. reuse opcode space
rather than allocate new ones.
except... i was thinking that, in this particular case, adding avg
*to* OpenPOWER v3.N actually has a reasonable justification: reduction
of the SIMD post-cleanup code size.
it puzzles me that there are all these wonderful powerful SIMD ops, yet
the scalar ops, absolutely crucial for cleanup when the element count
is not a multiple of the SIMD size, are left without corresponding ops!
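
to illustrate the cleanup problem, here is a small python model. it
assumes a rounding-up average, (a + b + 1) >> 1, computed with one extra
bit of headroom; that rounding choice, the 8-bit default and the names
avg_add / avg_arrays are assumptions for the sketch, not a proposed
instruction definition.

```python
def avg_add(a, b, elwidth=8):
    """Averaging add on elwidth-bit unsigned values (assumed rounding up)."""
    mask = (1 << elwidth) - 1
    # the intermediate sum needs elwidth+1 bits; python ints never wrap,
    # so the carry out of the top bit is preserved before the shift
    return (((a & mask) + (b & mask) + 1) >> 1) & mask


def avg_arrays(xs, ys, simd_width=4):
    """Average two equal-length lists: SIMD-sized chunks, then scalar tail."""
    out = []
    main = len(xs) - len(xs) % simd_width
    # "SIMD" part: whole chunks of simd_width elements
    for i in range(0, main, simd_width):
        chunk = zip(xs[i:i + simd_width], ys[i:i + simd_width])
        out.extend(avg_add(x, y) for x, y in chunk)
    # scalar cleanup: leftover elements, one at a time
    for i in range(main, len(xs)):
        out.append(avg_add(xs[i], ys[i]))
    return out


print(avg_arrays([0, 2, 4, 6, 8, 10], [2, 2, 4, 8, 8, 255]))
```

with a scalar avg op the tail loop is one instruction per element.
without one, each leftover element needs a widen, add, add, shift
sequence, which is where the extra cleanup code size comes from.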
> We will also want saturating mul, saturating sub, and maybe saturating
do take a quick look at the pseudocode, see if you think it covers all
the options there.