[Libre-soc-dev] new svp64 page

Luke Kenneth Casson Leighton lkcl at lkcl.net
Fri Dec 11 18:58:51 GMT 2020


On Fri, Dec 11, 2020 at 7:23 AM Lauri Kasanen <cand at gmx.com> wrote:
>
> On Thu, 10 Dec 2020 18:07:23 +0000
> Luke Kenneth Casson Leighton <lkcl at lkcl.net> wrote:
>
> > does this look like a reasonable general-purpose algorithm, applicable
> > to all operations, whether exts*, mr, or 2/3 arithmetic ops?
> >
> > * saturation is done on the result at the **source** elwidth
>
> This would be a problem. For many cases, dst width != src width.
>
> Say you have gathered stuff to u16 and then want to scale that into
> u8, clamped. That's a u16 * u16 = u8 op - different src and dst
> elwidths.

ok, so this example is why i asked.  2 bits, signed-unsigned, is not
enough.  hence the addition of two *more* bits specifying the
saturation quantity: 2^8, 2^16, 2^32.  actually then the table may be:

* none / reserved
* byte s/u
* half s/u
* word s/u

which only needs 3 bits, one reserved encoding.
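
a rough sketch of how such a 3-bit field could be decoded, in python. the bit assignment, and the names SAT_MODES, sat_range and saturate, are assumptions made up for illustration only, not a proposed encoding:

```python
# hypothetical 3-bit saturation field. the bit assignments are an
# assumption for illustration, not an agreed encoding.
SAT_MODES = {
    0b000: None,          # none: no saturation
    0b001: "reserved",    # the one reserved encoding
    0b010: (8, True),     # byte, signed
    0b011: (8, False),    # byte, unsigned
    0b100: (16, True),    # half, signed
    0b101: (16, False),   # half, unsigned
    0b110: (32, True),    # word, signed
    0b111: (32, False),   # word, unsigned
}


def sat_range(mode):
    """Return (lo, hi) for a mode, or None when no saturation applies."""
    entry = SAT_MODES[mode]
    if entry is None:
        return None
    if entry == "reserved":
        raise ValueError("reserved saturation encoding")
    bits, signed = entry
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


def saturate(value, mode):
    """Clamp value to the range selected by the 3-bit mode."""
    rng = sat_range(mode)
    if rng is None:
        return value
    lo, hi = rng
    return max(lo, min(hi, value))
```

with this layout, saturate(300, 0b011) clamps to the unsigned byte range and saturate(-200, 0b010) to the signed byte range, independently of the source elwidth.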


the issue is: that's starting to become an awful lot of bits,
relatively speaking.  yes we happen to have 2 spare, yes these can be
passed as state/context just like immediates down to the FUs, yes we
can make those 3 bits mean something different for FP and logical FUs.

however we may need those bits for something else.  it is all a balance.

Jacob pointed out when we had similar pressure on swizzle that one
possibility was to create a mv.swizzle operation, only taking 1 src,
and performing macro-op fusion.  it's expensive but doable.

a similar case applies here.  in other words we have three options:

  * create a suite of operations that take
     clamp ranges as part of the op.

or:

   * perform 16 bit arith
   * copy src u16 clamped into u8 dest
   * copy u8 src into u16 dest

or:

   * perform 16 bit arith @ 8bit clamp

the last is clearly the most favourable, the first the least.
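
to make the difference between the second and third options concrete, here is a small python model of Lauri's u16 * u16 = u8 case. everything in it is an assumption for illustration: the function names are hypothetical, the arithmetic is modelled as a multiply, and the clamp in the single-op version is assumed to be applied to the 16 bit result. the inputs are chosen so that the 16 bit intermediate does not wrap.

```python
def clamp_u8(x):
    """Clamp an integer to the unsigned 8 bit range."""
    return max(0, min(0xFF, x))


def three_step(a, b):
    """Second option: three separate operations."""
    # step 1: plain 16 bit arithmetic (wraps modulo 2^16, no saturation)
    tmp16 = [(x * y) & 0xFFFF for x, y in zip(a, b)]
    # step 2: copy u16 src, clamped, into u8 dest
    tmp8 = [clamp_u8(x) for x in tmp16]
    # step 3: copy u8 src into u16 dest (zero-extend)
    return [x & 0xFFFF for x in tmp8]


def single_op(a, b):
    """Third option: one 16 bit operation carrying an 8 bit clamp."""
    # assumption: the clamp is applied to the 16 bit result
    return [clamp_u8((x * y) & 0xFFFF) for x, y in zip(a, b)]


a = [3, 20, 100]
b = [2, 20, 100]
print(three_step(a, b))
print(single_op(a, b))
```

under those assumptions both give the same values, the difference being three operations per element against one.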

Lauri, can i ask: how common is clamped arithmetic in AV? i think i
know the answer (very), however in any given algorithm, what percentage
of operations are clamped?

if it is "30%" per audio sample then clearly that weighs strongly in
favour of the extra 2 bits.  if however it is only say 2% then
honestly we have higher priorities to weigh.

l.


