[Libre-soc-dev] new svp64 page
Luke Kenneth Casson Leighton
lkcl at lkcl.net
Fri Dec 11 18:58:51 GMT 2020
On Fri, Dec 11, 2020 at 7:23 AM Lauri Kasanen <cand at gmx.com> wrote:
>
> On Thu, 10 Dec 2020 18:07:23 +0000
> Luke Kenneth Casson Leighton <lkcl at lkcl.net> wrote:
>
> > does this look like a reasonable general-purpose algorithm, applicable
> > to all operations, whether exts*, mr, or 2/3 arithmetic ops?
> >
> > * saturation is done on the result at the **source** elwidth
>
> This would be a problem. For many cases, dst width != src width.
>
> Say you have gathered stuff to u16 and then want to scale that into
> u8, clamped. That's a u16 * u16 = u8 op - different src and dst
> elwidths.
ok, so this example is why i asked. 2 bits, signed-unsigned, is not
enough. hence the addition of two *more* bits specifying the
saturation quantity: 2^8, 2^16, 2^32. actually then the table may be:
* none / reserved
* byte s/u
* half s/u
* word s/u
which only needs 3 bits, one reserved encoding.
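
as a rough sketch of how such a 3-bit field could decode into clamp bounds (the numeric encodings, and the names SAT_MODES, sat_range and saturate, are hypothetical; only the list of modes comes from the table above):

```python
# hypothetical 3-bit saturation field. the bit patterns are an assumption:
# only the set of modes (none, byte/half/word, each signed/unsigned) is
# taken from the table in the text.
SAT_MODES = {
    0b000: None,         # none: no saturation
    0b001: (8, True),    # byte signed
    0b010: (8, False),   # byte unsigned
    0b011: (16, True),   # half signed
    0b100: (16, False),  # half unsigned
    0b101: (32, True),   # word signed
    0b110: (32, False),  # word unsigned
    # 0b111 is left out: the one reserved encoding
}


def sat_range(mode):
    """return (lo, hi) clamp bounds for a mode, or None for no saturation."""
    if mode not in SAT_MODES:
        raise ValueError("reserved saturation encoding: %r" % (mode,))
    entry = SAT_MODES[mode]
    if entry is None:
        return None
    bits, signed = entry
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


def saturate(value, mode):
    """clamp value to the range selected by mode (unchanged if mode is none)."""
    bounds = sat_range(mode)
    if bounds is None:
        return value
    lo, hi = bounds
    return max(lo, min(hi, value))
```

seven of the eight patterns are used (none plus six width/sign pairs), which is where the single reserved encoding comes from.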
the issue is: that's starting to become an awful lot of bits,
relatively speaking. yes we happen to have 2 spare, yes these can be
passed as state/context just like immediates down to the FUs, yes we
can make those 3 bits mean something different for FP and logical FUs.
however we may need those bits for something else. it is all a balance.
Jacob pointed out when we had similar pressure on swizzle that one
possibility was to create a mv.swizzle operation, only taking 1 src,
and performing macro-op fusion. it's expensive but doable.
a similar case applies here. in other words we have three options:
* create a suite of operations that take clamp ranges as part of the op.
or:
* perform 16 bit arith
* copy src u16 clamped into u8 dest
* copy u8 src into u16 dest
or:
* perform 16 bit arith @ 8bit clamp
the last is clearly the most favourable, the first the least.
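
to illustrate the difference between the second and third options for Lauri's u16 * u16 = u8 case, here is a minimal sketch. the function names are hypothetical, and it assumes the 16-bit multiply itself saturates at u16 rather than wrapping, which the text does not specify:

```python
def clamp(value, lo, hi):
    """clamp value into the inclusive range [lo, hi]."""
    return max(lo, min(hi, value))


def mul_then_clamped_copy(a, b):
    """second option: separate ops (hypothetical helper).

    assumption: the 16-bit arithmetic saturates at u16 instead of wrapping.
    """
    wide = clamp(a * b, 0, 0xFFFF)   # perform 16 bit arith
    narrow = clamp(wide, 0, 0xFF)    # copy src u16 clamped into u8 dest
    return narrow                    # (copying u8 back into u16 keeps the value)


def mul_with_u8_clamp(a, b):
    """third option: one op, 16 bit arith with an 8 bit clamp (hypothetical helper)."""
    return clamp(a * b, 0, 0xFF)
```

under that assumption both give the same value (for example 20 * 20 = 400 clamps to 255 either way), so the trade-off is the extra copy operations against the extra encoding bits, not the result.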
Lauri can i ask: how common is clamped arithmetic in AV? i think i
know the answer (very) however in any given algorithm, what percentage
of operations are clamped?
if it is "30%" per audio sample then clearly that weighs strongly in
favour of the extra 2 bits. if however it is only say 2% then
honestly we have higher priorities to weigh.
l.