[Libre-soc-dev] clamping/saturation semantics

Sun Dec 13 02:59:38 GMT 2020

On Sat, Dec 12, 2020, 11:32 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
wrote:

> On 12/12/20, Lauri Kasanen <cand at gmx.com> wrote:
>
> >
> > Added, and with Hendrik's point I used a div for the narrowing wrong
> > example.
>
> star.
>
> in my initial thoughts 18 months ago people were advising
> (particularly for FP) that ops should be done at the larger width,
> followed eventually by narrowing (where dstwid < srcwid)
>

I think the best way to do it is to do the ops at whatever width is
necessary to avoid overflowing intermediates, then saturate to the dest size
at the end.

In particular, for i8 * i8 -> i8 multiplication, the intermediate needs to
be i16, since otherwise you'd get the wrong answer:
with 8-bit intermediates: 0x40 * 0x40 -> 0x00; saturates to 0x00 (wrong)
with 16-bit intermediates: 0x40 * 0x40 -> 0x1000; saturates to 0x7F (correct)
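
A rough Python sketch of the multiplication case, in case it helps. The
helper names (wrap_i8, sat_i8, mul_sat_i8_narrow, mul_sat_i8_wide) are made
up for illustration and are not from any spec or existing code. Python ints
are unbounded, so the 8-bit intermediate is modelled by masking explicitly:

```python
def wrap_i8(x):
    """Truncate to 8 bits and reinterpret as signed (models an 8-bit intermediate)."""
    x &= 0xFF
    return x - 0x100 if x & 0x80 else x


def sat_i8(x):
    """Clamp x to the signed 8-bit range [-0x80, 0x7F]."""
    return max(-0x80, min(0x7F, x))


def mul_sat_i8_narrow(a, b):
    """Hypothetical: product truncated to 8 bits *before* saturating."""
    return sat_i8(wrap_i8(a * b))


def mul_sat_i8_wide(a, b):
    """Hypothetical: full-width product (fits in i16), saturated at the end."""
    return sat_i8(a * b)


# 0x40 * 0x40 = 0x1000
# mul_sat_i8_narrow(0x40, 0x40) gives 0x00 (overflow is lost before the clamp)
# mul_sat_i8_wide(0x40, 0x40)   gives 0x7F (overflow is seen by the clamp)
```

The only difference between the two is whether the truncation happens before
the clamp, which is the whole point.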

Similarly for u8 - u8 -> u8 subtraction, the intermediate needs to be i9 to
avoid getting the wrong answer:
with 8-bit unsigned intermediates: 0x01 - 0xFF -> 0x02; saturates to 0x02 (wrong)
with 9-bit signed intermediates: 0x01 - 0xFF -> -0xFE; saturates to 0x00 (correct)
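
The subtraction case can be sketched the same way in Python. Again the
function names are hypothetical, and the 8-bit unsigned intermediate is
modelled with an explicit mask:

```python
def sat_u8(x):
    """Clamp x to the unsigned 8-bit range [0x00, 0xFF]."""
    return max(0x00, min(0xFF, x))


def sub_sat_u8_narrow(a, b):
    """Hypothetical: difference wrapped to 8 unsigned bits *before* saturating."""
    return sat_u8((a - b) & 0xFF)


def sub_sat_u8_wide(a, b):
    """Hypothetical: signed difference (fits in i9), saturated at the end."""
    return sat_u8(a - b)


# 0x01 - 0xFF = -0xFE
# sub_sat_u8_narrow(0x01, 0xFF) gives 0x02 (the wrapped value looks in-range)
# sub_sat_u8_wide(0x01, 0xFF)   gives 0x00 (the negative value clamps to zero)
```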

The same kind of thing applies to addition and left-shift: the intermediate
needs enough extra bits to hold the full result before saturating.
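
For addition and left-shift, a sketch along the same lines. The specific
operand values here (0xFF + 0x01 and 0x81 << 1) are my own picks for
illustration, and the function names are hypothetical:

```python
U8_MAX = 0xFF


def saturate(x, lo, hi):
    """Clamp x to the inclusive range [lo, hi]."""
    return max(lo, min(hi, x))


def add_sat_u8_narrow(a, b):
    """Hypothetical: sum wrapped to 8 bits *before* saturating."""
    return saturate((a + b) & 0xFF, 0, U8_MAX)


def add_sat_u8_wide(a, b):
    """Hypothetical: full sum (needs 9 bits), saturated at the end."""
    return saturate(a + b, 0, U8_MAX)


def shl_sat_u8_narrow(a, sh):
    """Hypothetical: shifted value wrapped to 8 bits *before* saturating."""
    return saturate((a << sh) & 0xFF, 0, U8_MAX)


def shl_sat_u8_wide(a, sh):
    """Hypothetical: full shifted value (needs 8 + sh bits), saturated at the end."""
    return saturate(a << sh, 0, U8_MAX)


# 0xFF + 0x01 = 0x100: narrow gives 0x00, wide gives 0xFF
# 0x81 << 1   = 0x102: narrow gives 0x02, wide gives 0xFF
```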

Jacob