[Libre-soc-dev] [RFC] svp64 "source zeroing" makes no sense

Richard Wilbur richard.wilbur at gmail.com
Sun Mar 21 18:49:41 GMT 2021


On Sun, Mar 21, 2021 at 7:09 AM Luke Kenneth Casson Leighton
<lkcl at lkcl.net> wrote:
>
> as i'm implementing the source- and dest- predication zeroing, it's not
> making sense:
>
>        if not src_zeroing:
>             while (((1<<srcstep) & srcmask) == 0) and (srcstep != vl):
>                 print ("      skip", bin(1<<srcstep))
>                 srcstep += 1

Does this mean that if we are not doing source zeroing with the
predication, then we skip the operation on vector elements where the
source predication mask is 0, from the least significant bit up to
the least significant 1 in the source predication mask?

>        if not dest_zeroing:
>             # same for dststep
>             while (((1<<dststep) & dstmask) == 0) and (dststep != vl):
>                 print ("      skip", bin(1<<dststep))
>                 dststep += 1

This seems to say that if we are not doing destination zeroing with
predication, then we skip the operation on vector elements where the
destination predication mask is 0, from the least significant bit up
to the least significant 1 in the destination predication mask?

In both of the above, what seems to be most important in the
predication masks is the position of the least significant bit set to
1.  Any 0 bits above that don't seem to have any effect on skipping
the operation.
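
To make that reading concrete, here is a minimal Python sketch of the
quoted skip loop on its own.  The function name skip_masked and the
mask values are mine, not from the simulator, and the quoted fragment
does not show whether the loop is re-entered for each element, so both
a first call and a later call are shown.

```python
def skip_masked(step, mask, vl):
    """Advance step past elements whose mask bit is 0 (the quoted loop)."""
    while ((1 << step) & mask) == 0 and step != vl:
        step += 1
    return step

mask = 0b10100  # made-up mask: elements 2 and 4 enabled
vl = 5
first = skip_masked(0, mask, vl)          # stops at the lowest set bit
later = skip_masked(first + 1, mask, vl)  # only if re-entered after element 2
empty = skip_masked(0, 0b00000, vl)       # nothing enabled: runs up to vl
print(first, later, empty)
```

On a single pass only the zeros below the lowest set bit are skipped; a
zero above it is skipped only if the loop runs again from a later step.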

>        if src_zeroing and ((1<<srcstep) & srcmask) == 0:
>             RA = 0
>             RB = 0
>        else:
>             RA = get_register_RA
>             RB = get_register_RB
>
>        result, Condition_Register = calc_operation(RA, RB)
>
>        if dest_zeroing and ((1<<dststep) & srcmask) == 0):
>             result = 0
>             Condition_Register = EQzero
>
>        store_result(result)
>        if Rc=1: store_cr(Condition_Register)

This section seems to mean that:
1.  If the predication mask has a 0 bit above its least significant 1
bit and source zeroing is not active, we still perform the operation
on that vector element regardless of the source or destination
predication masks.
2.  If source zeroing is active and the source predication mask bit is
0 then we use 0 for the source operands.
3.  If destination zeroing is active and the bit of the source(?)
predication mask corresponding to the destination element is 0, then
we overwrite the effect of performing the operation by setting result
to 0 and the Condition_Register to EQzero.  If result and
Condition_Register were the only effects of the operation, we just
wasted our time (and energy).

This has several issues:

1.  The operation is predicated only on the bits of the source or
destination predication mask from the least significant bit up to the
least significant 1.  And then only if we aren't source zeroing or
destination zeroing.
2.  Above that (the least significant set bit of the predication mask)
the operation is always performed but the result is thrown away if
destination zeroing and the bit of the source predication mask
corresponding to the destination element is 0?  This seems like it
should refer to the destination predication mask.
3.  If we skip an element that we intended to zero in the destination,
it doesn't look like it gets zeroed, because we skip storing the result.
4.  If we perform the operation and destination zero the result and
Condition_Register, we have just wasted our time and energy by wiping
out the results, unless there is an un-enumerated side effect.
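
Issue 2 is easy to see with two masks that differ.  This is a small
sketch with made-up mask values; bit_clear is a helper name I
introduced for the illustration, not something from the simulator.

```python
def bit_clear(step, mask):
    """True when the predicate bit for element `step` is 0."""
    return ((1 << step) & mask) == 0

srcmask = 0b0101  # made-up values for illustration
dstmask = 0b0011
dststep = 1

as_quoted = bit_clear(dststep, srcmask)    # the quoted code tests srcmask
as_expected = bit_clear(dststep, dstmask)  # what dest_zeroing suggests
print(as_quoted, as_expected)
```

With these values the quoted test would zero destination element 1
even though the destination mask enables it.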


Here's a reimplementation of the above that, I believe, addresses the
issues I saw:

        if not src_zeroing:
             while (srcstep != vl):
                 if ((1<<srcstep) & srcmask) == 0:
                     print ("      skip", bin(1<<srcstep))
                 srcstep += 1
        if not dest_zeroing:
             # same for dststep
             while (dststep != vl):
                 if ((1<<dststep) & dstmask) == 0:
                     print ("      skip", bin(1<<dststep))
                 dststep += 1


        if dest_zeroing and ((1<<dststep) & dstmask) == 0:
             result = 0
             Condition_Register = EQzero
        else:
             if src_zeroing and ((1<<srcstep) & srcmask) == 0:
                  RA = 0
                  RB = 0
             else:
                  RA = get_register_RA
                  RB = get_register_RB
             result, Condition_Register = calc_operation(RA, RB)

        store_result(result)
        if Rc == 1: store_cr(Condition_Register)
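
As a rough, runnable sketch of the per-element part of the above (the
zeroing decision only, not the skip loops): calc_operation here is a
stand-in add, the two-state condition value is a simplification, and
element_step, EQ_ZERO and calls are names I made up so the number of
times the operation actually runs can be counted.

```python
EQ_ZERO = "EQ"  # placeholder for the CR value meaning "result is zero"
calls = []      # records every time the operation actually runs

def calc_operation(ra, rb):
    """Stand-in operation (an add); the real one is not specified here."""
    calls.append((ra, rb))
    result = ra + rb
    return result, (EQ_ZERO if result == 0 else "nonzero")

def element_step(ra, rb, srcstep, dststep, srcmask, dstmask,
                 src_zeroing, dest_zeroing):
    # Destination zeroing is checked first, so a result that would be
    # thrown away is never computed.
    if dest_zeroing and ((1 << dststep) & dstmask) == 0:
        return 0, EQ_ZERO
    if src_zeroing and ((1 << srcstep) & srcmask) == 0:
        ra, rb = 0, 0
    return calc_operation(ra, rb)

zeroed = element_step(3, 4, 0, 0, 0b1, 0b0, False, True)
normal = element_step(3, 4, 0, 0, 0b1, 0b1, False, True)
src_zero = element_step(3, 4, 0, 0, 0b0, 0b1, True, False)
print(zeroed, normal, src_zero, len(calls))
```

Only the second and third calls reach calc_operation; the
destination-zeroed element costs nothing beyond the mask test.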


I see two possible remaining issues here, and the first depends on the
precedence of the predication and zeroing flags:
1.  If we have opted for destination zeroing but not source zeroing,
should we still zero a destination element marked 0 in the destination
predication mask when the source predication mask would cause the
operation to be skipped?  That seems like a useful result: it is the
expected operation of the destination zeroing flag in conjunction with
the destination predication mask.  In this case we should either not
skip the operation altogether, or directly set the result to 0 and the
Condition_Register to EQzero, then store the result and the condition
register (if needed).  This would suggest modifying the source
predication mask processing around the "skip" to something like the
following, although I'm not sure whether the source and destination
predication masks directly correspond:

        if not src_zeroing:
             while (srcstep != vl):
                 if (((1<<srcstep) & srcmask) == 0 and
                     (not dest_zeroing or ((1<<dststep) & dstmask) != 0)):
                     print ("      skip", bin(1<<srcstep))
                 srcstep += 1

2.  If there are other un-enumerated side effects of calc_operation()
besides [result, Condition_Register], then we may want to actually
perform the operation, depending on whether the side effects are
expected--even if we are zeroing the destination.
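
A note on the mask test used in the skip condition of point 1: the
expression (1<<dststep) & dstmask evaluates to 1<<dststep when the bit
is set, so it equals 1 only for element 0.  Testing against zero is
the form that works for every element.  A quick check, with a made-up
mask:

```python
dstmask = 0b0110  # made-up mask: elements 1 and 2 enabled

for dststep in range(4):
    bit = (1 << dststep) & dstmask
    print(dststep, bit, bit == 1, bit != 0)

# Elements each form of the test reports as enabled.
set_by_eq1 = [s for s in range(4) if ((1 << s) & dstmask) == 1]
set_by_ne0 = [s for s in range(4) if ((1 << s) & dstmask) != 0]
print(set_by_eq1, set_by_ne0)
```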

>
> i'm more inclined towards this:
>
>        if (src_zeroing and ((1<<srcstep) & srcmask) == 0) or
>           (dest_zeroing and ((1<<dststep) & srcmask) == 0)):
>             result = 0
>             Condition_Register = EQzero
>        else:
>             RA = get_register_RA
>             RB = get_register_RB
>             result, Condition_Register = calc_operation(RA, RB)
>
> *now* that makes more sense, particularly when thought through from LD/ST.
>
> thoughts?

For some reason this second version is still using the source
predication mask for determining destination zeroing.  This also never
performs the operation with source operands zero!
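
The difference matters for any operation whose result on zero operands
is not itself zero.  A small sketch, using an 8-bit NOR as a
hypothetical example operation (nor8 and both function names are mine,
chosen only for illustration):

```python
def nor8(ra, rb):
    """Hypothetical 8-bit NOR, chosen because nor8(0, 0) is not 0."""
    return ~(ra | rb) & 0xFF

def quoted_second_version(ra, rb, src_bit_set, src_zeroing):
    # Mirrors the quoted logic for the source case: masked-out means result 0.
    if src_zeroing and not src_bit_set:
        return 0
    return nor8(ra, rb)

def zero_the_operands(ra, rb, src_bit_set, src_zeroing):
    # Source zeroing as zeroed operands: the operation still runs.
    if src_zeroing and not src_bit_set:
        ra, rb = 0, 0
    return nor8(ra, rb)

a = quoted_second_version(0x12, 0x34, False, True)
b = zero_the_operands(0x12, 0x34, False, True)
print(hex(a), hex(b))
```

For an add the two give the same answer, which may be why the
difference is easy to miss.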


