[Libre-soc-dev] [RFC] svp64 "source zeroing" makes no sense
Richard Wilbur
richard.wilbur at gmail.com
Sun Mar 21 23:16:03 GMT 2021
On Sun, Mar 21, 2021 at 4:40 PM Luke Kenneth Casson Leighton
<lkcl at lkcl.net> wrote:
>
> On Sunday, March 21, 2021, Richard Wilbur <richard.wilbur at gmail.com> wrote:
> > What is the maximum value of VL
>
> 64.
>
> > (or similarly, the maximum length of
> > the predication masks)?
>
> integer. 64.
>
> > I have in mind an optimization that avoids
> > iterating through the 0 bits of the predication masks with logic that
> > will generate the {src|dst}step pointing to the next non-zero bit in
> > the mask provided there is at least one bit set.
The optimization is very simple. How sparse do you expect these
predication masks to be?
>
> this will be possible as a choice for individual implementors where it
> makes sense based on gate count, performance and power consumption for
> their needs.
>
> it will be helpful to record such optimisations for when there is time to
> implement them.
I'm happy to do that. It is simple enough that it sounded like an easy
win if we expect predication masks to be fairly sparse, since you save
the skipped cycles every time you execute a predicated instruction.
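
A minimal sketch of the idea in Python, assuming a 64-bit mask held in a
plain integer. The names next_set_bit and active_steps are hypothetical
and not taken from the spec or the simulator; in hardware this would more
likely be a priority encoder (find-first-set) than a loop.

```python
def next_set_bit(mask, step, vl=64):
    """Return the index of the lowest set bit of mask at or above step.

    Returns None when no set bit remains below vl.
    """
    # Keep only the low vl bits, then drop the bits below step.
    remaining = (mask & ((1 << vl) - 1)) >> step
    if remaining == 0:
        return None
    # remaining & -remaining isolates the lowest set bit.
    lowest = remaining & -remaining
    return step + lowest.bit_length() - 1


def active_steps(mask, vl=64):
    """List every element index that a sparse mask actually visits."""
    steps = []
    step = next_set_bit(mask, 0, vl)
    while step is not None:
        steps.append(step)
        step = next_set_bit(mask, step + 1, vl)
    return steps
```

With a mask of 0b10010010 and VL=8 the loop visits three elements rather
than eight, which is where the saving would come from.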
> right now the priority question is "does ORing the src and dest zeroing to
> put zeros in the output make sense"
With the code, I would suggest:
    if dest_zeroing and src_zeroing:
        dstmask &= srcmask
>
> > if dest_zeroing and ((1<<dststep) & dstmask) == 0:
> >     result = 0
> >     Condition_Register = EQzero
> > else:
> >     if src_zeroing and ((1<<srcstep) & srcmask) == 0:
> >         RA = 0
> >         RB = 0
> >     else:
> >         RA = get_register_RA
> >         RB = get_register_RB
> >     result, Condition_Register = calc_operation(RA, RB)
>
>
> this is still the old behaviour which is passing zeros into the pipelines
> as input.
>
> this behaviour makes no sense and must be replaced.
I guess that's because I don't understand the intent. To me, source
zeroing just passes 0's into whatever you were going to do.
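
For reference, here is a runnable sketch of that reading (the old
behaviour, with the mask combination suggested above). It is only an
illustration: calc_operation is a hypothetical stand-in (a 64-bit add),
the condition register is reduced to the strings "EQ"/"NE" rather than a
real CR field, and the register files are plain lists. As in the
pseudocode, the case of a masked-out element without zeroing is not
modelled.

```python
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def calc_operation(ra, rb):
    # Hypothetical stand-in for the real operation: a 64-bit add.
    result = (ra + rb) & MASK64
    # Simplified condition flag, not the real CR encoding.
    return result, ("EQ" if result == 0 else "NE")


def element_step(srcstep, dststep, srcmask, dstmask,
                 src_zeroing, dest_zeroing, regs_a, regs_b):
    """One element of the old behaviour: zeros are fed in as inputs."""
    if dest_zeroing and src_zeroing:
        # Suggested combination: a zeroed source also zeroes the result.
        dstmask &= srcmask
    if dest_zeroing and ((1 << dststep) & dstmask) == 0:
        return 0, "EQ"
    if src_zeroing and ((1 << srcstep) & srcmask) == 0:
        ra, rb = 0, 0  # source zeroing: pass zeros into the operation
    else:
        ra, rb = regs_a[srcstep], regs_b[srcstep]
    return calc_operation(ra, rb)
```

With this stand-in operation a zeroed source happens to give the same
answer as a zeroed destination (0 + 0 = 0), but that would not hold for
an arbitrary calc_operation.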
> > 4. If there are un-enumerated side effects that we wish to reproduce
> > from calc_operation()
>
>
> pipelined designs should not have such side effects because they require
> complex hazard detection to coordinate and it severely impacts
> opportunities for parallelism (performance)
>
> thus, logically, if there is a choice "compromising performance to maintain
> some arbitrary side-effect" the side-effect gets quashed with prejudice.
Good!