[Libre-soc-dev] [RFC] SVP64 Vertical-First Mode loops

Fri Aug 20 03:19:57 BST 2021

On Thu, Aug 19, 2021 at 6:15 AM lkcl <luke.leighton at gmail.com> wrote:
>
> On August 19, 2021 1:04:02 AM UTC, Jacob Lifshay <programmerjake at gmail.com> wrote:
>
> >i completely spaced that you were talking about vertical-first
> >mode...oops!
>
> :v
>
> >if you only are running one element per inner loop, it makes me think
> >this won't be any faster than scalar code ... not a good look.
>
> that's where out-of-order multi-issue comes into play.

But you still have the issue of needing to re-fetch and re-decode the
loop body on every iteration, and to predict the loop-back branch
(usually you can only predict one branch per cycle, unless you have an
absolutely monster core), making it not much faster or more
power-efficient than a standard OoO processor executing a scalar loop.
SV (in horizontal-first mode) is faster precisely because the
fetch/decode pipe stops and sends a firehose of usually pre-SIMD-packed
ops at the execution units. Seems like we're throwing away our
advantage...

Jacob