[Libre-soc-dev] [RFC] SVP64 Vertical-First Mode loops

Fri Aug 20 10:11:34 BST 2021

On Fri, Aug 20, 2021 at 3:20 AM Jacob Lifshay <programmerjake at gmail.com> wrote:

> > >if you only are running one element per inner loop, it makes me think this
> > >won't be any faster than scalar code ... not a good look.
> >
> > that's where out-of-order multi-issue comes into play.
>
> But you still have the issue of needing to re-fetch and re-decode the
> loop, and predict the branch (usually you can only predict one branch
> per cycle, unless you have an absolutely monster core),

the demarcation in Mitch's VVM is done with a pair of instructions
(start-loop, end-loop) where start-loop indicates which register is to
be the "loop counter".

consequently, as with all Zero-Overhead Loop ISAs, there are zero branch
prediction misses on the loop-back.

this is why i am adding CTR mode to SVP64 Branches.
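a toy model of the start-loop/end-loop idea, purely for illustration. the
encoding here is hypothetical (it is not Mitch's actual VVM nor SVP64's CTR
mode): "loop_start r" marks the top of the body and names r as the loop
counter, and "loop_end" is assumed to decrement r and resume at the body if r
is still non-zero. because the loop-back is computed from the counter rather
than guessed, there is nothing for a branch predictor to get wrong.

```python
def run(program: list[tuple], regs: dict[str, int]) -> dict[str, int]:
    """Interpret a tiny hypothetical ISA with a zero-overhead loop construct."""
    pc = 0
    loops: list[tuple[int, str]] = []  # stack of (body start pc, counter reg)
    while pc < len(program):
        op = program[pc]
        if op[0] == "loop_start":
            # remember where the body starts and which register counts
            loops.append((pc + 1, op[1]))
        elif op[0] == "loop_end":
            body_pc, ctr = loops[-1]
            regs[ctr] -= 1  # assumed semantics: decrement the counter
            if regs[ctr] > 0:
                pc = body_pc  # loop-back decided by the counter, not predicted
                continue
            loops.pop()  # counter exhausted: fall through past the loop
        elif op[0] == "add":
            _, rd, ra, rb = op
            regs[rd] = regs[ra] + regs[rb]
        elif op[0] == "addi":
            _, rd, ra, imm = op
            regs[rd] = regs[ra] + imm
        pc += 1
    return regs


# sum 1..5 using "ctr" as the designated loop counter
# (in this toy the body always runs at least once, do-while style)
prog = [
    ("loop_start", "ctr"),
    ("add", "acc", "acc", "i"),
    ("addi", "i", "i", 1),
    ("loop_end",),
]
print(run(prog, {"ctr": 5, "acc": 0, "i": 1})["acc"])  # 15
```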

> making it not
> much faster or power-efficient than a standard OoO processor executing
> a scalar loop. SV is faster precisely because the fetch/decode pipe
> stops and sends a firehose of usually pre-simd-packed ops at the
> execution units. Seems like we're throwing away our advantage...

no, not at all.

there's nothing to stop you from doing Cray-style Horizontal-First.
if, however, you try that with DCT, it requires the extra registers.

Vertical-First Mode is actually a hell of a lot easier to understand
and teach compilers about.
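a small sketch of why Horizontal-First needs more registers than
Vertical-First, under a deliberately simple assumption: a straight chain of
element-wise ops where op k consumes op k-1's result for the same element
(real DCT dataflow is more involved). in Horizontal-First each op sweeps every
element before the next op starts, so a whole vector of intermediates is live
at once; in Vertical-First each element runs through the whole chain before
the next element starts, so only one intermediate is live. the function names
are illustrative only.

```python
def horizontal_first(num_ops: int, vl: int) -> list[tuple[int, int]]:
    """Cray-style order: op 0 on every element, then op 1 on every element, ..."""
    return [(op, el) for op in range(num_ops) for el in range(vl)]


def vertical_first(num_ops: int, vl: int) -> list[tuple[int, int]]:
    """Vertical-First order: every op on element 0, then every op on element 1, ..."""
    return [(op, el) for el in range(vl) for op in range(num_ops)]


def peak_live_temps(order: list[tuple[int, int]], num_ops: int) -> int:
    """Peak number of intermediates held at once for a simple op chain.

    The last op's result is the final output, so it is not counted."""
    live: set[tuple[int, int]] = set()
    peak = 0
    for op, el in order:
        if op > 0:
            live.discard((op - 1, el))  # previous stage's temporary is consumed
        if op < num_ops - 1:
            live.add((op, el))  # this stage's result becomes a temporary
        peak = max(peak, len(live))
    return peak


OPS, VL = 3, 8
print(peak_live_temps(horizontal_first(OPS, VL), OPS))  # 8 (one per element)
print(peak_live_temps(vertical_first(OPS, VL), OPS))    # 1
```

both orders execute exactly the same (op, element) pairs; only the schedule
differs, and that schedule is what determines how many registers the
intermediates occupy.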

l.