[Libre-soc-dev] [RFC] SVP64 Vertical-First Mode loops
lkcl
luke.leighton at gmail.com
Fri Aug 20 10:11:34 BST 2021
On Fri, Aug 20, 2021 at 3:20 AM Jacob Lifshay <programmerjake at gmail.com> wrote:
> > >if you only are running one element per inner loop, it makes me think
> > >this won't be any faster than scalar code ... not a good look.
> >
> > that's where out-of-order multi-issue comes into play.
>
> But you still have the issue of needing to re-fetch and re-decode the
> loop, and predict the branch (usually you can only predict one branch
> per cycle, unless you have an absolutely monster core),
the demarcation in Mitch's VVM is done with a pair of instructions
(start-loop, end-loop) where start-loop indicates which register is to
be the "loop counter".
consequently, as in all Zero-Overhead Loop ISAs, the loop-closing
branch is never mispredicted.
this is why i am adding CTR mode to SVP64 Branches.
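
a rough sketch of why a counted loop avoids the miss (a toy Python model
only: the function names are made up for illustration, the counter is
assumed to be >= 1 on entry, and none of this is actual VVM or SVP64
hardware behaviour). the first function models a decrement-and-branch-
if-nonzero instruction, the second shows that the same taken/not-taken
sequence is already known at loop entry from the count alone, and the
third counts how often a simple last-outcome predictor would guess wrong
on that sequence:

```python
def counted_branch_outcomes(ctr):
    """Execute a decrement-and-branch-if-nonzero loop (assumes ctr >= 1).

    Returns the taken/not-taken outcome of the loop-closing branch on
    each pass through the loop body.
    """
    outcomes = []
    while True:
        ctr -= 1
        taken = ctr != 0          # branch back only while count remains
        outcomes.append(taken)
        if not taken:
            return outcomes


def planned_outcomes(ctr):
    """Outcomes derived at loop entry from the count alone, no guessing."""
    return [True] * (ctr - 1) + [False]


def last_outcome_misses(outcomes, initial_guess=True):
    """Count wrong guesses of a 1-bit (last-outcome) predictor."""
    guess = initial_guess
    misses = 0
    for taken in outcomes:
        if guess != taken:
            misses += 1
        guess = taken             # remember what actually happened
    return misses


if __name__ == "__main__":
    actual = counted_branch_outcomes(4)
    print(actual == planned_outcomes(4))   # plan matches execution
    print(last_outcome_misses(actual))     # guesses the loop exit wrong
```

the point is only that when the loop counter is architecturally visible,
the exit is computed rather than predicted.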
> making it not
> much faster or power-efficient than a standard OoO processor executing
> a scalar loop. SV is faster precisely because the fetch/decode pipe
> stops and sends a firehose of usually pre-simd-packed ops at the
> execution units. Seems like we're throwing away our advantage...
no, not at all.
there's nothing to stop you from doing the Cray-style Horizontal-First.
if however you try that with DCT it requires the extra registers.
Vertical-First Mode is actually a hell of a lot easier to understand
and teach compilers about.
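
to make the difference in ordering concrete, here is a toy sketch in
Python (illustrative only: the function names and the two-instruction
mul/add "program" are assumptions, not SVP64 assembler, and reading
"extra registers" as "VL-wide temporaries" is one interpretation).
Horizontal-First runs one instruction across all VL elements before the
next instruction starts; Vertical-First runs the whole instruction
sequence on one element, then steps to the next element:

```python
def horizontal_first(program, vl):
    """Each instruction covers elements 0..vl-1 before the next one starts."""
    return [(op, i) for op in program for i in range(vl)]


def vertical_first(program, vl):
    """All instructions run on element i, then the element index steps."""
    return [(op, i) for i in range(vl) for op in program]


def hf_mul_add(a, b, d):
    """Horizontal-First style: the intermediate is a full VL-wide vector."""
    t = [x * y for x, y in zip(a, b)]      # VL temporaries live at once
    return [x + y for x, y in zip(t, d)]


def vf_mul_add(a, b, d):
    """Vertical-First style: one scalar temporary, reused every step."""
    c = []
    for i in range(len(a)):
        t = a[i] * b[i]                    # single temporary
        c.append(t + d[i])
    return c


if __name__ == "__main__":
    print(horizontal_first(["mul", "add"], 2))
    print(vertical_first(["mul", "add"], 2))
    print(hf_mul_add([1, 2], [3, 4], [5, 6]) == vf_mul_add([1, 2], [3, 4], [5, 6]))
```

both orderings compute the same result; what changes is how many
intermediate values have to be held at the same time.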
l.