[Libre-soc-bugs] [Bug 558] gcc SV intrinsics concept

Wed Dec 30 03:19:54 GMT 2020

https://bugs.libre-soc.org/show_bug.cgi?id=558

--- Comment #22 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Alexandre Oliva from comment #21)

> this would actually avoid one problem that we're going to face, namely, that
> operations on vectors, as far as GCC is concerned, are parallel, rather than
> sequential.

SIMD batches, in other words.  this is a potential route (when looking at the
autovectorisation stage, say approximately 1+ years from now).
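To make the sequential-versus-parallel distinction concrete, here is a minimal Python sketch. It assumes a flat register file modelled as a list and a hypothetical element-wise "add 1" vector op whose source and destination register ranges may overlap. `sequential_op` walks the elements one at a time, in order, as SV's element loop does; `parallel_op` reads every source element before writing any result, which is the SIMD-style model GCC assumes for vector operations. Both function names are illustrative only.

```python
def sequential_op(regs, rd, rs, vl):
    """Element-by-element, in order: later reads can see earlier writes."""
    regs = list(regs)
    for i in range(vl):
        regs[rd + i] = regs[rs + i] + 1
    return regs


def parallel_op(regs, rd, rs, vl):
    """SIMD-style: snapshot all sources first, then write all results."""
    regs = list(regs)
    src = regs[rs:rs + vl]
    for i in range(vl):
        regs[rd + i] = src[i] + 1
    return regs


if __name__ == "__main__":
    regs = [10, 20, 30, 40, 50]
    # Non-overlapping ranges (r0-r1 -> r3-r4): both models agree.
    print(sequential_op(regs, 3, 0, 2) == parallel_op(regs, 3, 0, 2))  # True
    # Destination starts one register above the source: results diverge.
    print(sequential_op(regs, 1, 0, 3))  # [10, 11, 12, 13, 50]
    print(parallel_op(regs, 1, 0, 3))    # [10, 11, 21, 31, 50]
```

The two models only disagree when the ranges overlap, which is exactly the case the rest of this thread is about.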

> register allocation, and the solution involves preventing the compiler from
> reusing input registers as so-marked outputs.

sensible.
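A minimal sketch of that allocation constraint applied to vector register ranges. It assumes a hypothetical allocator that picks the lowest base register whose run of `vl` registers overlaps none of the input ranges, and a 128-entry register file (an assumption for illustration). The idea is similar in spirit to GCC's earlyclobber (`&`) operand constraint, which forbids an output from sharing a register with any input, but this is not how GCC's allocator is implemented.

```python
def overlaps(a_base, a_len, b_base, b_len):
    """True if [a_base, a_base+a_len) and [b_base, b_base+b_len) intersect."""
    return a_base < b_base + b_len and b_base < a_base + a_len


def alloc_output(inputs, vl, num_regs=128):
    """Lowest base for a vl-register output that overlaps no input range.

    inputs: list of (base, length) tuples for the operation's vector inputs.
    num_regs: size of the register file (128 is an assumption here).
    Returns None if no non-overlapping range fits.
    """
    for base in range(num_regs - vl + 1):
        if not any(overlaps(base, vl, ib, il) for ib, il in inputs):
            return base
    return None


if __name__ == "__main__":
    # Inputs occupy r0-r3 and r8-r11; a 4-register output fits at r4-r7.
    print(alloc_output([(0, 4), (8, 4)], 4))   # 4
    # A 5-register output no longer fits between them, so it goes to r12.
    print(alloc_output([(0, 4), (8, 4)], 5))   # 12
```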

> But when you've got vectors and you wish to exploit the fact that the
> iterations are sequential, preventing overlapping allocations won't cut it. 
> Though you probably wouldn't be doing such tricks through the compiler to
> begin with,

no, this is definitely an advanced optimisation, one that would need a lot of time.

> so maybe it is fine, after all.
> 
> 
> Regardless, I'm coming to think that we may be able to save ourselves a lot
> of headaches in the tooling side, out of this significant difference from
> other "vector" systems, by having our vectors modelled as parallel rather
> than sequential. 

yes.  this was the basis of the "allocate a batch of registers" idea, and it
looks like gcc already has underlying support for that, because of SIMD.

from the auto-vectorisation loop case, i found that the assumption there is
that the SIMD width is fixed, which is fine up to a point: in our case,
multi-issue is what increases execution throughput.

in this case, in stage 2, MAXVL would be hardcoded to a value that, combined
with elwidth, always comes out to exactly 4x 64-bit registers, for example.
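As a worked example of that stage-2 arithmetic, here is a short sketch assuming a fixed budget of 4x 64-bit registers and element widths of 8, 16, 32 or 64 bits (the values chosen here for illustration). MAXVL is simply the register budget in bits divided by elwidth; `maxvl_for` is a hypothetical helper name.

```python
REG_BITS = 64      # width of one scalar register
NUM_REGS = 4       # fixed budget: 4x 64-bit registers, as in the example above


def maxvl_for(elwidth, num_regs=NUM_REGS, reg_bits=REG_BITS):
    """MAXVL such that MAXVL * elwidth fills exactly num_regs registers."""
    total_bits = num_regs * reg_bits
    if total_bits % elwidth:
        raise ValueError(f"elwidth {elwidth} does not evenly divide {total_bits} bits")
    return total_bits // elwidth


if __name__ == "__main__":
    for ew in (64, 32, 16, 8):
        print(ew, maxvl_for(ew))   # 64 4 / 32 8 / 16 16 / 8 32
```

Halving elwidth doubles MAXVL, so the register footprint (and hence what the allocator has to reserve) stays constant at 256 bits.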

but to get to _that_ point we need stage 1 first.

> In most cases, it won't matter, because there won't be
> overlaps, but in those that do, the hardware could easily detect it and
> switch from counting upwards to downwards when needed to get the semantics
> of parallel access right.

this is quite a big change.  i will go over it in a separate (new) bugreport.
it has merit if we include an explicit mapreduce mode.
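The direction switch described in the quote above works like `memmove`. Below is a minimal sketch for a hypothetical single-source "add 1" element op: if the destination range starts inside the source range (above its base), the loop counts downwards, otherwise upwards, so no source element is overwritten before it is read. The function names are illustrative, and the sketch ignores predication, fail-first and twin-predication entirely.

```python
def choose_direction(rd, rs, vl):
    """Pick an element order giving parallel semantics for one source operand.

    Mirrors memmove: if the destination range starts above an overlapping
    source range, count downwards so no source element is overwritten
    before it has been read; otherwise count upwards.
    """
    if rs < rd < rs + vl:
        return range(vl - 1, -1, -1)
    return range(vl)


def direction_switched_op(regs, rd, rs, vl):
    """Sequential element loop (hypothetical 'add 1' op) with direction switching."""
    regs = list(regs)
    for i in choose_direction(rd, rs, vl):
        regs[rd + i] = regs[rs + i] + 1
    return regs


def parallel_op(regs, rd, rs, vl):
    """Reference SIMD semantics: read all sources, then write all results."""
    regs = list(regs)
    src = regs[rs:rs + vl]
    for i in range(vl):
        regs[rd + i] = src[i] + 1
    return regs


if __name__ == "__main__":
    regs = [10, 20, 30, 40, 50]
    # Check every placement of a 3-element op in a 5-register file.
    ok = all(
        direction_switched_op(regs, rd, rs, 3) == parallel_op(regs, rd, rs, 3)
        for rd in range(3) for rs in range(3)
    )
    print(ok)  # True
```

With two source operands the rule is no longer sufficient: if one source overlaps the destination from below and the other from above, neither counting order is safe, and the hardware would need to fall back to something else (for example buffering the elements).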

> Fail-first mode and twin-predication are ones that immediately raise flags
> of potential incompatibilities with this change, but it might still be worth
> at least considering it.

this is why i wanted a first phase "just above bare metal": once that is done,
and at least an SV_CONTEXT can be pushed on the stack, there will be better
working knowledge and an incremental base to build on.
