[Libre-soc-bugs] [Bug 558] gcc SV intrinsics concept
bugzilla-daemon at libre-soc.org
Wed Dec 30 03:19:54 GMT 2020
https://bugs.libre-soc.org/show_bug.cgi?id=558
--- Comment #22 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Alexandre Oliva from comment #21)
> this would actually avoid one problem that we're going to face, namely, that
> operations on vectors, as far as GCC is concerned, are parallel, rather than
> sequential.
SIMD-batches, in other words. this is a potential route (when looking at the
autovectorisation stage, say approx. 1+ years from now)
> register allocation, and the solution involves preventing the compiler from
> reusing input registers as so-marked outputs.
sensible.
> But when you've got vectors and you wish to exploit the fact that the
> iterations are sequential, preventing overlapping allocations won't cut it.
> Though you probably wouldn't be doing such tricks through the compiler to
> begin with,
no, this is definitely advanced optimisation, that would need a lot of time.
> so maybe it is fine, after all.
>
>
> Regardless, I'm coming to think that we may be able to save ourselves a lot
> of headaches in the tooling side, out of this significant difference from
> other "vector" systems, by having our vectors modelled as parallel rather
> than sequential.
yes. this was the basis of the "allocate a batch of registers" idea, which it
looks like gcc already has underlying support for, because of SIMD.
from the auto-vector loop case i found that the assumption there is that the
SIMD width is fixed, which is fine up to a point. multi-issue in our case
increases execution throughput.
in this case, in stage 2, the MAXVL would be hardcoded to a value that,
combined with elwidth, would always come out to an amount that took up exactly
4x 64 bit registers, for example.
but to get to _that_ point we need stage 1 first.
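
as a rough illustration of the stage 2 arithmetic (a sketch only, not an
agreed design): if a batch is fixed at 4x 64 bit registers, MAXVL follows
directly from elwidth. the function name maxvl_for and the constants
REG_BITS and BATCH_REGS are made up for this example.

```python
REG_BITS = 64     # width of one scalar register, in bits
BATCH_REGS = 4    # assumed batch size: "exactly 4x 64 bit registers"


def maxvl_for(elwidth_bits, batch_regs=BATCH_REGS, reg_bits=REG_BITS):
    """Return the MAXVL that makes a vector fill the batch exactly.

    Hypothetical helper: MAXVL * elwidth must equal batch_regs * reg_bits.
    """
    total_bits = batch_regs * reg_bits
    if elwidth_bits <= 0 or total_bits % elwidth_bits != 0:
        raise ValueError("elwidth does not divide the batch evenly")
    return total_bits // elwidth_bits


if __name__ == "__main__":
    for elwidth in (8, 16, 32, 64):
        print("elwidth", elwidth, "-> MAXVL", maxvl_for(elwidth))
```

the point is only that the compiler would see one fixed-size register batch
per vector, whatever the element width.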
> In most cases, it won't matter, because there won't be
> overlaps, but in those that do, the hardware could easily detect it and
> switch from counting upwards to downwards when needed to get the semantics
> of parallel access right.
this is quite a big change. i will go over it in a separate (new) bugreport.
it has merit if we include an explicit mapreduce mode.
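
to make the overlap issue concrete, here is a small software model (a sketch
under assumptions, not how the hardware is specified): the register file is
a python list, and a vector copy is run as an element-by-element loop. all
function names are hypothetical. when the destination range overlaps the
source range from above, counting upwards reads elements that were already
overwritten; counting downwards restores the "parallel" result, in the same
way memmove picks its copy direction.

```python
def vec_copy_parallel(regs, dst, src, vl):
    """Reference semantics: read all sources first, then write all results."""
    out = list(regs)
    out[dst:dst + vl] = regs[src:src + vl]
    return out


def vec_copy_sequential(regs, dst, src, vl, downward=False):
    """Element loop in program order: each step sees earlier writes."""
    out = list(regs)
    order = range(vl - 1, -1, -1) if downward else range(vl)
    for i in order:
        out[dst + i] = out[src + i]
    return out


def vec_copy_auto(regs, dst, src, vl):
    """Assumed overlap check: count downwards only when dst lands inside
    the source range above src, otherwise count upwards."""
    overlaps_from_above = src < dst < src + vl
    return vec_copy_sequential(regs, dst, src, vl,
                               downward=overlaps_from_above)


if __name__ == "__main__":
    regs = [10, 11, 12, 13, 14, 0, 0, 0]
    print("parallel  ", vec_copy_parallel(regs, 1, 0, 4))
    print("upwards   ", vec_copy_sequential(regs, 1, 0, 4))
    print("auto      ", vec_copy_auto(regs, 1, 0, 4))
```

this only covers a single-source copy. an operation with two overlapping
sources may not be fixable by direction alone, and the upwards-with-overlap
behaviour is exactly what an explicit sequential mode would want to keep.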
> Fail-first mode and twin-predication are ones that immediately raise flags
> of potential incompatibilities with this change, but it might still be worth
> at least considering it.
this is why i wanted a first phase "just above bare metal": once that is
done, and at least an SV_CONTEXT can be pushed on the stack, there will be
better working knowledge and an incremental base.