[Libre-soc-bugs] [Bug 558] gcc SV intrinsics concept

bugzilla-daemon at libre-soc.org bugzilla-daemon at libre-soc.org
Tue Dec 29 00:43:58 GMT 2020


https://bugs.libre-soc.org/show_bug.cgi?id=558

--- Comment #19 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Jacob Lifshay from comment #18)

> the problem is *not* at the assembly level, the problem is in the GCC IR
> level, where everything is represented in a totally different way using
> things like Static-Single-Assignment form, and fundamentally different than
> just a munged assembly string. 

you get the general idea.  some sort of mark, that says "don't touch this"


> It's at the GCC IR level where all the
> optimizations are performed, and they need to have all operations
> represented, either in some opaque form (like I was proposing by using
> intrinsics *in the IR*) or in a form where all the optimizers are aware of
> the semantics and know that the instructions behave differently.

... but they don't act differently, do they?

the SV context applies externally.

you're still thinking in terms of "full compiler support", which will be
rejected as a proposal.

therefore, unless we find alternative sources of funding, it's nice to discuss
but it is not going to get done.

actually i realised that the MAXVL context
should in fact propagate through the standard gcc IR layer.

if one register has been marked as MAXVL=8 and it needs to be copied to an
alternative location then the destination needs to inherit those exact
properties: MAXVL=8 as well.

the RAT will then know it needs to grab a batch of 8 regs not one in order to
perform the mv.
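
a minimal sketch of that inheritance rule, in plain C rather than gcc internals.
everything here is hypothetical and only illustrates the idea: the sv_value
struct, alloc_batch, copy_value and the 128-entry register file are assumptions,
not existing gcc data structures.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_REGS 128  /* assumption: size of the register file */

/* hypothetical descriptor for a value tracked by the register allocator */
typedef struct {
    int base;   /* first register of the batch, or -1 if allocation failed */
    int maxvl;  /* number of consecutive registers in the batch (1 = scalar) */
} sv_value;

static bool reg_busy[NUM_REGS];

/* find and claim `maxvl` consecutive free registers; returns base or -1 */
static int alloc_batch(int maxvl)
{
    assert(maxvl >= 1 && maxvl <= NUM_REGS);
    for (int base = 0; base + maxvl <= NUM_REGS; base++) {
        int n = 0;
        while (n < maxvl && !reg_busy[base + n])
            n++;
        if (n == maxvl) {
            for (int i = 0; i < maxvl; i++)
                reg_busy[base + i] = true;
            return base;
        }
    }
    return -1;
}

/* copy `src` to a new location: the destination inherits src's MAXVL,
 * so the allocator grabs a whole batch rather than a single register */
static sv_value copy_value(sv_value src)
{
    sv_value dst = { alloc_batch(src.maxvl), src.maxvl };
    return dst;
}
```

with an empty register file, allocating a MAXVL=8 value and then copying it
yields two non-overlapping batches of 8 registers each.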

you are right about one thing: transferring from scalar to vector and back.  oh
wait: 

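    uint32_t a;                                  /* scalar */
    __attribute__((sv_vector)) uint32_t *va;     /* proposed attribute: marks va as a vector */
    PUSH_SV_CONTEXT(MAXVL=8)                     /* proposed macro: va occupies 8 registers */
    va[1] = a;                                   /* scalar-to-element store */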

not a problem after all.  the IR would go, "hmm va is marked as vector,
a is scalar, va is allocated to r40-47 and a is allocated to r3: i need to
issue a mv r41, r3 here"
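
the register arithmetic behind that decision can be sketched as below.  this
assumes, as in the example above, one element per register with the batch
starting at a known base; sv_element_reg and emit_scalar_to_element are
hypothetical helper names, and "mv" is just the mnemonic used in this comment.

```c
#include <assert.h>
#include <stdio.h>

/* hypothetical: register number holding element `idx` of a vector batch
 * that starts at `base` and spans `maxvl` registers */
static int sv_element_reg(int base, int maxvl, int idx)
{
    assert(idx >= 0 && idx < maxvl);
    return base + idx;
}

/* format the move that stores scalar register `src` into element `idx`;
 * returns the number of characters snprintf would have written */
static int emit_scalar_to_element(char *buf, size_t len,
                                  int base, int maxvl, int idx, int src)
{
    return snprintf(buf, len, "mv r%d, r%d",
                    sv_element_reg(base, maxvl, idx), src);
}
```

called with base=40, maxvl=8, idx=1, src=3 this produces the same
"mv r41, r3" as in the reasoning above.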




> > 
> > 
> > > > no intrinsic vector mul.
> > > > 
> > > > no intrinsic vector add
> > > > 
> > > > no intrinsic scalar-vector mul
> > > 
> > > I wasn't proposing having scalar-vector ops since those can be represented
> > > by _sv_mul(__sv_splat(lhs, ...), rhs, ...) and almost trivially
> > > pattern-matched at instruction selection time into the scalar-vector
> > > instructions.
> > 
> > this is a full-on compiler proposal.
> 
> Sorta yes, though having the compiler detect that particular pattern is
> quite easy.
> 
> remember, the compiler goes through that pattern matching layer (instruction
> selection) for all instructions. Every instruction (ignoring inline
> assembly) is generated as the result of matching some pattern -- so all we
> do is add patterns for vector-scalar ops instead of only having
> vector-vector patterns. It's really that trivial.

with the trick i am proposing, even that is not done, is not necessary, and is
not "taught to the compiler".

as long as the Register Allocation carries and respects the "batches"
(propagates the MAXVL context) correctly my feeling is that the IR will cope
perfectly fine.

the resource prioritisation (costings) for optimisation passes will increase by
MAXVL registers, however i am not hugely concerned about optimisation phases
initially.

> > 
> > if we put in a full-on gcc application for funding under the Assure
> > Programme i can guarantee it will be rejected.
> > 
> > the trick i am proposing is borderline but doable, and it works *only* if
> > cryptographic primitives are the *primary* focus of the Grant Application
> 
> so, we have aes step be the first ALU op we implement :) the rest should be
> relatively easier once we have 1 working, since it's mostly copy/paste.

Rijndael is more complex because the cycle mul phase involves a 64-bit x vec2
(or more likely 32-bit x vec4) as inputs in order to get up to 128 bits.

crc32 (see riscv bitmanip) would be a far easier starting point, moving to
Rijndael as a second wave of primitives within the proposal, when it comes to
adding vec2/3/4
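
for reference, a plain software CRC-32 looks like the sketch below.  this is the
common reflected form (polynomial 0xEDB88320, initial value and final XOR of
0xFFFFFFFF), not a statement of what the riscv bitmanip instructions do
bit-for-bit; crc32_bitwise is an illustrative name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* bitwise CRC-32: reflected polynomial 0xEDB88320,
 * initial value 0xFFFFFFFF, final XOR 0xFFFFFFFF */
static uint32_t crc32_bitwise(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* shift right by one; fold in the polynomial if a 1 fell off */
            uint32_t poly = (crc & 1u) ? 0xEDB88320u : 0u;
            crc = (crc >> 1) ^ poly;
        }
    }
    return ~crc;
}
```

the whole state is a single 32-bit scalar updated by shift and xor, with no
vec2/3/4 operands, which is why it makes a simpler first primitive than
Rijndael.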


let me think overnight about the "autogeneration of full intrinsics" idea.  if
you can show me that there exists a machine-readable list of all openpower
instructions in gcc, or that it is VERY fast to create one (hours not days),
such that the entire job can be done in eight weeks flat, i'll be ok with
considering putting that forward to NLnet.

the majority of the proposal *has* to be focussed on crypto primitives, not gcc.
