[Libre-soc-bugs] [Bug 558] gcc SV intrinsics concept

Thu Dec 31 01:45:16 GMT 2020

https://bugs.libre-soc.org/show_bug.cgi?id=558

--- Comment #33 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Alexandre Oliva from comment #31)
> luke, knowing what was or wasn't available in the cray implementation
> wouldn't tell me something was part of our plans.  I had happened to come
> across setvli, but I had not seen any mention (that I recall) of setvl,
> that's all.
> 
> also, the output of setvl is not really what I was interested in, though the
> responses have made me realize that, if we're to model VL in GCC at all, we
> can't just assume it will hold whatever value we store in it.

correct.  it stores the value that is defined by what the setvl instruction
does.  and that behaviour is very specifically defined, in such a way that yes,
the compiler may model it.

however...

again i do have to stress that this is seriously, seriously out of scope for
this bugreport.  yes it is part of the needs of a project that is approximately
one year (minimum) into the future from now, however as far as this particular
proposal is concerned, it is actually a major distraction, because it's only
"autovectorisation" that needs to have gcc understand what VL is.

for "stage 1" we will need a *modest* understanding of VL - an intrinsic
function (one of the very few) that can get RT stored into a size_t variable
(exactly as is done in the rvv gcc patch).
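
a minimal sketch of that "modest" stage 1 use, in plain C so it can be compiled anywhere.  everything named here is an assumption for illustration only: sv_setvl is a hypothetical stand-in for the intrinsic (not an existing gcc builtin), SV_MAXVL is an arbitrary value, and the model covers only the "VL = min(requested, MAXVL)" behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: MAXVL chosen arbitrarily for this sketch. */
enum { SV_MAXVL = 8 };

/* HYPOTHETICAL stand-in for a setvl intrinsic.  It models only
 * "VL = min(requested, MAXVL)" and returns the result, so the caller
 * can keep it in an ordinary size_t variable. */
static size_t sv_setvl(size_t requested)
{
    return requested < SV_MAXVL ? requested : SV_MAXVL;
}

/* Strip-mined add: each pass asks for the remaining element count and
 * then processes however many elements (VL) it was actually granted. */
static void vec_add_u32(uint32_t *dst, const uint32_t *a,
                        const uint32_t *b, size_t n)
{
    while (n > 0) {
        size_t vl = sv_setvl(n);          /* VL captured in a size_t */
        assert(vl > 0 && vl <= n);
        for (size_t i = 0; i < vl; i++)   /* stands in for one vector op */
            dst[i] = a[i] + b[i];
        dst += vl;
        a += vl;
        b += vl;
        n -= vl;
    }
}
```

the compiler does not need to understand VL at all here: the developer reads it back and does the pointer and counter arithmetic by hand.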

but as far as "gcc auto-recognising the concept of VL such that it may perform
loop auto-vectorisation", no that is *definitively* part of "stage 2" and is
100% out of scope.

> what I was interested in was the dynamic setting of vl after a
> fail-on-first. 

again: i stress that although i am happy to answer, it's critically important
that you realise and accept that those answers are 100% out of scope for this
bugreport and for the proposal involving crypto-primitives and "just above
bare-metal".

>  the compiler may have to save it and restore it even in the
> absence of function calls.  an intervening use of a different vector may
> require a setvl, and then, when we go back to the original vector, we'd
> better restore the vl that it had at the end, rather than assume it's the
> same as in the beginning.

the general rule is that any given loop uses VL for the purposes of that loop,
and nothing else.  if there is a separate loop, it is a separate VL and the two
absolutely do not mix.

VL within the context of a given loop therefore absolutely and exclusively
applies solely to the vectors within that loop.

*at no time* will there be a different (overlapping) VL applying to an
unrelated vector, by which "return" to an "original" vector is proposed or
considered.

so - no: save and restore of VL within a loop - with or without fail-first - is
just something that "is not done".

now, that said: if you have *nested* vector loops, then yes, i would say that
saving and restoring of VL (and MAXVL, remember) would be reasonable.

again, however, i stress that that is *strictly* "phase 2", and at the "phase 1
just above bare metal" level, it would be the responsibility of the *developer*
to perform the necessary saving/restoring, explicitly and directly.
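
a sketch of what that explicit, developer-driven save/restore could look like for nested loops.  the vector-length state is modelled in software here: struct sv_state, the global sv and this two-argument sv_setvl are hypothetical names invented for the example, and the MAXVL values are arbitrary.

```c
#include <assert.h>
#include <stddef.h>

/* HYPOTHETICAL software model of the VL/MAXVL state. */
struct sv_state { size_t maxvl; size_t vl; };
static struct sv_state sv = { 0, 0 };

/* HYPOTHETICAL: sets MAXVL and VL together, VL = min(requested, maxvl). */
static size_t sv_setvl(size_t maxvl, size_t requested)
{
    sv.maxvl = maxvl;
    sv.vl = requested < maxvl ? requested : maxvl;
    return sv.vl;
}

/* Inner vector loop: uses its own MAXVL/VL and overwrites the state. */
static long inner_sum(const int *row, size_t n)
{
    long total = 0;
    while (n > 0) {
        size_t vl = sv_setvl(4, n);
        for (size_t i = 0; i < vl; i++)
            total += row[i];
        row += vl;
        n -= vl;
    }
    return total;
}

/* Outer vector loop: the developer saves and restores the state by
 * hand around the nested loop. */
static void row_sums(long *out, const int *m, size_t rows, size_t cols)
{
    while (rows > 0) {
        size_t vl = sv_setvl(8, rows);
        for (size_t r = 0; r < vl; r++) {
            struct sv_state saved = sv;    /* explicit save */
            out[r] = inner_sum(m + r * cols, cols);
            sv = saved;                    /* explicit restore */
            assert(sv.vl == vl && sv.maxvl == 8);
        }
        out += vl;
        m += vl * cols;
        rows -= vl;
    }
}
```

without the save/restore pair, the outer loop would continue with the VL and MAXVL left behind by inner_sum.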

> now, you could say "don't do that", but I'm just trying to model things in
> the compiler, and once you model vl as a register in the compiler, and state
> the register has to be set before an instruction can operate on this object
> as a vector, there is a possibility it will vectorize stuff in a loop that
> calls a function that happens to use a different vector type, and even that
> the function is inlined. 

again: this is very much out of scope, an active area of cutting-edge research
that could literally absorb any one of us for a minimum of 12-18 months, all on
its own.

it is also specifically why many Vector Compiler ISA writers specifically
prohibit the calling of external functions.

the inlining of functions, i don't even know where to begin, there, to answer,
and that should in itself give you a "red flag" that, with my broad expertise,
if i don't know an answer - or even where to look - then it's *going* to be a
hard problem that involves weeks if not months of active research.

> as an ISA designer, you may not expect this sort
> of behavior, but as a compiler writer, I expect arbitrary code to be thrown
> at it, so I have to take these possibilities into account.

for stage 2 - at least 1 year into the future: yes.

for stage 1 - "just above bare metal" - no.  the scope is limited to the
register allocation table, __attribute__(vec), adding a setvl intrinsic, and
push/pop of a MAXVL context.  oh, and adding the bitmanip/crypto-primitives as
intrinsic functions.  that's pretty much it.

this "stage 1" which is startlingly similar to gcc SIMD auto-vectorisation will
*provide* you with the understanding and experience *to* do the "stage 2"...

... but until stage 1 is completed the risk is that we won't even get to "stage
2" because we can only apply - just - for funding from NLnet for stage 1.

> 
> now, the mention of setting sub-pc is confusing when all I'm asking for is a
> means to load and restore the vl.

ok, i'm happy to answer - bear in mind that this is for stage 2 (which isn't
going to get funded for at least a year).  it's very easy: get the copy of the
SVSTATE SPR, push it on the stack.  this will save both VL and MAXVL at
the same time.
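
a sketch of why one save is enough when both fields live in the same register.  the bit positions and widths below are made up for the example and are not taken from the SVSTATE specification; svstate_model is a plain variable standing in for the SPR, and all sk_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* HYPOTHETICAL layout: invented for this sketch only. */
#define SK_MAXVL_SHIFT 0
#define SK_VL_SHIFT    7
#define SK_FIELD_MASK  0x7Fu

static uint32_t svstate_model;   /* stands in for the SPR */

static uint32_t sk_pack(unsigned maxvl, unsigned vl)
{
    return ((maxvl & SK_FIELD_MASK) << SK_MAXVL_SHIFT) |
           ((vl & SK_FIELD_MASK) << SK_VL_SHIFT);
}

static unsigned sk_maxvl(uint32_t s)
{
    return (s >> SK_MAXVL_SHIFT) & SK_FIELD_MASK;
}

static unsigned sk_vl(uint32_t s)
{
    return (s >> SK_VL_SHIFT) & SK_FIELD_MASK;
}

/* Runs body() with the state saved on the C stack: one read captures
 * VL and MAXVL together, one write puts both back. */
static void with_saved_state(void (*body)(void))
{
    uint32_t saved = svstate_model;
    body();
    svstate_model = saved;
}

/* Example body that clobbers both fields. */
static void clobber(void)
{
    svstate_model = sk_pack(4, 3);
}
```

after with_saved_state(clobber) returns, both fields hold whatever they held before the call.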

>  I suppose the mention was because they're
> all expected to be represented in the same special-purpose register, and an
> interrupt handler may very well have to preserve it, restore it, and even
> attempt to tinker with it; so might a signal handler.  but I was talking
> about userland exception handling.  think setjmp/longjmp, not iret. 

again: this is stage 2 thinking.  we're a minimum 1 year away from that and
have no available funding for stage 2.

it's good that you're thinking about it... but until stage 1 is done we can't
get to it.

> [....]
> do you see the problem of not restoring vector state there?

yes... and it is something that should be solved for stage 2, with a minimum
budget of EUR 50,000 to 100,000.  given that i made the mistake of cancelling
the NLnet gcc Grant request, we need to find some other way to fund it.

> 
> I assume you do.  now let's take this one step further.  consider any of
> these intrinsics actually modify VL (e.g. fail-on-first); assume VL is an
> asm register variable or a macro that reads from the VL part of the
> special-purpose vector status registers somehow.

for "stage 1" this is not a problem at all because the code writer is assumed
to have a full and complete understanding of SV, and knows to expect VL to
be modified by ffirst.
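
a sketch of the stage 1 pattern, where the developer explicitly expects VL to come back shorter than requested.  this is a deliberately simplified software model: ff_scan, sk_strnlen and FF_MAXVL are hypothetical names, and the truncation rule used here (VL becomes the number of elements before the first zero byte) is an assumption for illustration, not the precise fail-first definition.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMPTION: arbitrary MAXVL for this sketch. */
enum { FF_MAXVL = 8 };

/* HYPOTHETICAL model of a fail-first style operation: ask for up to
 * `requested` elements, return the VL actually in effect afterwards.
 * Simplified rule: VL is cut to the count of bytes before the first
 * zero byte. */
static size_t ff_scan(const char *p, size_t requested)
{
    size_t vl = requested < FF_MAXVL ? requested : FF_MAXVL;
    for (size_t i = 0; i < vl; i++)
        if (p[i] == '\0')
            return i;            /* VL truncated */
    return vl;
}

/* strnlen-like loop: the developer advances by the VL that comes
 * back, never by the VL that was asked for. */
static size_t sk_strnlen(const char *s, size_t max)
{
    size_t done = 0;
    while (done < max) {
        size_t asked = max - done;
        if (asked > FF_MAXVL)
            asked = FF_MAXVL;
        size_t vl = ff_scan(s + done, asked);
        done += vl;
        if (vl < asked)
            break;               /* truncated: terminator was found */
    }
    return done;
}
```

the point of the sketch is the "done += vl" line: the loop counter follows the modified VL, and it is the developer, not the compiler, who is responsible for knowing that.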

for stage 2 - which is completely out of scope for the purposes of putting in a
"stage 1" grant request, for supporting of crypto-primitives - it can be
solved.

> 
> after we continue, we add VL to the loop counter.  it may have been modified
> within the loop body, or even in an earlier loop iteration.  do you see the
> problem if the exception raised out of the signal handler doesn't restore
> the VL that was in effect, and instead leaks unrelated MAXVL and VL, set up
> within the signal handler, to the exception handler within your vector loop?

for stage 2 - which, again, is completely out of scope for the purposes of this
discussion - it would indeed be the responsibility of the compiler to understand
VL and ffirst, because it would no longer be the developer's direct
responsibility.

however, again: i emphasise: we do not have funding for stage 2, and we cannot
get it from NLnet.  the PET Programme ended on Dec 1st 2020.  we can *only*
apply for funding under the NLnet "Assure" Programme which specifically
requires a cryptographic focus. (or, there is also the NGI0 Programme which we
*might* be able to apply under "Internet Search".  i will have to investigate
and think about it).

i have spoken to Michiel and he agrees 100% that writing cryptographic routines
in pure assembler is flat-out madness and unacceptable.  they'll be unreadable,
unreviewable, and consequently lead to disastrous security.

i was then able to explain to him that if we were able to write at least in
c-code, still requiring an understanding of SV, "just above bare metal", this
will at least result in readable code [no trying to transfer between ints and
assembly regs using "asm" statements].

(also that Lauri's task would be a lot easier)

then we get to use the crypto-primitives as an excuse to write unit tests of
the SV hardware (and simulator) and to "prove" SV... but *WITHOUT* needing to
go the whole way of a full auto-vectorisation compiler that would normally cost
USD 250,000 to 1,000,000 in funding to complete.
