[Libre-soc-bugs] [Bug 558] gcc SV intrinsics concept

bugzilla-daemon at libre-soc.org bugzilla-daemon at libre-soc.org
Mon Jan 11 14:42:02 GMT 2021


https://bugs.libre-soc.org/show_bug.cgi?id=558

--- Comment #38 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Jacob Lifshay from comment #37)
> (In reply to Luke Kenneth Casson Leighton from comment #36)
> > my feeling is that rather than add several tens of thousands of intrinsics
> > autogenerated (the software equivalent of propagating "SIMD considered
> > harmful"), __attribute__ could well provide the means to bring in SVP64
> > Context, with very little needed to be added with the exception of setvl.
> 
> I think we should have intrinsics for saturated/not add, sub, mul, etc.,
> since gcc intrinsics can be generic over the type such that one intrinsic
> works on all the different combinations of scalar/vector types. That way,
> we'd only need something like 20-30 intrinsics to cover all the weird and
> wonderful alu/load/store ops, not hundreds or thousands.

conceptually there is very little difference to this:

// use the existing gcc vector_size attribute
typedef int16_t MyVec[45] __attribute__((vector_size=45));

typedef int16_t MyVecSat[45] __attribute__((vector_size=45, sv_satsigned=8bit));

 void mix_audio(int16_t a[], int16_t b[], int16_t dc_balance, size_t len) {
     while(len > 0) {
         size_t vl = __sv_setvl<MyVec>(len);
         MyVec attribute(sv_vl=vl) v_a = a;
         MyVec attribute(sv_vl=vl) v_b = b;

         MyVecSat temp = v_a - dc_balance;
         temp += v_b;
         ...
         ...

can you see that the two approaches are functionally equivalent, except that
one requires intrinsics and the other (apart from setvl) does not?

the creation of *any* explicit intrinsics effectively disregards the entire SV
concept which is that it is context-based, losing a golden opportunity in the
process.

the clue was in "how easy it would be to autogenerate" being literally a set of
nested for-loops with very few exceptions of combinations that won't work.

wherever that is possible you have two choices:

1) create the mass of permutations
2) leave it up to a "context" which is intelligently applied and understood
(this is how SE/Linux works btw)

i have made the mistake of (1) twice now.  the first time was in Samba TNG, the
second time was with 30,000 autogenerated functions in python-webkit.

the result was a massive increase in binary executable size, whereas microsoft,
by having a type library interpreted dynamically at runtime, managed to create
tiny binaries that only needed one library to handle absolutely every single
MSRPC interaction.

if we were doing a "standard" Vector ISA with large holes in the potential
permutations (because the combinations spiral out of control just as they do
for SIMD) i would agree with you immediately, jacob: an autogenerated
by-the-numbers set of intrinsics would be the only sane way to do it.

given that RVV has added specialist this, specialist that (for example only one
set of reduce operations, where we have abstracted it), there is no other
option for them.

we on the other hand have abstracted out even saturation as a general concept. 
mapreduce likewise.  REMAP likewise.  REMAP can be applied even to vectors of
fcmp or vectors of bpermd.

and here is the kicker: all these abstractions apply to *future* instructions
that we can't yet envisage.  add even one new instruction and that entire
autogenerated intrinsics table has to be redone.

by contrast the "patch" surface for adding any new scalar instruction if we use
attributes is extremely small.  it might even be possible to do it as macros
wrapped around inline asm blocks, with zero source modifications to gcc *at all*


additionally: a frontend would have to generate strings (or AST) such as
"sv_satu_ew8_subvl3_add()", which then have to be de-flattened and decoded when
creating svp64 prefixes.

that alone is a hell of a lot of CPU power *and* a lot of work.

or

attribute(svsatu, subvl3, ew8)

this not only fits into existing gcc data structures, it also requires far, far
less code to turn into an svp64 RM24 context.  i mean, it *is* the svp64 context.

nothing about the explicit creation of intrinsic datatypes and functions looks
good to me.  by complete contrast, using attributes carries type context that
meshes pretty much directly with svp64.


