[Libre-soc-bugs] [Bug 558] gcc SV intrinsics concept

Tue Jan 12 01:39:30 GMT 2021

https://bugs.libre-soc.org/show_bug.cgi?id=558

--- Comment #47 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
hmm. i have a thought.  bear with me.

* the goal is to get working vectorised assembly with the absolute bare minimum
modifications to gcc
* a suite of scalar code, silently making use of CRs, should therefore also
work once vectorised.... *without modifying gcc*
* therefore when any one line of scalar code is marked as "vectorised" the CR
operations behind it must *also mirror the same behaviour without modification*

thus any code that creates a CR or moves a CR must have the EXACT same svp64
prefix and behave exactly the same as the scalar CR version.

this then *defines* how we must number and lay out the Vectorised CRs.

namely: the numbering - sigh - needs to be in columns, not rows.

  CR0 CR1 CR2  CR3  CR4  CR5 CR6 CR7
  CR8 CR9 CR10 CR11 CR12

when Vectorised the increments 0..VL-1 go CR0 CR8 CR16 CR24 **NOT** repeat
**NOT** CR1 CR2 CR3 CR4
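
as an illustration only, the column-wise numbering can be sketched in a few
lines of Python. the names `vector_cr` and `CRS_PER_ROW` are hypothetical
(they do not come from any Libre-SOC code), and the row width of 8 is assumed
from the CR0..CR7 row shown above.

```python
CRS_PER_ROW = 8  # assumption: the scalar CRs CR0..CR7 form the first row


def vector_cr(base_cr, element):
    """Return the CR number used by element `element` (0..VL-1) of a
    Vectorised operation whose scalar code names CR<base_cr>.

    Column-wise numbering: each element steps down one row (+8),
    rather than along the row (+1).
    """
    if not 0 <= base_cr < CRS_PER_ROW:
        raise ValueError("scalar code only names CR0..CR7")
    if element < 0:
        raise ValueError("element index must be non-negative")
    return base_cr + CRS_PER_ROW * element


# elements 0..3 of a vector that scalar code knows only as "CR0"
print([f"CR{vector_cr(0, i)}" for i in range(4)])
# prints ['CR0', 'CR8', 'CR16', 'CR24']
```

the point of the sketch is that scalar code naming CR0 never has to learn any
other number: the element index alone selects the row.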

this was the "Matrix" idea that i outlined would be absolute hell to implement
the DMs for.

sigh.

however it would ensure that for scalar code that is created with scalar CRs,
CR0 to CR7 being ANDed and ORed and so on, when the integer expression that
generates CR0 gets Vectorised then, as long as all CR operations associated with
CR0 are also Vectorised, they simply propagate the s/v (scalar/vector)
attribute, and it is *not* necessary to do a massive redesign of gcc.

i hope i am making sense here.

basically we customise the hardware to suit gcc, not the other way round. 
that's what Tim Forsyth was on about.

now, the only problem is: 64 CRs results in wrapping far too quickly.

CR8 CR16 CR24 CR32 CR40 CR48 CR56 whoops we have to go to CR0 next.

this places an artificial limit on the length of MAXVL that can be used without
serious modifications to gcc.

if however we increase to 128 CRs then MAXVL can go up to 16 without wrapping
when "nominally scalar" code, referring to CR1 and not knowing it's a Vector,
actually operates on 16 CRs CR1 CR9 CR17 ... CR(1+15*8)
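
the wrapping arithmetic can be checked with a short Python sketch. the
function names here are hypothetical, the row width of 8 is assumed from the
layout above, and modelling the wrap as a simple modulo on the CR number is an
assumption made purely for illustration.

```python
CRS_PER_ROW = 8  # assumption: scalar code names CR0..CR7 only


def max_vl_without_wrap(total_crs):
    """Largest vector length for which a column never wraps round."""
    return total_crs // CRS_PER_ROW


def cr_column(base_cr, vl, total_crs):
    """CR numbers touched by a Vectorised op on scalar CR<base_cr>.

    Wrapping is modelled as modulo total_crs (illustrative assumption).
    """
    return [(base_cr + CRS_PER_ROW * i) % total_crs for i in range(vl)]


print(max_vl_without_wrap(64))    # prints 8
print(max_vl_without_wrap(128))   # prints 16

# with 64 CRs, a 9th element lands back on CR0
print(cr_column(0, 9, 64))
# prints [0, 8, 16, 24, 32, 40, 48, 56, 0]

# with 128 CRs, 16 elements starting at CR1 end at CR(1+15*8) = CR121
print(cr_column(1, 16, 128)[-1])  # prints 121
```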

being able to do Vectors up to 16 in length with zero significant code
modifications to gcc and yet still be able to write just above bare metal
assembler *and* not need USD 250k VC funding is a pretty damn good deal.
