[Libre-soc-dev] [SVP64] feedback needed - Pack/Unpack (vpack/vunpack)

lkcl luke.leighton at gmail.com
Sat Aug 13 02:27:10 BST 2022

On Fri, Aug 12, 2022 at 6:17 PM Richard Wilbur <richard.wilbur at gmail.com> wrote:
> I wholeheartedly support efforts to make the movement of data in and out of registers as efficient and flexible as possible:
> 1.  Easy, quick context switches are important to remove barriers to use of any capability.
> 2.  Efficiently moving data between storage and processing postures is very important to the maximum processing throughput.

well, funny you should mention that, because one of the things i
kinda insisted on was extending GPRs, FPRs, and CRs, not
adding VPRs and MaskRegs.

does that make things a leeetle more... how-do-we-say-in-engleeesh...
"interesting" when it comes to Register Hazard Management?
well, yes it does, and i have planned a micro-architecture to
make that manageable, or at least "not full of massive crossbars"
and so on.

> > On Aug 11, 2022, at 13:01, Luke Kenneth Casson Leighton via Libre-soc-dev <libre-soc-dev at lists.libre-soc.org> wrote:
> […]
> >
> > This would just leave fmv which I am not sure there is anything
> > that could be done about it, there are no magic aliases,
> > no FP add-immediate instructions which begs the obvious
> > "why the heck aren't there any". oh well
> This is one of the issues I ran into in x86 vs. x86/64, it was frighteningly inefficient/impossible to move data from one set of registers to another in order to use different ALU’s at their native width together to solve a particular problem.  My recollection is that we had to dump to memory in between!

yes.  Tom Forsyth explained what happens, and why, in his talk
on AVX512.  a SIMD regfile is added, with a massive set of ALUs
so large that the physical distance to them makes it impossible
to get data from another regfile over to those units.

In Larrabee, Tom *really* wanted to use the Pentium III scalar
registers as predicate masks, but the distance from that
scalar core over to the (new) 512-bit ALUs was so great that
the team actually had to design a special mask regfile plus
associated instructions, making them part of the *512 bit*
ALUs and so getting the regs close to where they were being
used.

Each new extension to AVX hits exactly the same problem: nobody
wants to completely redesign the "old" core, so they add a *new unit*
onto the old one.

thus, in this scenario, yes, the only way to get data from one regfile
to another is via memory!
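to make the "via memory" point concrete, here is a toy Python model, not any real ISA: two register files with no direct move path between them, so a value has to be stored from one and reloaded into the other. the names `fpr`, `gpr`, `memory`, `store_fpr` and `load_gpr` are all hypothetical; the only real API used is Python's `struct` module, which does the bit-pattern reinterpretation a store/load round trip gives you.

```python
import struct

# Hypothetical model: two register files with NO direct move path,
# so data must be spilled to memory and reloaded into the other one.
fpr = [0.0] * 32        # floating-point register file (hypothetical)
gpr = [0] * 32          # integer register file (hypothetical)
memory = bytearray(64)  # small scratch area standing in for memory

def store_fpr(reg, addr):
    """Store the 64-bit IEEE 754 bit pattern of fpr[reg] at memory[addr]."""
    memory[addr:addr + 8] = struct.pack("<d", fpr[reg])

def load_gpr(reg, addr):
    """Load 8 bytes at memory[addr] into gpr[reg] as an unsigned 64-bit int."""
    gpr[reg] = struct.unpack("<Q", bytes(memory[addr:addr + 8]))[0]

fpr[1] = 1.0
store_fpr(1, 0)     # trip 1: FP regfile -> memory
load_gpr(3, 0)      # trip 2: memory -> integer regfile
print(hex(gpr[3]))  # 0x3ff0000000000000
```

two memory operations for what is logically a single register move is exactly the cost that a separate, physically distant regfile imposes.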

in NEON and SVE the hilarious thing is that even though the reg nums
are the same, internally they actually are not: you still have to
store the reg used by SVE to memory, then pull it back out of
memory, in order to use a NEON instruction on it!

we have an advantage in that we are indeed beginning from
scratch, but we have to be careful and mindful that IBM are not.
thus Simple-V has to be a clean retrofit onto what IBM has in
POWER10/11, with its OoO multi-issue microarchitecture.

hence, primarily, the decision *not* to add new vector regfiles.
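a minimal sketch of why extending the existing regfiles avoids that round trip, assuming the simplest possible model: a "vector" is just a run of VL consecutive scalar registers, and the GPR file is taken to have 128 entries. elwidth overrides and any other remapping are ignored, and `sv_add` is a hypothetical helper, not a real Simple-V instruction definition.

```python
MASK64 = (1 << 64) - 1

# Assumed: GPRs extended to 128 entries; there is no separate vector regfile.
gpr = [0] * 128

def sv_add(regs, rt, ra, rb, vl):
    """Hypothetical vector add: element i uses scalar regs rt+i, ra+i, rb+i."""
    for i in range(vl):
        regs[rt + i] = (regs[ra + i] + regs[rb + i]) & MASK64

gpr[16:20] = [1, 2, 3, 4]
gpr[24:28] = [10, 20, 30, 40]
sv_add(gpr, rt=8, ra=16, rb=24, vl=4)
print(gpr[8:12])  # [11, 22, 33, 44]

# An ordinary scalar op can now read element 2 directly as r10:
# same regfile, so no trip through memory is needed.
r3 = gpr[10] * 2
print(r3)         # 66
```

the trade-off is the one mentioned earlier: vector and scalar instructions now hit the same regfile, so the hazard management has to cope with that.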

> > We do not want the scenario where it is easy to move data
> > around with Pack/Unpack but not the CR Fields that got created
> > by Rc=1
> >
> > Hmmm....
> >
> > This is unavoidably measure and non-uniform than it seems
> > on first glance.
> I don’t understand this last sentence.  What are you trying to say?

i got autocorrected by a bloody kindle tablet and have now
completely forgotten!

i think it was that the uniformity promised by Simple-V cannot
be provided here.  i would love to be able to use EXTRA3 encoding
*and* still have 2 bits spare for Pack/Unpack, but there are only
9 bits in RM.EXTRA.

    src EXTRA3      3 bits
    dest EXTRA3     3 bits
    Pack/Unpack     2 bits
    dest elwidth    2 bits

that's 10.
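the bit budget above can be checked mechanically.  a small sketch, taking the 9-bit RM.EXTRA size and the field widths straight from the list; the second layout, with 2-bit ("EXTRA2-style") register extensions, is a hypothetical comparison only, to show what fits if each register field gives up a bit and so reaches fewer registers.

```python
RM_EXTRA_BITS = 9  # size of RM.EXTRA, as stated above

def budget(fields, available=RM_EXTRA_BITS):
    """Return (bits wanted, bits over budget) for a dict of field widths."""
    used = sum(fields.values())
    return used, used - available

def report(name, fields):
    used, over = budget(fields)
    status = "fits" if over <= 0 else f"{over} bit(s) over"
    print(f"{name}: {used} of {RM_EXTRA_BITS} bits, {status}")

with_extra3 = {"src EXTRA3": 3, "dest EXTRA3": 3,
               "Pack/Unpack": 2, "dest elwidth": 2}

# Hypothetical alternative: 2-bit register extensions (fewer reachable regs).
with_extra2 = {"src EXTRA2": 2, "dest EXTRA2": 2,
               "Pack/Unpack": 2, "dest elwidth": 2}

report("EXTRA3", with_extra3)  # EXTRA3: 10 of 9 bits, 1 bit(s) over
report("EXTRA2", with_extra2)  # EXTRA2: 8 of 9 bits, fits
```

the one-bit overshoot is the whole problem: anything that makes it fit has to take range away from the register fields, which is the same concern raised for CR ops.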

we have to be similarly careful with CR ops, to make sure there is
a corresponding Pack/Unpack without damaging the range of
registers that can be covered.

