[Libre-soc-dev] [RFC] SVP64 Vertical-First Mode loops
luke.leighton at gmail.com
Thu Aug 19 15:44:38 BST 2021
one thing that does get expensive in Vertical-First Mode: predicated execution.
for CR-based predication this is not so much of a problem: CR Fields are 4 bits wide, the CR regfile port width will be 8x4=32 bits, so not so many wires.
for INT predication it is worse: the entire 64-bit INT GPR is read multiple times, yet only 1 bit is extracted (1<<srcstep or 1<<dststep) and applied to that one element and that one element only.
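a rough sketch of the cost, in python. this is a toy model, not the simulator or the HDL: CountingRegfile, predicate_bit and vf_loop_uncached are hypothetical names made up for illustration, and the loop assumes every instruction in the VF loop body re-reads the full mask register for each element step.

```python
class CountingRegfile:
    """Hypothetical GPR file model that counts full 64-bit reads."""

    def __init__(self, values):
        self.values = dict(values)
        self.reads = 0

    def read(self, regnum):
        # every read fetches all 64 bits, however few are actually used
        self.reads += 1
        return self.values[regnum] & 0xFFFF_FFFF_FFFF_FFFF


def predicate_bit(regfile, mask_reg, step):
    """Read the whole 64-bit mask register, keep only bit `step`."""
    mask = regfile.read(mask_reg)
    return bool(mask & (1 << step))


def vf_loop_uncached(regfile, mask_reg, vl, insns_per_iteration):
    """Collect the enable bit seen by each instruction of a VF loop body.

    Assumption: each instruction handles exactly one element per pass,
    and the element step advances once per pass over the loop body.
    """
    enabled = []
    for step in range(vl):
        for _ in range(insns_per_iteration):
            enabled.append(predicate_bit(regfile, mask_reg, step))
    return enabled
```

with vl=4 and 2 instructions in the loop body, this model performs 8 full 64-bit reads to obtain 4 distinct bits, which is the overhead that batching would be trying to amortise.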
this is the driving force behind why i wanted VFHint to do "batches".
to allow INT predicate masks to be cached, i.e. loaded only once, it would probably be a good idea to say that writing to an INT predicate mask and then subsequently using it within a VF loop is UNDEFINED behaviour.
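a minimal sketch of why caching and in-loop writes do not mix, again as a toy python model. CachedPredicate is a hypothetical name, and the "load once on first use" policy is just one assumed caching scheme, not a statement of what the hardware would do.

```python
MASK64 = (1 << 64) - 1


class CachedPredicate:
    """Hypothetical predicate source that reads the INT mask register once."""

    def __init__(self, gprs, mask_reg):
        self.gprs = gprs          # dict modelling the INT GPR file
        self.mask_reg = mask_reg  # register number holding the mask
        self.cached = None        # filled on first use, never refreshed
        self.reads = 0            # number of full 64-bit regfile reads

    def bit(self, step):
        # load the full 64-bit mask only on the first request
        if self.cached is None:
            self.cached = self.gprs[self.mask_reg] & MASK64
            self.reads += 1
        return bool(self.cached & (1 << step))


gprs = {3: 0b1111}
pred = CachedPredicate(gprs, 3)
first = pred.bit(0)    # triggers the one and only regfile read
gprs[3] = 0            # mask overwritten inside the "loop"
second = pred.bit(1)   # still answered from the stale cached copy
```

in this model the write to the mask register is silently ignored by the cached copy, whereas an uncached implementation would see it. declaring the case UNDEFINED lets both implementations be compliant.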
for CR predicates, hmmm, i can see the value of being able to write to individual CR Fields that are then immediately used as predicate masks.