[Libre-soc-dev] [RFC] SVP64 Vertical-First Mode loops

lkcl luke.leighton at gmail.com
Thu Aug 19 15:44:38 BST 2021


one thing that does get expensive in Vertical-First Mode: predicated execution.

for CR-based predication, not so much of a problem: CR Fields are 4 bits wide, so the CR regfile port width will be 8x4=32 bits. not so many wires.

with INT predication, the entire 64-bit INT GPR is read multiple times, yet each time only 1 bit is extracted (1<<srcstep or 1<<dststep) and applied to that one element and that one element only.
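to make the cost concrete, here is a rough python sketch of the read pattern described above. it is only a model, not the simulator or the HDL: CountingRegfile, predicate_bit and run_vf_loop are made-up names, and it assumes that every instruction in the loop body re-reads the full predicate GPR on every step.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def predicate_bit(gpr_value, step):
    """Extract the single predicate bit for element `step` (1<<step)."""
    return (gpr_value >> step) & 1


class CountingRegfile:
    """Hypothetical INT regfile model that counts full-width reads."""

    def __init__(self, regs):
        self.regs = dict(regs)
        self.reads = 0

    def read(self, regnum):
        self.reads += 1                     # one full 64-bit port read
        return self.regs[regnum] & MASK64


def run_vf_loop(regfile, pred_reg, vl, insns_per_iteration):
    """Model a Vertical-First loop: step through elements one at a time,
    with every instruction in the loop body re-reading the mask."""
    executed = []
    for step in range(vl):
        for insn in range(insns_per_iteration):
            mask = regfile.read(pred_reg)   # 64 bits read...
            if predicate_bit(mask, step):   # ...1 bit actually used
                executed.append((step, insn))
    return executed


regfile = CountingRegfile({3: 0b1011})      # element 2 is masked out
executed = run_vf_loop(regfile, pred_reg=3, vl=4, insns_per_iteration=2)
print(regfile.reads, "full 64-bit reads for", len(executed), "element ops")
```

in this model the number of 64-bit reads is VL times the number of predicated instructions in the loop body, even though each read contributes a single bit.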

this is the driving force behind why i wanted VFHint to do "batches".

to allow INT predicate masks to be cached, i.e. loaded only once, it would probably be a good idea to say that writing to an INT predicate mask and then subsequently using it within a VF loop is UNDEFINED behaviour.
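a small python sketch of why the UNDEFINED wording would be needed if masks are cached. again this is only an illustration under assumptions: CachedPredicate is a made-up name, and the "load once on first use" policy is just one possible caching scheme, not a statement of what the hardware does.

```python
class CachedPredicate:
    """Hypothetical predicate source that reads the INT GPR only once."""

    def __init__(self, regs, pred_reg):
        self.regs = regs
        self.pred_reg = pred_reg
        self.cached = None
        self.reads = 0

    def bit(self, step):
        if self.cached is None:             # first use: one full GPR read
            self.cached = self.regs[self.pred_reg]
            self.reads += 1
        return (self.cached >> step) & 1    # later steps hit the cache


regs = {3: 0b0101}
pred = CachedPredicate(regs, pred_reg=3)

seen = []
for step in range(4):
    if step == 1:
        regs[3] = 0b1111                    # mask written inside the VF loop
    seen.append(pred.bit(step))

print(seen, pred.reads)
```

the cached version keeps using the old mask after the write, where an implementation that re-reads the GPR on every step would pick up the new value. two legal implementations would disagree, which is the case the UNDEFINED rule is meant to cover.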

CR predicates, hmmm i can see the value of being able to write to individual CR Fields that are then immediately used as predicate masks.

l.


