[Libre-soc-dev] [RFC] SVP64 Data-dependent fail-first on CR operations (crand, cror, etc)
lkcl
luke.leighton at gmail.com
Sat Aug 28 17:36:02 BST 2021
https://libre-soc.org/openpower/sv/svp64/appendix/
just one of those things: i realised we have not yet thought through the full implications of combining Data-Dependent Fail-First with crand, crxor, etc.
the idea of DDFF is that a CR bit test (like BO for branches) is given, "is CRField[idx] == BO[1]", and if that test fails it TERMINATES the current instruction's Vector Loop and truncates VL to that point.
subsequent Horizontal instructions will then only run at the truncated VL loop size.
usually this would be used with Rc=1. example: you do a subtract (sv.subf.) and if any one of those subtracts is less than zero, the loop is terminated at that point and VL is set to a length that excludes the failed element.
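a rough Python model of the sv.subf. example may help. this is a sketch only, not the spec pseudocode: the helper names (rc1_cr_field, ddff_subf) are made up, the test is written as "CR bit idx must equal expected" following the description above, and what happens to the failed element's result register is not modelled.

```python
# bit positions within a 4-bit CR field, in Power ISA order
LT, GT, EQ, SO = range(4)


def rc1_cr_field(result):
    """CR field an Rc=1 op would set for a signed result (SO left clear)."""
    return [result < 0, result > 0, result == 0, False]


def ddff_subf(ra, rb, vl, idx, expected):
    """hypothetical model of sv.subf. with Data-Dependent Fail-First.

    subf computes RT = RB - RA per element. the loop stops at the first
    element whose CR bit `idx` is not equal to `expected`; the returned VL
    excludes that failed element.
    """
    results = []
    for i in range(vl):
        rt = rb[i] - ra[i]
        if rc1_cr_field(rt)[idx] != expected:
            return results, i  # truncated VL: elements 0..i-1 only
        results.append(rt)
    return results, vl


# third subtract (2 - 3) goes negative, so VL is truncated to 2
print(ddff_subf([1, 2, 3, 4], [5, 5, 2, 9], 4, LT, False))  # ([4, 3], 2)
```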
here's the thing: some instructions like crand do not then also have an "Rc=1" option, but they still produce CR field modifications that would be useful to test.
example: two sv.cmp operations are carried out against two different scalar limits, A and B. an sv.crand is then performed, and you want the loop to terminate at the point where:
* first LE comparison against scalar A failed OR
* second GE comparison against scalar B failed.
a FFirst crand of the A LE bits with the B GE bits would achieve this effect.
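the crand case can be sketched the same way. again this is hypothetical: the per-element LE and GE results are modelled directly as booleans rather than derived from the LT/GT/EQ bits that a real cmp writes, ffirst_crand_vl is a made-up name, and only the truncated VL is returned.

```python
def ffirst_crand_vl(x, a, b):
    """hypothetical model: two vector compares of x against scalars a and b,
    combined with a fail-first crand. returns the truncated VL."""
    le_a = [xi <= a for xi in x]  # first sv.cmp: "LE against A" per element
    ge_b = [xi >= b for xi in x]  # second sv.cmp: "GE against B" per element
    for i in range(len(x)):
        bt = le_a[i] and ge_b[i]  # crand of the two selected bits
        if not bt:                # fails if EITHER compare failed
            return i
    return len(x)


# 12 is above A=10, so the loop stops at element 3
print(ffirst_crand_vl([5, 6, 7, 12, 8], 10, 3))  # 3
```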
we would think that in this case it is necessary to use 3 bits (inv-test plus a 2-bit index, just like in v3.0B Branches) from the 24-bit RM Mode field
howeverrrrr.. let us look more carefully at crand (etc)
crand etc. actually take 5-bit operands, so each one can address any of the full 32 CR bits.
* BA selects a CR Field 0-7 (upper 3 bits) and which bit within it, LT GT EQ SO, to use (lower 2 bits)
* likewise BB
* likewise BC (the destination bit; v3.0B names this field BT)
thus we *have* the bit to select for the FFirst testing *already*, from BC.
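to illustrate why no extra index is needed: a 5-bit operand splits into a 3-bit CR field number and a 2-bit bit selector. the sketch uses the scalar v3.0B encoding only and ignores any SVP64 extension of CR field numbering; decode_cr_bit is a made-up name.

```python
CR_BIT_NAMES = ("LT", "GT", "EQ", "SO")


def decode_cr_bit(operand):
    """split a 5-bit CR bit operand (BA, BB, or the destination) into
    (CR field number, bit name within that field)."""
    if not 0 <= operand < 32:
        raise ValueError("CR bit operand must be 5 bits (0-31)")
    field = operand >> 2   # upper 3 bits: CR field 0-7
    bit = operand & 0b11   # lower 2 bits: which bit of that field
    return field, CR_BIT_NAMES[bit]


print(decode_cr_bit(6))   # (1, 'EQ')
```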
thus, we can use the *other* type of FFirst mode, called RC1 mode.
this mode is normally reserved for operations that do not have an Rc=0/1 option. it also has two other bits:
* inv (to test if our bit selected by BC is 1/0)
* VLi which is "VL Inclusive" mode
VLi mode will, if VL is truncated, *include* the current element (the one whose CR bit test failed) in the count that goes into VL.
this is extremely useful for things like strncpy where you want to include the terminating zero in a copy operation.
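a minimal sketch of RC1-style truncation with and without VLi, applied to the strncpy case. assumptions: ffirst_vl is a hypothetical helper, and "the test passes when bit == not inv" is one plausible reading of the inv bit, not the spec text.

```python
def ffirst_vl(test_bits, vl, inv=False, vli=False):
    """hypothetical model of RC1-style fail-first truncation.

    test_bits[i] is the CR bit (e.g. the one selected by BC) produced by
    element i. the test passes when the bit equals (not inv). at the first
    failure VL is truncated to i, or to i + 1 in VLi (VL-inclusive) mode.
    """
    for i in range(vl):
        if test_bits[i] != (not inv):
            return i + 1 if vli else i
    return vl


# strncpy-like use: the per-element test is "byte is non-zero"
src = b"hi\0xyz"
bits = [byte != 0 for byte in src]
print(src[:ffirst_vl(bits, len(src), vli=True)])   # b'hi\x00'
print(src[:ffirst_vl(bits, len(src), vli=False)])  # b'hi'
```

with VLi the terminating zero lands inside VL, so a following vector store copies it along with the string.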
other operations which *actually* operate on entire (complete) CR Fields usually have only a 3-bit operand specifying the CR Field, with nothing to say which bit to test. these *would* need the index mode.
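for contrast, a sketch of the test for whole-CR-Field operations, where the bit has to come from a 2-bit index plus inv. the function name is made up, and where those bits live in RM is assumed here, not specified.

```python
LT, GT, EQ, SO = range(4)


def field_ffirst_test(cr_field, idx, inv):
    """hypothetical FFirst test for ops that write a whole CR field.

    such ops only carry a 3-bit CR field operand, so the instruction does
    not say which bit to test: idx (0-3, assumed to come from the RM Mode
    bits) picks LT/GT/EQ/SO, and inv picks whether 1 or 0 passes.
    """
    return cr_field[idx] == (not inv)


gt_result = [False, True, False, False]  # a compare that came out "greater than"
print(field_ffirst_test(gt_result, GT, inv=False))  # True
```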
it is quite... a lot of analysis.
l.