One important thing they mentioned is that swizzles should not be combined
with ALU ops. Combining them with loads/stores is fine, though.

I'm thinking that if we have the realignment network on the inputs of the
ALUs anyway, so that predicated ops can pack their active lanes densely
onto the ALUs (AVX512 doesn't do that), then combining swizzles with ALU
ops might be fine after all.
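
To make that concrete, here is a rough toy model in Python. It is only a
sketch of the idea, not a description of any real hardware: the function
name predicated_add, the num_alus parameter, and the choice that inactive
lanes keep the value from `a` are all assumptions made for illustration.
The point is that once a routing step already picks which lane feeds each
ALU slot, picking a swizzled lane for one operand is the same kind of
selection.

```python
def predicated_add(a, b, mask, num_alus, swizzle=None):
    """Toy model: route only active lanes to the ALUs, optionally swizzling b.

    Returns (result, passes), where passes counts how many trips through
    the ALUs were needed. Hypothetical model, not a real ISA.
    """
    n = len(a)
    if swizzle is None:
        swizzle = list(range(n))  # identity mapping: no swizzle

    # Realignment step: collect the active lanes so they can be packed
    # densely onto the ALUs instead of leaving ALUs idle on masked lanes.
    active = [lane for lane in range(n) if mask[lane]]

    # Assumption: inactive lanes keep a's value (merge-style masking).
    result = list(a)
    passes = 0
    for start in range(0, len(active), num_alus):
        group = active[start:start + num_alus]
        for lane in group:
            # The routing that selects `lane` for this ALU slot can just as
            # well select b[swizzle[lane]] instead of b[lane].
            result[lane] = a[lane] + b[swizzle[lane]]
        passes += 1
    return result, passes


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10, 20, 30, 40, 50, 60, 70, 80]
mask = [1, 0, 0, 1, 0, 1, 0, 1]          # 4 of 8 lanes active
reverse = [7, 6, 5, 4, 3, 2, 1, 0]       # example swizzle: reverse b

packed, packed_passes = predicated_add(a, b, mask, num_alus=4)
swizzled, swizzled_passes = predicated_add(a, b, mask, num_alus=4,
                                           swizzle=reverse)
```

In this model the four active lanes fit in a single pass over four ALUs,
with or without the swizzle, whereas pushing all eight lanes through
unpacked would take two passes. Whether real hardware gets the swizzle
for free depends on how general the realignment network is, which this
sketch does not attempt to model.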

