[Libre-soc-dev] GPR-to-FPR and FPR-to-GPR move operations

Luke Kenneth Casson Leighton lkcl at lkcl.net
Sat May 29 10:04:58 BST 2021


Lauri is kindly investigating MP3 in SVP64 assembler and it's turning out to
be a good test of what opcodes are needed.  in the bi-weekly meeting last
week, Paul, we mentioned briefly the need for GPR-to-FPR and FPR-to-GPR
mv operations (straight bit-wise) given that VSX/SIMD will not be added to
Libre-SOC as a GPU / VPU.

Jeff Bush's Nyuzi paper makes it clear that transferring workloads
through L1/L2 cache is hugely expensive, and describes the efforts
he went to in order to reduce power consumption.

additionally, Lauri points out that just to get zero into an FPR is also
costly: it requires a LD operation which takes up data segment space
and unnecessarily activates memory as well as the L2 and L1 data
cache paths, when compared to a MV-from-GPR operation.
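
to illustrate what "straight bit-wise" means here, the following is a minimal Python sketch of the semantics only, not of any actual Libre-SOC opcode. the function names gpr_to_fpr and fpr_to_gpr are hypothetical, and a 64-bit GPR and an IEEE 754 double-precision FPR are assumed:

```python
import struct

MASK64 = 0xFFFFFFFFFFFFFFFF  # assumption: 64-bit GPRs


def gpr_to_fpr(bits: int) -> float:
    """Reinterpret a 64-bit integer pattern as a double (no numeric conversion)."""
    return struct.unpack("<d", struct.pack("<Q", bits & MASK64))[0]


def fpr_to_gpr(value: float) -> int:
    """Reinterpret a double as its raw 64-bit integer pattern."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]


if __name__ == "__main__":
    # an all-zero GPR copied bit-for-bit into an FPR is +0.0: no load needed
    print(gpr_to_fpr(0))
    # the raw bit pattern of 1.0, as it would appear in a GPR
    print(hex(fpr_to_gpr(1.0)))
```

the point of the sketch is that integer zero and floating-point +0.0 share the same bit pattern, so a bit-copy from a zeroed GPR is enough to clear an FPR.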

in addition to that, in an Out-of-Order system the cycle latency of the
path through L1 cache will be much higher than a straight MV operation
(which in some micro-architectures may be a macro-op-fused operation).

* this in turn requires a larger number of "in-flight" operations
* this in turn increases the number of Reservation Stations
* this in turn increases the size of the Dependency Matrices, as O(N^2)
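
as a rough worked example of the O(N^2) growth, the sketch below assumes a simplified square matrix with one row and one column per in-flight operation. the function name is hypothetical and the real matrix dimensions in any given design may differ:

```python
def dependency_matrix_cells(in_flight: int) -> int:
    """Cell count of a simplified N x N dependency matrix.

    Assumption: one row and one column per in-flight operation.
    """
    return in_flight * in_flight


if __name__ == "__main__":
    for n in (8, 16, 32):
        # doubling the number of in-flight operations quadruples the cells
        print(n, dependency_matrix_cells(n))
```

under that assumption, every doubling of the number of in-flight operations needed to cover the extra latency quadruples the matrix size.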

the impact therefore of using the LD-ST path is extremely costly: all
of which points to a straight bit-copy between GPR and FPR being
needed.
in some micro-architectures the MV may end up being a macro-op
fused operation: it may end up actually being removed entirely from
the pipelines, instead being used to mark the source or destination
of INT or FP operations as targeting the *other* regfile:

     fmv2int  fp5, r3
     addi r3, 0x5

becomes (macro-fused):

     addi fp5, 0x5
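
the rewrite above can be modelled as a toy peephole pass. this is a minimal sketch only: the tuple encoding and the function name fuse_moves are hypothetical, it recognises just the one pattern shown above, and it deliberately ignores whether the GPR is still needed afterwards (a real implementation would have to check that):

```python
from typing import List, Tuple

# hypothetical encoding: (mnemonic, first operand, second operand)
Instr = Tuple[str, str, str]


def fuse_moves(program: List[Instr]) -> List[Instr]:
    """Fuse 'fmv2int fpX, rY' followed by 'addi rY, imm' into 'addi fpX, imm'."""
    fused: List[Instr] = []
    i = 0
    while i < len(program):
        op, dst, src = program[i]
        nxt = program[i + 1] if i + 1 < len(program) else None
        if op == "fmv2int" and nxt is not None \
                and nxt[0] == "addi" and nxt[1] == src:
            # the move disappears: the addi is retargeted at the FP regfile
            fused.append(("addi", dst, nxt[2]))
            i += 2
        else:
            fused.append(program[i])
            i += 1
    return fused


if __name__ == "__main__":
    print(fuse_moves([("fmv2int", "fp5", "r3"), ("addi", "r3", "0x5")]))
```

instructions that do not match the pattern pass through unchanged, so the pass is safe to run over a whole instruction stream in this toy model.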

it should be clear that when adding bitmanip operations as well, the
possibilities expand to include performing bit manipulation on FPRs.

