[Libre-soc-dev] Hardware-accelerated specialized instructions

Thu Dec 10 22:52:28 GMT 2020

Hi Luke,

My understanding of our approach is that we are designing opcodes that do
one of two things.

1. Perform an operation that can already be done with existing instructions,
but as a single instruction: replacing a sequence of possibly dozens of
instructions with one eliminates significant instruction-count overhead.
This is like trapping on and emulating future unimplemented opcodes. It uses
only existing FUs.

2. Perform an operation that can be approximated with existing
instructions, using a single instruction that implements the operation, or
the key part (kernel) of it, as its own circuit/FU. Any part of the
operation that existing FUs (ADDs, for example) can do just as efficiently
is still done there, but without the overhead of additional instructions.
This keeps the benefit of (1), eliminating instruction/emulation overhead,
and is also much faster, because things like CORDIC or trig functions are
much slower when built from ADD and MUL FUs than when computed by a
dedicated sine circuit, for example.

In both (1) and (2), the instruction may share FUs that have already been
allocated for existing instructions, but in some cases it may require
allocating one or more additional units of each 'standard' FU that it uses
outside of its 'kernel'. This is where simulating real-world use cases such
as H.264 or Vorbis encoding/decoding, and analysing the resulting resource
contention, is essential.
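A toy version of that contention analysis might look like the minimal
sketch below. It is not the Libre-SOC simulator: the in-order,
one-issue-per-cycle model, the rule that an instruction holds every FU it
needs for its whole latency, and the `sincos`/`TRIG` names are all
hypothetical simplifications. It shows how the same short trace stalls when
a fused instruction and ordinary ADDs share a single ADD unit, and runs
without stalls once a second ADD unit is allocated.

```python
from collections import Counter


def simulate(trace, fu_counts):
    """Toy in-order issue model (hypothetical, for illustration only).

    trace: list of (name, {fu_type: units_needed}, latency) tuples.
    fu_counts: {fu_type: units_available}.
    An instruction issues only when all the FUs it needs are free and holds
    them for `latency` cycles. Returns (total_cycles, stall_cycles).
    """
    busy = []      # (release_cycle, Counter of FUs held)
    cycle = 0
    stalls = 0
    for name, needs, latency in trace:
        for fu, n in needs.items():
            if n > fu_counts.get(fu, 0):
                raise ValueError(f"{name} needs more {fu} units than exist")
        while True:
            # Release FUs whose work has finished by this cycle.
            busy = [(t, held) for t, held in busy if t > cycle]
            in_use = Counter()
            for _, held in busy:
                in_use.update(held)
            if all(in_use[fu] + n <= fu_counts.get(fu, 0)
                   for fu, n in needs.items()):
                break
            cycle += 1     # structural hazard: wait a cycle
            stalls += 1
        busy.append((cycle + latency, Counter(needs)))
        cycle += 1         # at most one issue per cycle
    end = max((t for t, _ in busy), default=cycle)
    return max(end, cycle), stalls


# Hypothetical trace: a fused 'sincos' op uses a dedicated TRIG unit for its
# kernel plus a standard ADD unit outside it, followed by two plain ADDs.
TRACE = [
    ("sincos", {"TRIG": 1, "ADD": 1}, 4),
    ("add", {"ADD": 1}, 1),
    ("add", {"ADD": 1}, 1),
]

if __name__ == "__main__":
    for adds in (1, 2):
        total, stalls = simulate(TRACE, {"TRIG": 1, "ADD": adds})
        print(f"ADD units={adds}: cycles={total} stalls={stalls}")
```

A real study would replace the hand-written trace with one captured from
the actual codec workloads and model pipelining and latencies properly.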

Please correct any misunderstandings in the above. I feel like I have the
gist of it, but may be misunderstanding some parts or still missing others.

Cole