I've been doing some research on hardware-accelerated instructions,
and the understanding I've come to is that we are taking an
Application Specific Instruction Processor (ASIP)-like approach, but
unlike many ASIPs we are absolutely not going down the CISC or VLIW
route, or becoming so application specific that it compromises the
performance of the CPU.

It's an ASIP that is the CPU: just small extensions to the existing
CPU, some extra 'routines' that share the regular mul, logical, etc.
FUs, but also use new trig, pixel, and other special
circuits that massively speed up those operations that would take many
more cycles to 'emulate' in software, i.e. just doing it via ADDs and
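
To make the 'emulate in software' cost concrete, here is a rough sketch
of sin/cos computed with CORDIC, using nothing but shifts, adds, and
compares. This is my own illustration, not code from our tree or from
any paper; the Q16 fixed-point format, the iteration count, and the
names (cordic_sin_cos, ATAN_TABLE, K_FIXED) are all assumptions picked
for the example. The op count it returns is a crude tally of shifts and
adds, not a real cycle count.

```python
import math

FRAC_BITS = 16    # assumed Q16 fixed-point format, for illustration only
ITERATIONS = 16   # assumed iteration count (roughly one bit per step)

# arctan(2^-i) in Q16 radians; in hardware this would be a small ROM.
ATAN_TABLE = [round(math.atan(2.0 ** -i) * (1 << FRAC_BITS))
              for i in range(ITERATIONS)]

# CORDIC gain compensation, folded into the starting x value.
K = 1.0
for i in range(ITERATIONS):
    K /= math.sqrt(1.0 + 2.0 ** (-2 * i))
K_FIXED = round(K * (1 << FRAC_BITS))


def cordic_sin_cos(angle_fixed):
    """Return (sin, cos, op_count) for a Q16 angle with |angle| <= pi/2."""
    x, y, z = K_FIXED, 0, angle_fixed
    ops = 0
    for i in range(ITERATIONS):
        dx, dy = x >> i, y >> i          # two shifts
        if z >= 0:                       # rotate towards the target angle
            x, y, z = x - dy, y + dx, z - ATAN_TABLE[i]
        else:
            x, y, z = x + dy, y - dx, z + ATAN_TABLE[i]
        ops += 5                         # 2 shifts + 3 add/sub per step
    return y, x, ops


if __name__ == "__main__":
    s, c, ops = cordic_sin_cos(round(0.5 * (1 << FRAC_BITS)))
    print(s / (1 << FRAC_BITS), c / (1 << FRAC_BITS), ops)
```

With these assumed parameters a single sin/cos pair costs 80 shift/add
operations before counting loop and branch overhead, which is the kind
of sequence a dedicated trig circuit would collapse into one
instruction.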

I've seen a few papers that try to reuse the hardware built for one
application-specific op in other ops as well, sharing as much of the
circuitry as possible to minimize area and waste. This is what we are
doing, right? That's my understanding from
reading the Nyuzi Raster paper, and paying attention to our list and
bug tracker. I feel like I'm still missing some crucial details,
however, so if anyone sees what I'm missing or misunderstanding, I
would appreciate your insight and explanation.


