[Libre-soc-dev] snitch core

Mon Oct 25 10:43:02 BST 2021

On Sun, Oct 24, 2021 at 7:54 AM lkcl <luke.leighton at gmail.com> wrote:
> it *might* be a Vertical-First, particularly if it is possible to zero-overhead-loop automatically around multiple instructions.

Thanks for the link, Jacob.  I read the article, and there is a field
in the loop control setup instruction that specifies how many
instructions to loop over.  On page 6, "Figure 5. (a) Anatomy of the
proposed FREP instruction." lists an immediate field "number of ins.
to repeat".
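
To make that concrete for myself, here is a toy Python model of a
zero-overhead loop over several instructions.  It is only a sketch
under my own assumptions: the tuple encoding, the names run/n_instr/
n_iter, and the op names are all made up for illustration, the body
may contain only plain ops (no nesting), and none of this is the
actual FREP encoding or semantics from the paper.

```python
def run(program):
    """Execute a toy program and return the trace of executed op names.

    Each entry is either ("frep", n_instr, n_iter) or ("op", name).
    A "frep" entry replays the next n_instr ops n_iter times, with no
    branch or counter-update instructions appearing in the trace.
    """
    trace = []
    pc = 0
    while pc < len(program):
        instr = program[pc]
        if instr[0] == "frep":
            _, n_instr, n_iter = instr
            # The loop body is the n_instr entries following the frep.
            body = program[pc + 1 : pc + 1 + n_instr]
            for _ in range(n_iter):
                for _, name in body:
                    trace.append(name)
            # Continue after the loop body.
            pc += 1 + n_instr
        else:
            trace.append(instr[1])
            pc += 1
    return trace


# Hypothetical program: repeat two ops three times, then one more op.
program = [("frep", 2, 3), ("op", "fmadd"), ("op", "fadd"), ("op", "fsd")]
trace = run(program)
```

The trace holds only the useful ops: the two-instruction body three
times over, then the trailing op, with no loop bookkeeping in between.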

> feature 2: CISC-like / CDC6600/PDP11/68000-like "auto-load-and-increment"
>
> certain FP regs may be "tagged" to indicate, "if you *ACCESS* this register, actually what i want to happen is that you go off and read the register contents from memory (effectively) and by the way, auto-increment the load address to the next address as a side-effect"
>
> this is pretty much the definition of CISC "load-and-increment" instructions, and is no surprise given that they are part of efficient / elegant designs from 50 years ago.

I can speak to the elegance and efficiency of the 68000 instruction
set design and how pleasant it was to program in assembly:  flat
memory model, loop mode with "register-indirect with post-increment"
addressing mode, 8 general purpose 32-bit data registers, and 7
general purpose 32-bit address registers.  These were some of the
things that distinguished the architecture from x86, et al, and made
the 68k so much more pleasant to program.
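
For anyone who has not used post-increment addressing, the "tagged
register" idea can be sketched in a few lines of Python.  This is a
toy model under my own assumptions: TaggedReg, its fields, and the
name ft0 are hypothetical, memory is just a list indexed by word, and
it is not how Snitch or the 68000 actually implements it.

```python
class TaggedReg:
    """Toy register whose reads come from memory and bump the address."""

    def __init__(self, memory, base, stride=1):
        self.memory = memory    # backing store, indexed by word
        self.addr = base        # next address to read
        self.stride = stride    # auto-increment applied after each read

    def read(self):
        # Accessing the register performs the load ...
        value = self.memory[self.addr]
        # ... and, as a side effect, advances to the next address.
        self.addr += self.stride
        return value


memory = [1.0, 2.0, 3.0, 4.0]
ft0 = TaggedReg(memory, base=0)

# Sum the array without any explicit load or address-update step.
acc = 0.0
for _ in range(len(memory)):
    acc += ft0.read()
```

The loop body contains only the accumulate; the load and the address
increment are both hidden behind the register access.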

> by making it a "tag" (exactly like how SVP64 REMAP works: that is also a hidden "tag" architecture) the underlying RISC architecture does not need changing.

Cool.

> the bit that is novel here is thus not the techniques, but how they demonstrated a hardware architecture that is power-efficient as a result.
>
> good for them.  like it a lot.

I also found their multiprocessor system architecture very
interesting, in particular the question of how to scale up
efficiently without dedicating inordinate amounts of area and power
to coordinating a multiprocessor system.

Level  Description   Functional Units
0      Core Complex  Integer Core, Floating Point Sub-System, L0
                     instruction cache
1      Hive          N Core Complexes, shared L1 instruction cache,
                     shared integer mul/div unit
2      Cluster       M Hives, shared Tightly Coupled Data Memory with
                     atomic memory cycle execution units per memory
                     bank (2 initiator ports/core, 2 memory
                     banks/initiator port), shared cluster peripherals
                     via a crossbar (performance/contention counters,
                     architecture-level info, scratch registers,
                     inter-processor interrupt generation)
3      System        K Clusters, shared last-level memory via a
                     crossbar
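
To get a feel for how the levels multiply out, here is a small Python
sketch.  The class and method names are mine, the values N=4, M=2,
K=2 are arbitrary examples rather than figures from the paper, and
the bank count follows my reading of "2 initiator ports/core, 2
memory banks/initiator port".

```python
from dataclasses import dataclass

# From the table above, as I read it.
PORTS_PER_CORE = 2
BANKS_PER_PORT = 2


@dataclass(frozen=True)
class SystemShape:
    cores_per_hive: int      # N core complexes per hive
    hives_per_cluster: int   # M hives per cluster
    clusters: int            # K clusters per system

    def cores_per_cluster(self):
        return self.cores_per_hive * self.hives_per_cluster

    def banks_per_cluster(self):
        # Each core has 2 initiator ports, each port gets 2 banks.
        return self.cores_per_cluster() * PORTS_PER_CORE * BANKS_PER_PORT

    def total_cores(self):
        return self.cores_per_cluster() * self.clusters

    def shared_l1_icaches(self):
        # One shared L1 instruction cache per hive.
        return self.hives_per_cluster * self.clusters


# Arbitrary example sizes, not taken from the paper.
shape = SystemShape(cores_per_hive=4, hives_per_cluster=2, clusters=2)
```

With those example sizes the model gives 8 cores and 32 memory banks
per cluster, 16 cores in total, and 4 shared L1 instruction caches.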

Food for a lot of thought.

Richard