[libre-riscv-dev] TLB

Sun Apr 21 19:48:57 BST 2019

https://github.com/westerndigitalcorporation/omnixtend/blob/master/implementations/fpga/vcu118/U540/SoC-block-diagram.png

A quick Google search turns up no known publicly available implementations,
only reports of WD "partnering with SiFive" and binary-only blobs, and shows
that it is based on "serialisation of TileLink".

TileLink has a known reputation for being a pig to implement. Designed with
OO programming techniques in mind (chisel3), it is extremely difficult to
work with, whereas AXI4 and other similar standards were designed from the
ground up with simplicity, effectiveness and efficiency in mind.

If you google "TileLink verilog" there are no implementations that I could
find.

Whilst AXI4 and Wishbone themselves do not have cache coherency, looking at
the OpenPiton spec section 3 message API it is clear that the "usual" way
to do cache coherency is simply to define a suite of message format headers
and to use the ability of e.g. AXI4 to separate control from data, sending
each separately in order to avoid latency issues.

The message formats define the protocol, and the bus infrastructure (e.g.
AXI4) does not need to know anything about the format: it is just a conduit.
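
To make that split concrete, here is a rough sketch in Python. Everything in
it is made up for illustration: the message types, the 12-byte header layout
and the Conduit class are assumptions, not the OpenPiton format and not a
model of real AXI4 signalling. The only point is that the header format
carries the protocol, while the "bus" moves opaque bytes on separate control
and data channels without ever decoding them.

```python
import struct
from collections import deque
from enum import IntEnum


class MsgType(IntEnum):
    # Hypothetical message types, NOT OpenPiton's actual encoding.
    LOAD_REQ = 1
    STORE_REQ = 2
    INVALIDATE = 3
    DATA_ACK = 4


# Hypothetical header layout (big-endian, 12 bytes in total):
# type, source id, destination id, payload length (1 byte each), address (8 bytes).
HEADER = struct.Struct(">BBBBQ")


def pack_header(msg_type, src, dst, length, addr):
    """Encode a coherency message header into raw bytes."""
    return HEADER.pack(msg_type, src, dst, length, addr)


def unpack_header(raw):
    """Decode raw bytes back into (type, src, dst, length, addr)."""
    msg_type, src, dst, length, addr = HEADER.unpack(raw)
    return MsgType(msg_type), src, dst, length, addr


class Conduit:
    """Stand-in for a bus such as AXI4: it queues opaque bytes on two
    independent channels and never inspects their contents."""

    def __init__(self):
        self.control = deque()
        self.data = deque()

    def send(self, header, payload=b""):
        # Control and data travel separately; a header-only message
        # (e.g. an invalidate) never touches the data channel.
        self.control.append(header)
        if payload:
            self.data.append(payload)

    def recv_control(self):
        return self.control.popleft()

    def recv_data(self):
        return self.data.popleft()


# Usage: one store carrying a 16-byte line, then a header-only invalidate.
bus = Conduit()
line = bytes(range(16))
bus.send(pack_header(MsgType.STORE_REQ, 0, 1, len(line), 0x80000000), line)
bus.send(pack_header(MsgType.INVALIDATE, 1, 2, 0, 0x80000000))
```

Only the two endpoints call pack_header and unpack_header. Swapping the
conduit for a different bus would leave the message formats untouched.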

I estimate we could easily spend maybe 4 to 5 months just implementing
TileLink alone, let alone OmniXtend, for which there is no information and
there are no implementations. Using the Rocket Chisel source once converted
to Verilog is not recommended: it is unreadable, and we would also end up
with a hard toolchain dependency on Java.

We would also end up depending on SiFive, its designers, for any support
questions.

By contrast, OpenPiton has full source, academic papers and full
documentation. The Ariane team have already collaborated with them and have
booted RV Linux reliably in single-core mode (SMP sometimes crashes; they
are investigating).

They have also published some unit tests specifically to handle NoC testing.

This kind of existing infrastructure is what inspires confidence that we
can use it without needing 1 to 2 extra man-years to complete the project.

We need to be really quite pathological and rational about this: focus on
the core strategic part that has not been done before (the GPU / VPU aspect)
and, for everything else, carefully and yet pathologically cherry-pick what
we need from the best of what is out there.

The longer that something has been around, in the open community, the less
we have to do ourselves.

How much effort do you estimate it would take to implement OmniXtend and an
L2 cache on it, Jacob?

vs our own L2 cache and data/control protocol over e.g. AXI4?

L.

-- 
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68