[libre-riscv-dev] buffered pipeline

Wed Mar 13 01:36:49 GMT 2019

(cc'ing dan gisselquist. hi dan, you don't strictly need to subscribe; i
will add you to the accept-post-filters.)

context: dan kindly responded to a private enquiry. his reply was
comprehensive, so i realised it needed to go to the list.

---------- Forwarded message ---------
From: Dan <dan.gisselquist at gmail.com>
Date: Tue, Mar 12, 2019 at 8:39 PM
Subject: Re: pipeline strategies post
To: Luke Kenneth Casson Leighton <lkcl at lkcl.net>

Luke,

You are welcome to post this e-mail, or any of the e-mail comments I've
made below, wherever and as you wish.  Please do me the favor of offering
me a link, though, should you do so.

See comments within.

I've also updated the pipeline strategies article, given our determination
that the two lines in question were unreachable in the first place.

Dan

On Tue, 2019-03-12 at 15:38 +0000, Luke Kenneth Casson Leighton wrote:

On Tue, Mar 12, 2019 at 1:42 PM Dan <dan.gisselquist at gmail.com> wrote:

Hi, Luke!

 allo :)  btw it's implemented here:
https://git.libre-riscv.org/?p=ieee754fpu.git;a=blob;f=src/add/example_buf_pipe.py;h=b72e1c43904451ba0ef7f9fa78d5417da8de0a8d;hb=0e70fec7c3df1ee97020aa5be6f358c85898a5fb

 we picked nmigen for the libre riscv soc.  can i ask a favour?  we're
developing the processor under the requirement to be entirely
transparent (to restore trust in computing devices) - could we have
this conversation on the libre-riscv-dev mailing list?

Sure!  Should I sign up?

Looking over what you've pasted, I can see your confusion.  Yes, some
logic needs to set r_data.  Some other logic also needs to set
o_data.  If you look about four lines further within that post,
though, you'll see where r_data gets set; that logic is much simpler.

 yes... except... there are three assignments of logic(i_data) to
something (o_data or r_data), which means three duplicated copies of
that logic in hardware.  i'll fix that by assigning logic(i_data) to a
combinatorial intermediate first.
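A minimal sketch of that refactoring, written as a plain-Python cycle model rather than nmigen. The names `clock_data_regs`, `logic`, `load_o` and `load_r` are hypothetical. The stage logic is evaluated once into an intermediate, and that single result fans out to whichever register is being loaded, so the hardware needs only one copy of it.

```python
from typing import Callable, Tuple


def clock_data_regs(i_data: int, logic: Callable[[int], int],
                    load_o: bool, load_r: bool,
                    o_data: int, r_data: int) -> Tuple[int, int]:
    """Return (o_data, r_data) after one clock edge (hypothetical sketch)."""
    # Combinatorial intermediate: the stage logic is instantiated once...
    result = logic(i_data)
    # ...and both registers select from that one shared result.
    if load_o:
        o_data = result
    if load_r:
        r_data = result
    return o_data, r_data
```

The registers that are not loaded simply keep their old values, which is what the enable on a clocked register would do.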

Sounds good!  Indeed, that's probably what I would do.

 r_data can basically be set *ANY TIME* the downstream channel is
idle.  You'll also notice the comment that the code above hasn't been
tested--it's sort of showing the general idea.

 indeed.  it's the best explanation we've been able to find.

You can find another discussion of this approach here as well:
http://fpgacpu.ca/fpga/skid_buffer.html

If you want to look at some actual tested code implementing this
algorithm, then I have a couple items I could share.

 appreciated.  can i pass these on to the team (on the public mailing list)?

Sure!  All of the code below is available publicly on github.

- This one implements a bus delay, to keep combinatorial logic from
building up too far: https://github.com/ZipCPU/zipcpu/blob/master/rtl/ex/busdelay.v
Make sure you check out the path where DELAY_STALL is
non-zero.  This makes a great example, because nothing is being done
to the data other than delaying it.

- Here's another example that handles expanding the width of the bus
at the same time.  I used it to get access to a 128-bit memory data
bus from my normal 32-bit Wishbone (WB) implementation:
https://github.com/ZipCPU/videozip/blob/master/rtl/busexpander.v

- I'm also slowly working on a cross bar switch.  You can see one of
my implementations here:
https://gist.github.com/ZipCPU/f0268c7906de684cf7d4ab77345a413a

  This switch also uses the same buffered handshake.  (Others call it
a skid buffer.)

 interesting - the primary reason we need this buffered pipeline is
for an out-of-order processor, implementing reservation stations on a
CDC 6600-like design.  the reservation stations are effectively a
multiplexer of an array of inputs through a pipeline, onto an array of
outputs on the other side.

we won't be doing NxN multiplexing though :)
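To make the reservation-station idea concrete, here is a rough cycle-level sketch in plain Python: several input ports share one fixed-depth pipeline, each entry carries a tag naming its source port, and the tag steers the result back out to the matching output port. The class name, the round-robin arbitration and the absence of backpressure are all assumptions for illustration, not the libre-riscv design.

```python
from collections import deque
from typing import Callable, Deque, List, Optional, Tuple

Slot = Optional[Tuple[int, int]]  # (source port, data), or None for a bubble


class MuxPipeline:
    """Hypothetical sketch: N input ports share one fixed-depth pipeline.

    Each entry is tagged with its source port on the way in, and the tag
    steers the result to the matching output port on the way out.
    Assumes round-robin arbitration and no backpressure from the outputs.
    """

    def __init__(self, n_ports: int, depth: int, logic: Callable[[int], int]):
        self.n = n_ports
        self.logic = logic  # stand-in for the work done across the stages
        self.pending: List[Deque[int]] = [deque() for _ in range(n_ports)]
        self.stages: Deque[Slot] = deque([None] * depth, maxlen=depth)
        self.next_port = 0  # round-robin pointer

    def clock(self) -> List[Optional[int]]:
        """Advance one cycle; return each port's output (None if idle)."""
        # Arbitrate: first waiting port at or after the round-robin pointer.
        chosen: Slot = None
        for k in range(self.n):
            port = (self.next_port + k) % self.n
            if self.pending[port]:
                chosen = (port, self.pending[port].popleft())
                self.next_port = (port + 1) % self.n
                break
        leaving = self.stages[-1]       # oldest entry exits this cycle
        self.stages.appendleft(chosen)  # a full deque drops the rightmost slot
        outputs: List[Optional[int]] = [None] * self.n
        if leaving is not None:
            port, data = leaving
            outputs[port] = self.logic(data)  # demultiplex on the tag
        return outputs
```

With depth=4, an entry accepted on one clock() call comes out four calls later, on the output port it came in on; the pipeline itself never needs to know how many ports there are.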

- Finally, you can read about how I applied this technique to the
AXI-lite bus here: http://zipcpu.com/blog/2019/01/12/demoaxilite.html

All of these examples are from known "operational" (i.e. running)
code that has been tested on FPGAs and is known to work.

Now, returning to your question ... as I look over the code in the
article, it still looks right to me.  Be aware that the post shows a
rather large and complex always block split over many code blocks
within the post.  That block sets o_data throughout, so it would be
appropriate to set o_data within it during all of the various logic
paths.  In the path you point out, it follows from an

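if (i_reset)
begin
  // ... reset path
end else if (!i_busy)
begin
  // ... downstream is ready
end else if (!o_stb)
begin
  // ... output register is empty
end else if ((i_stb)&&(!o_busy))
begin
  // ....
  if (!o_stb)
     o_data <= i_data;
end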

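Yeah, you are right, that's not quite right.  Any path reaching this
branch has already failed the else if (!o_stb) test, so o_stb is known
to be one.  The inner if (!o_stb) can never be true, and this
assignment to o_data can be removed.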
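The unreachability argument can be checked mechanically. This small Python sketch (the function name and arm labels are made up for illustration) walks the if/else-if chain for every combination of the five control inputs and records which arm runs. The inner if (!o_stb) arm never does, because any path reaching it has already failed the earlier else if (!o_stb) test; that holds whatever the first two tests are.

```python
from itertools import product


def arm_taken(i_reset: bool, i_busy: bool, o_stb: bool,
              i_stb: bool, o_busy: bool) -> str:
    """Return which arm of the if/else-if chain runs (hypothetical labels).

    The second test is written as (!i_busy), so that the final arm is the
    "i_busy and o_stb both true" case described in the quoted comment.
    """
    if i_reset:
        return "reset"
    elif not i_busy:
        return "downstream ready"
    elif not o_stb:
        return "output empty"
    elif i_stb and not o_busy:
        # The disputed inner test: `if (!o_stb) o_data <= i_data;`
        if not o_stb:
            return "inner o_data write"
        return "skid"
    return "hold"


# Exhaustively try all 2**5 input combinations.
reached = {arm_taken(*bits) for bits in product((False, True), repeat=5)}
print(sorted(reached))
# → ['downstream ready', 'hold', 'output empty', 'reset', 'skid']
```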

As for r_data, since that's set in the next block, let's take a peek
at that one.  That says that, if the output isn't busy, then
r_data gets set to the input.  One of the subtleties I remember
learning about this is that if the output is busy (o_busy), then
r_stb is high.  (The two pieces of logic can be tied together
combinationally; there's really no difference between them.)  So if
nothing is waiting in the buffer, there's no problem with accepting
a new transaction.  Perhaps it might've been more appropriate to say
if ((i_stb)&&(!o_busy)), but checking for (!o_busy) alone might keep
you from needing to use 32 LUTs (or however many bits wide r_data
is) to represent the value.
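Putting those pieces together, here is a minimal cycle-level sketch of the whole skid buffer in plain Python. It is not nmigen and not a transcription of the article's Verilog; it only assumes the post's signal names and makes the two subtleties explicit: o_busy is simply r_stb, and r_data loads from i_data whenever o_busy is low, with no i_stb qualifier.

```python
from dataclasses import dataclass


@dataclass
class SkidBuffer:
    """Cycle-level model of a buffered handshake (skid buffer); a sketch.

    Signal names follow the post: i_stb/i_data from upstream, o_busy back
    to upstream, o_stb/o_data to downstream, i_busy from downstream, and
    r_stb/r_data for the skid register.  Field defaults are the reset state.
    """
    o_stb: bool = False
    o_data: int = 0
    r_stb: bool = False
    r_data: int = 0

    @property
    def o_busy(self) -> bool:
        # Busy towards upstream exactly when the skid register holds a word.
        return self.r_stb

    def clock(self, i_stb: bool, i_data: int, i_busy: bool) -> None:
        """Apply one clock edge; every right-hand side uses pre-edge values."""
        accept = i_stb and not self.o_busy            # upstream handshake
        out_free = (not self.o_stb) or (not i_busy)   # output register may load
        o_stb, o_data = self.o_stb, self.o_data
        r_stb, r_data = self.r_stb, self.r_data
        if not self.o_busy:
            # No i_stb qualifier needed: while r_stb is low nobody looks at
            # r_data, so it can simply track i_data on every such cycle.
            r_data = i_data
        if out_free:
            if self.r_stb:
                o_stb, o_data = True, self.r_data     # drain the skid register
            else:
                o_stb, o_data = accept, i_data        # pass straight through
            r_stb = False
        elif accept:
            r_stb = True  # output stalled: the word just latched in r_data waits
        self.o_stb, self.o_data = o_stb, o_data
        self.r_stb, self.r_data = r_stb, r_data
```

Because o_busy is just the registered r_stb, the stall sent upstream never depends combinationally on i_busy, which is the reason for having the buffer in the first place.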

Dan

On Tue, 2019-03-12 at 12:49 +0000, Luke Kenneth Casson Leighton wrote:

https://zipcpu.com/blog/2017/08/14/strategies-for-pipelining.html

hi dan,

i may have spotted a bug in the above:

// Always block continued ... (i_reset) is false, (i_busy) and (o_stb)
// are both true.
else if ((i_stb)&&(!o_busy))
begin
    // If the next stage *is* busy, though, and we haven't
    // stalled yet, then we need to accept the requested value
    // from the input.  We'll place it into a temporary
    // location.
    r_stb  <= (i_stb)&&(o_stb);
    o_busy <= (i_stb)&&(o_stb);
    if (!o_stb)
        o_data <= i_data;
end
end

according to the comment, i believe that should be r_data <= i_data?

l.