[libre-riscv-dev] buffered pipeline
programmerjake at gmail.com
Wed Mar 13 08:27:02 GMT 2019
Note that in my pipeline stage design, the path from succ_accepting to
pred_accepting doesn't go through a flip-flop, so it isn't delayed by a clock
cycle. That means a stage can block all of its predecessor stages within a
single clock cycle, which eliminates the need for extra stage registers.
I didn't include the table in the email, but I did check all combinations
of succ_accepting, pred_sending, and data_valid and it works just fine.
I'm assuming our pipelines are going to be short enough that we won't need
to start worrying about the fan-in on the gates in the *_accepting path.
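
The combinations mentioned above can be enumerated mechanically. The following is a minimal sketch in Python rather than Verilog, assuming the three control equations of the quoted stage (succ_sending, pred_accepting, next_data_valid). The function names `stage_logic` and `check_all_combinations` are hypothetical, and the invariant checked (occupancy after the clock edge equals occupancy before, plus one item accepted, minus one item sent) is one reasonable definition of "works", not necessarily the one used for the original table.

```python
from itertools import product


def stage_logic(data_valid, pred_sending, succ_accepting):
    """Combinational outputs of one stage, using the signal names from the thread."""
    succ_sending = data_valid
    pred_accepting = (not data_valid) or succ_accepting
    next_data_valid = pred_sending or ((not succ_accepting) and data_valid)
    return succ_sending, pred_accepting, next_data_valid


def check_all_combinations():
    """Check the handshake invariant for all 8 input combinations; return the table."""
    rows = []
    for data_valid, pred_sending, succ_accepting in product([False, True], repeat=3):
        succ_sending, pred_accepting, next_valid = stage_logic(
            data_valid, pred_sending, succ_accepting)
        accepted_in = pred_sending and pred_accepting  # item taken from predecessor
        sent_out = succ_sending and succ_accepting     # item handed to successor
        # The stage holds 0 or 1 items: after = before + in - out.
        assert int(next_valid) == int(data_valid) + int(accepted_in) - int(sent_out)
        rows.append((data_valid, pred_sending, succ_accepting,
                     succ_sending, pred_accepting, next_valid))
    return rows


if __name__ == "__main__":
    print("data_valid pred_sending succ_accepting | "
          "succ_sending pred_accepting next_data_valid")
    for row in check_all_combinations():
        print(" ".join(str(int(v)).center(12) for v in row))
```

This only covers the control signals. Whether the data register itself is preserved during a stall is a separate question that the table does not answer.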
On Tue, Mar 12, 2019, 19:38 Luke Kenneth Casson Leighton <lkcl at lkcl.net> wrote:
> On Tue, Mar 12, 2019 at 3:11 PM Jacob Lifshay <programmerjake at gmail.com> wrote:
> > the strategy I'm planning on using for the simple barrel processor is
> > to have the pipeline never stop: if we encounter a reason an instruction
> > can't proceed in the current cycle, it is shunted into a delay pipeline
> > to be retried the next time around.
> dan's post contains some other strategies that may help here. i will
> be implementing the IEEE754 FPU pipeline as a non-stoppable design
> (potentially adding detection to see if anything is in any stage, and
> stopping the whole pipe if it isn't), with a variation of the
> single-stage buffered pipe to take *multiple* inputs (multiple strobe
> lines) and multiplex a given input group to the output (along with its
> multiplexer ID).
> dan, this is probably extremely similar to a wishbone or AXI N-to-1 bus:
> that's what this is about.
> except... due to using john dawson's STB/ACK strategy, it can only
> handle one incoming set of operands every 2 clock cycles.
> my point is, jacob: to handle the delay-shunting you'll almost
> certainly need to deploy the exact same strategy (and hence could use
> exactly the code that i am writing).
> the requirements of a barrel processor (with a delay phase) are:
> * to have a round-robin test of whether an instruction shall be
> passed into the pipeline
> * to have no delays except if an instruction cannot proceed
> * if an instruction cannot proceed, it must not be lost (buffered)
> * all other instructions must continue unaffected
> * on detection of no longer being busy, the buffered instruction must
> rejoin the round-robin scheduling
> * it must be possible for MULTIPLE instructions to be busy (and buffered).
> so you need an *array* of instruction store/delay buffers, an *array*
> of STB and BUSY lines to look after them, where unstalled instructions
> are to be multiplexed to a single output of data, STB, and BUSY.
> that's *exactly* what i am working on, right now.
> the code that i'm writing specifically meets these very precise
> requirements, with the exception that i am using a priority encoder
> instead of a round-robin selection strategy.
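
The difference between the two selection strategies mentioned above can be sketched in a few lines of Python. This is an illustrative model only, not the code being written in the thread: `priority_select` and `round_robin_select` are hypothetical names, and `requests` stands in for the array of STB lines from the unstalled instruction buffers.

```python
def priority_select(requests):
    """Priority encoder: grant the lowest-numbered active request, or None."""
    for index, requesting in enumerate(requests):
        if requesting:
            return index
    return None


def round_robin_select(requests, last_granted):
    """Round-robin: start searching just after the previously granted index."""
    n = len(requests)
    for offset in range(1, n + 1):
        index = (last_granted + offset) % n
        if requests[index]:
            return index
    return None


if __name__ == "__main__":
    always_busy = [True, True, True]
    # The priority encoder keeps granting input 0 while it keeps requesting.
    print([priority_select(always_busy) for _ in range(6)])
    # Round-robin rotates through every requesting input.
    grants, last = [], len(always_busy) - 1
    for _ in range(6):
        last = round_robin_select(always_busy, last)
        grants.append(last)
    print(grants)
```

With every input requesting continuously, the priority encoder never reaches the higher-numbered inputs, which is the behaviour the round-robin requirement is meant to rule out.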
> > For stallable pipelines, I think we should name the pipeline control
> > signals pred_sending, succ_sending, pred_accepting and succ_accepting.
> funnily enough i added prefix letters as the first thing when writing
> the first unit test, i named them i_p_stb, o_n_stb, o_p_busy and
> i_n_busy, and wrote this ascii art which is now in the docstring:
>     stage-1   i_p_stb  >>in   stage   o_n_stb  out>>   stage+1
>     stage-1   o_p_busy <<out  stage   i_n_busy <<in    stage+1
>     stage-1   i_data   >>in   stage   o_data   out>>   stage+1
>                           |             |
>                           +------->  process
>                           |             |
>                           +-- r_data ---+
> the shortened names need a second's thought, however i believe
> they're clear, and, crucially, do not result in line-wrap to use them.
> also, "STB" for "Strobe" is a standard hardware convention
> synchronously indicating "data ready right now".
> > A simple example stage:
> > module stage(clk, rst, pred_sending, pred_accepting, pred_data,
> >              succ_sending, succ_accepting, succ_data);
> >     input clk;
> >     input rst;
> >     input pred_sending;
> >     output pred_accepting;
> >     input [63:0] pred_data;
> >     output succ_sending;
> >     input succ_accepting;
> >     output [63:0] succ_data;
> >     reg data_valid;
> >     reg [63:0] data;
> >     wire next_data_valid;
> >     assign succ_sending = data_valid;
> >     // accept new data when empty, or when the held data leaves this cycle
> >     assign pred_accepting = ~data_valid | succ_accepting;
> >     // stay valid while new data arrives or the held data is stalled
> >     assign next_data_valid = pred_sending | (~succ_accepting & data_valid);
> >     assign succ_data = data + 1; // stage operation
> >     initial data_valid = 0;
> >     initial data = 0;
> >     always @(posedge clk or posedge rst) begin
> >         if(rst) begin
> >             data_valid <= 0;
> >             data <= 0;
> >         end
> >         else begin
> >             data_valid <= next_data_valid;
> >             // load only when accepting, so stalled data is not overwritten
> >             if(pred_accepting)
> >                 data <= pred_data;
> >         end
> >     end
> > endmodule
> from what i understand, data will be lost, here, under certain
> conditions. or, it will be sub-optimal (result in unnecessary delays).
> i'm not skilled enough in logic analysis to identify which.
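
One way to settle the question without formal analysis is a cycle-level simulation. The sketch below models a chain of identical stages in Python, using the control equations from the quoted module and its `data + 1` stage operation. `run_chain` and its parameters are hypothetical names. In particular, `guard_load` is an illustrative knob that switches between loading the data register only when pred_accepting is high and loading it on every clock, so that both behaviours can be compared; `stall_pattern` is a repeating list where True means the final consumer is busy in that cycle.

```python
def run_chain(items, stall_pattern, n_stages=3, guard_load=True, max_cycles=1000):
    """Push items through n_stages stages; return what the sink receives, in order."""
    valid = [False] * n_stages
    data = [0] * n_stages
    pending = list(items)
    received = []
    for cycle in range(max_cycles):
        sink_accepting = not stall_pattern[cycle % len(stall_pattern)]
        # accepting[i] is stage i's pred_accepting; accepting[n_stages] is the sink.
        # It is combinational, so it is resolved from the last stage backwards.
        accepting = [False] * n_stages + [sink_accepting]
        for i in reversed(range(n_stages)):
            accepting[i] = (not valid[i]) or accepting[i + 1]
        # What each stage sees on its pred_sending / pred_data inputs this cycle.
        sending = [bool(pending)] + valid[:-1]
        pred_data = [pending[0] if pending else 0] + [d + 1 for d in data[:-1]]
        # Transfers at the two ends of the chain on this clock edge.
        if valid[-1] and sink_accepting:
            received.append(data[-1] + 1)
        if pending and accepting[0]:
            pending.pop(0)
        # Register updates, all computed from the pre-edge state.
        new_valid = [sending[i] or (valid[i] and not accepting[i + 1])
                     for i in range(n_stages)]
        new_data = [pred_data[i] if (accepting[i] or not guard_load) else data[i]
                    for i in range(n_stages)]
        valid, data = new_valid, new_data
        if not pending and not any(valid):
            break
    return received


if __name__ == "__main__":
    items = [10, 20, 30, 40, 50]
    stalls = [True, True, False]  # consumer is busy two cycles out of three
    print("guarded load:  ", run_chain(items, stalls, guard_load=True))
    print("unguarded load:", run_chain(items, stalls, guard_load=False))
```

In this model the guarded variant delivers every item exactly once and in order, while the unguarded variant overwrites a stalled item with whatever the predecessor is presenting. That suggests the control handshake itself is sound and that any loss comes from how the data register is loaded.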
> dan's original post makes it clear that there are 4 cases involved
> (it's not quite as straightforward as it first appears). there's a
> situation where the input has valid data (and the next stage is busy
> so a stall must happen), yet because this is all based on clocks,
> there's not yet been an opportunity to *tell* the input "please stop
> sending". so due to that one-clock delay where you are *going* to tell the
> input "please stop sending", you absolutely must buffer the input
> data, otherwise it's irrevocably lost. at the same time, you tell the
> input that on the next clock, "please stop sending".
> now, when the next stage is no longer busy, the processing must
> "flip" to process the *stored* data, *not* the incoming data. the
> stage's attention is therefore effectively multiplexed between the
> input and the buffer.
> in other words it's quite a complex state machine, for such a
> seemingly-innocuously-simple set of requirements.
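
The buffering behaviour described above is often called a skid buffer. The following Python model is a generic sketch of that idea under stated assumptions, not the code from the thread: the class `SkidBuffer`, the helper `drive`, and the signal names `in_valid`, `in_ready` and `out_ready` are all hypothetical. The key assumption is that `in_ready` comes straight from a register, so the upstream side only learns about a stall one clock late, and the extra `buf` register catches the item that arrives during that clock.

```python
class SkidBuffer:
    """One buffered stage: a main output register plus one spare 'skid' register."""

    def __init__(self):
        self.out_valid, self.out_data = False, None
        self.buf_valid, self.buf_data = False, None

    @property
    def in_ready(self):
        # Depends only on registered state, never on this cycle's out_ready.
        return not self.buf_valid

    def step(self, in_valid, in_data, out_ready):
        """Advance one clock edge; return the item delivered downstream, or None."""
        delivered = self.out_data if (self.out_valid and out_ready) else None
        accepted = in_valid and self.in_ready
        if (not self.out_valid) or out_ready:
            if self.buf_valid:
                # Drain the stored item first, not the incoming one.
                self.out_valid, self.out_data = True, self.buf_data
                self.buf_valid = False
            elif accepted:
                self.out_valid, self.out_data = True, in_data
            else:
                self.out_valid = False
        elif accepted:
            # Output is stalled but upstream was still told "ready": buffer it.
            self.buf_valid, self.buf_data = True, in_data
        return delivered


def drive(items, stall_pattern, max_cycles=1000):
    """Feed items through one SkidBuffer; True in stall_pattern means downstream busy."""
    stage, pending, received = SkidBuffer(), list(items), []
    for cycle in range(max_cycles):
        out_ready = not stall_pattern[cycle % len(stall_pattern)]
        ready_before_edge = stage.in_ready
        got = stage.step(bool(pending), pending[0] if pending else None, out_ready)
        if got is not None:
            received.append(got)
        if pending and ready_before_edge:
            pending.pop(0)
        if not pending and not stage.out_valid and not stage.buf_valid:
            break
    return received


if __name__ == "__main__":
    print(drive([1, 2, 3], [True, True, False]))
```

The two branches in `step` correspond to the multiplexing described above: when the output frees up, the stored item takes precedence over new input, and new input is only accepted again once the spare register is empty.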