[Libre-soc-bugs] [Bug 397] New: design and discuss user-tag flags in wishbone to provide phase 1 / 2 "speculative" memory accesses

Sun Jun 21 15:52:16 BST 2020

https://bugs.libre-soc.org/show_bug.cgi?id=397

            Bug ID: 397
           Summary: design and discuss user-tag flags in wishbone to
                    provide phase 1 / 2 "speculative" memory accesses
           Product: Libre-SOC's first SoC
           Version: unspecified
          Hardware: PC
                OS: Mac OS
            Status: CONFIRMED
          Severity: enhancement
          Priority: ---
         Component: Source Code
          Assignee: lkcl at lkcl.net
          Reporter: lkcl at lkcl.net
                CC: libre-soc-bugs at lists.libre-soc.org
   NLnet milestone: ---

(see https://bugs.libre-soc.org/show_bug.cgi?id=393#c17)

yehowshua, whilst you are thinking about how to answer the question (a general
open question about how to fulfil the requirement of providing a memory system
that connects to PortInterface), i thought it important to remind you that we
are doing a speculative-capable design.

in an earlier message i described the requirements, but did not receive a
response.  i will therefore reiterate them here as the top-level comment of
this bugreport.

the requirements are - and this is not optional - that memory requests be
subdivided into two phases:

1) checking whether the request *CAN* be completed - WITHOUT EXCEPTIONS - if
   it were to be permitted to proceed

2) allowing the memory request to proceed.

note: it is absolutely guaranteed that there will be more than one such request
at phase (1) outstanding in any given cycle.

if we do not have this two-phase system, where multiple Phase (1) requests
can be outstanding, we will be forced to fall back to *single* LOAD/STORE
operations.

and that means that performance will suck.

now.

given these requirements, can you see - can you understand and appreciate -
that designing a "simple" wishbone-based system is guaranteed not to be useful?

this is *by design*, as part of the simple variants of the wishbone protocol.

this is because the wishbone protocol inherently and AUTOMATICALLY assumes
that a write request - data plus address - will either:

a) complete atomically, OR
b) fail atomically,

and that it is CATEGORICALLY IMPOSSIBLE to request anything else.

to reiterate:

there is *nowhere in the protocol* that allows us to communicate phase (1).

i.e. there is nowhere in the wishbone protocol that allows us to say "you need
to TELL us whether this write request would complete atomically or fail
atomically ***BUT WITHOUT ACTUALLY PERFORMING THE WRITE***"

therefore, spending time designing L1 caches - especially ones that use the
"simple" wishbone protocol - is *not* what we need.

now, i have been looking at the wishbone spec B4, page 51, illustration 3-11,
and it *might* be possible for us to add a master-side STALL_O signal
(as a TGA - tag-address bit) to achieve Phase (1) / (2) discernment:

(1) CLOCK CYCLE 1 - MASTER presents:
    - ADR_O = A0
    - TGA_O = valid and stall

(2) CLOCK CYCLE 2 - SLAVE presents either:
    - TGD_I = valid, OR
    - raises ERR_I if the address is invalid

once Phase (1) is complete (the 6600 engine knows that the memory request
is *GUARANTEED* to succeed) it can drop the "stall" TGA_O bit and the
memory request *MUST* then succeed.  if the SLAVE then raises ERR_I we
need to *halt* the processor.
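to make the sequence concrete, here is a toy python model of it (not nmigen,
not a real wishbone implementation).  it assumes a hypothetical "stall" bit
carried in TGA_O and a slave that decides validity from a fixed address map;
StallTagSlave, Response, HaltProcessor and two_phase_write are all made-up
names.  note that nothing in it allows a second phase (1) check to be
outstanding: the master holds the bus from the stall cycle until the write.

```python
from dataclasses import dataclass


@dataclass
class Response:
    valid: bool = False   # TGD_I "valid": the access would (or did) succeed
    err: bool = False     # ERR_I: the address is invalid


class HaltProcessor(Exception):
    """phase 2 failed after phase 1 guaranteed success: unrecoverable."""


class StallTagSlave:
    """toy slave honouring a hypothetical master-side "stall" TGA_O bit.

    stall=True  -> phase 1: check the address only, no side-effects
    stall=False -> phase 2: actually perform the write
    """

    def __init__(self, valid_ranges):
        self.valid_ranges = valid_ranges  # [(start, end)], end exclusive
        self.mem = {}

    def addr_ok(self, adr):
        return any(lo <= adr < hi for lo, hi in self.valid_ranges)

    def cycle(self, adr, stall, dat=None):
        if not self.addr_ok(adr):
            return Response(err=True)
        if not stall:
            self.mem[adr] = dat           # only phase 2 touches memory
        return Response(valid=True)


def two_phase_write(slave, adr, dat):
    # cycles 1 and 2: ADR_O with the stall tag set; slave answers valid or ERR_I
    if slave.cycle(adr, stall=True).err:
        return False                      # precise exception: nothing written
    # stall tag dropped: the write MUST now succeed
    if slave.cycle(adr, stall=False, dat=dat).err:
        raise HaltProcessor(hex(adr))
    return True
```

a request to an unmapped address is rejected in phase (1) and leaves memory
untouched, which is exactly the "can it complete, without exceptions"
question; the limitation is that the bus does nothing else in the meantime.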

the problem with this protocol is that it only supports a single "Phase (1)"
request: the entire Wishbone Bus is dedicated to, and locked up by, that one
request.

therefore whilst it illustrates the issue, it's impractical (i.e. useless).

i have a sneaking feeling that we are going to have to design something that
allows state information to be communicated:

1) MASTER presents:
    - ADR_O = A0
    - TGA_O = valid, and stall, *and* an "identifier" (the LDST unit ID)

2) SLAVE acknowledges (and stores the request in an internal buffer, but
   *also* begins processing it - determining the cascade through L1, L2, TLB
   and MMU and so on)

3) MASTER presents:
    - ADR_O = A1
    - TGA_O with a tag indicating **different** LDST unit

4) SLAVE acknowledges and buffers just as with (2)

5) MASTER presents:
    - ADR_O = A0
    - TGA_O = "request for updated status on progress of determining if Addr is OK"

6) SLAVE acknowledges and confirms, for the request with address A0 from this
   LDST unit ID, that *if* the MASTER were to request that the operation take
   place, it *would* succeed 100%.

7) MASTER presents:
    - ADR_O = A0
    - TGA_O = "request to proceed atomically"

8) SLAVE acknowledges

9) MASTER presents:
    - DAT_O = D0

10) SLAVE acknowledges with "data has been written".
   it also empties the buffer containing the ID.
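the ten steps above can be sketched as a toy python model of the slave side
(illustrative only: TaggedSlave, State, Entry, the fixed check latency and the
cancel() method are all hypothetical, and the phase (1) "check" is just an
address-range lookup standing in for the L1/L2/TLB/MMU cascade).  the point it
shows is that keying the buffer by LDST unit ID lets several phase (1)
requests be in flight at once, and that a write may only proceed once its own
entry has been confirmed.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    PENDING = "pending"   # phase 1 still running (L1/L2/TLB/MMU cascade)
    OK = "ok"             # guaranteed to succeed if allowed to proceed
    FAULT = "fault"       # would raise an exception: must never proceed


@dataclass
class Entry:
    adr: int
    cycles_left: int
    state: State = State.PENDING


class TaggedSlave:
    """toy slave that buffers phase 1 requests per LDST unit ID."""

    def __init__(self, valid_ranges, latency=2):
        self.valid_ranges = valid_ranges  # [(start, end)], end exclusive
        self.latency = latency            # hypothetical fixed check latency
        self.pending = {}                 # ldst_id -> Entry
        self.mem = {}

    def request(self, ldst_id, adr):
        # steps 1-4: buffer the request and start checking it
        if ldst_id in self.pending:
            raise ValueError(f"LDST unit {ldst_id} already has a request")
        self.pending[ldst_id] = Entry(adr, self.latency)

    def tick(self):
        # one clock: every outstanding check advances in parallel
        for e in self.pending.values():
            if e.state is State.PENDING:
                e.cycles_left -= 1
                if e.cycles_left <= 0:
                    ok = any(lo <= e.adr < hi for lo, hi in self.valid_ranges)
                    e.state = State.OK if ok else State.FAULT

    def status(self, ldst_id, adr):
        # steps 5-6: has the check for (adr, this ID) finished, and is it OK?
        e = self.pending[ldst_id]
        if e.adr != adr:
            raise ValueError("address does not match the buffered request")
        return e.state

    def proceed(self, ldst_id, adr, dat):
        # steps 7-10: only permitted once phase 1 has returned OK
        if self.status(ldst_id, adr) is not State.OK:
            raise RuntimeError("proceed issued without a confirmed phase 1")
        self.mem[adr] = dat
        del self.pending[ldst_id]         # step 10: free the buffer entry

    def cancel(self, ldst_id):
        # hypothetical: a cancelled (or faulted) speculative request
        # must also free its buffer entry
        self.pending.pop(ldst_id, None)


if __name__ == "__main__":
    s = TaggedSlave([(0x1000, 0x2000)], latency=2)
    s.request(0, 0x1000)
    s.request(1, 0x5000)                  # two phase 1 requests outstanding
    s.tick()
    s.tick()
    print(s.status(0, 0x1000), s.status(1, 0x5000))  # State.OK State.FAULT
    s.proceed(0, 0x1000, 0xDEAD)
    s.cancel(1)
```

in real hardware the buffer would be a fixed-size CAM or table indexed by the
LDST unit ID rather than a dict, and the MASTER would be stalled when it is
full; this sketch ignores capacity entirely.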

in all it is quite a complex protocol, and i really cannot see how it can be
avoided.

the "Phase 1" parts involve knowledge of the addresses associated with the
peripherals: in the case of Virtual Memory this will be *dynamic* information.

the thing is that we need this even for misaligned operations as well as
atomic ones, and also for 128-bit atomic writes over 64-bit buses.
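to illustrate why misaligned and 128-bit accesses need this too, here is a
python sketch (illustrative only; split_access, all_beats_ok and page0_only
are hypothetical names) that splits one access into bus-aligned beats.  each
beat is a separate bus request, so the access is only safe if *all* of them
pass phase (1) before *any* of them proceeds.  the address map, where only
the 4 KiB page at 0x0000 is mapped, is an assumption made for the example.

```python
def split_access(adr, nbytes, bus_bytes=8):
    """split one access into bus-aligned beats.

    returns a list of (aligned_adr, byte_offset, length) tuples.
    bus_bytes=8 models a 64-bit bus.
    """
    beats = []
    end = adr + nbytes
    while adr < end:
        base = adr - (adr % bus_bytes)
        length = min(end, base + bus_bytes) - adr
        beats.append((base, adr - base, length))
        adr += length
    return beats


def all_beats_ok(beats, phase1_check):
    # phase 1 must pass for *every* beat before phase 2 is allowed for *any*
    # beat: otherwise the first beat could be written and the second fault,
    # leaving memory half-updated
    return all(phase1_check(base) for base, _, _ in beats)


def page0_only(adr):
    # hypothetical address map: only the 4 KiB page at 0x0000 is mapped
    return (adr >> 12) == 0


print(split_access(0x2000, 16))   # 128-bit write on a 64-bit bus: two beats
print(all_beats_ok(split_access(0x0FFC, 8), page0_only))  # False
```

the misaligned 8-byte access at 0x0FFC is split into one beat in the mapped
page and one in the unmapped page at 0x1000.  with single-phase wishbone the
first beat would already have been written by the time the second one raises
ERR_I, which is precisely the situation phase (1) exists to prevent.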
