[Libre-soc-isa] [Bug 1056] questions and feedback (v2) on OPF RFC ls010

Thu Jun 1 02:12:01 BST 2023

https://bugs.libre-soc.org/show_bug.cgi?id=1056

--- Comment #44 from Paul Mackerras <paulus at ozlabs.org> ---
(In reply to Luke Kenneth Casson Leighton from comment #35)
> (In reply to Paul Mackerras from comment #30)
> 
> > I think you mean sv.addi/elwidth=16 5,5,0x1122 (not 5,_0_,0x1122).
> 
> ah! yes
> 
> > I'll assume the 0 for RA is a typo caused by 3.27AM.
> > 
> > > * then inspect (verilator) GPR(5) and read its contents
> > > 
> > > is the answer you expect, regardless of LE/BE: 0x2356?
> > > or would it be 
> > > * 0x2211_0000_0000_1234 (or 0x1122_0000_0000_1234) *or*
> > > * 0x0000_0000_0000_3456 due to addi being implicitly
> > >   reversed-byte-order from sv.addi under BE?
> > 
> > I would expect 0x1122_0000_0000_1234 in BE mode, since you have operated on
> > element 0 and elements are 16 bits wide.
> 
> ahhh now *that* makes it clear.  and is so far left-field of what i
> was modelling/expecting from the combinatorial explosion of possibilities
> that i couldn't possibly guess it :)
> 
> now, here's the thing (walk through the implications).  where the LE
> element-access would be this:
> 
>      # assume everything LE-ordered and LSB-numbered
>      gpr_width = 8 # bytes
>      num_gprs = 128 # in "upper" SV Compliancy Levels
>      GPR_sram = [0x00] * gpr_width * num_gprs
>      src_elbytes = src_elwidth // 8
>      for i in range(VL):
>          bytenum = i * src_elbytes # element offset in SRAM bytes
>          ra_element_start = RA*gpr_width     # vector start position
>          ra_element_start += bytenum # element offset
>          ra_element_end   = ra_element_start + (src_elbytes-1) # inclusive
>          ra_src_operand = GPR_sram[ra_element_start:ra_element_end+1]
> 
> a BE-reversal of the underlying SRAM-access would be:
> 
>      # *still* assume everything LE-ordered and LSB-numbered
>      gpr_width = 8 # bytes
>      num_gprs = 128 # in "upper" SV Compliancy Levels
>      GPR_sram = [0x00] * gpr_width * num_gprs
>      src_elbytes = src_elwidth // 8
>      for i in range(VL):
>          offset = i * src_elbytes           # element offset in SRAM bytes
>          gpr_num = offset // gpr_width      # relative GPR number  
>          bytenum = offset %  gpr_width      # byte-start in GPR
> ---->    bytenum = ~bytenum & 0b1111_1111   # BE-inversion

No, this isn't right.  It should be

         bytenum = bytenum ^ (8 - src_elbytes)
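A minimal runnable sketch of the byte-level register-file model quoted here, with the XOR correction above applied in BE mode. It assumes the LE-ordered, LSB-numbered SRAM layout of the quoted pseudocode, RT == RA == 5 (no RA=0 special case), and an immediate already sign-extended. The helper names (element_byte_range, read_gpr, write_gpr, sv_addi_elwidth) are hypothetical, not the libre-soc simulator's API. Under those assumptions it reproduces both answers given in this thread: 0x2356 in LE and 0x1122_0000_0000_1234 in BE.

```python
# Hypothetical model: register file as one LE-ordered, LSB-numbered byte array,
# as in the quoted pseudocode. Not the libre-soc simulator's actual code.
GPR_WIDTH = 8    # bytes per GPR
NUM_GPRS = 128   # "upper" SV Compliancy Levels
MASK64 = (1 << 64) - 1


def element_byte_range(ra, i, elbytes, big_endian):
    """SRAM byte range [start, end) of element i of the vector starting at GPR ra."""
    offset = i * elbytes                 # element offset in SRAM bytes
    gpr_num = offset // GPR_WIDTH        # relative GPR number
    bytenum = offset % GPR_WIDTH         # byte-start within that GPR
    if big_endian:
        bytenum ^= GPR_WIDTH - elbytes   # the correction: bytenum ^ (8 - src_elbytes)
    start = (ra + gpr_num) * GPR_WIDTH + bytenum
    return start, start + elbytes


def read_gpr(sram, n):
    """Whole 64-bit value of GPR n (its bytes are LE-ordered in the SRAM)."""
    return int.from_bytes(sram[n * GPR_WIDTH:(n + 1) * GPR_WIDTH], "little")


def write_gpr(sram, n, value):
    sram[n * GPR_WIDTH:(n + 1) * GPR_WIDTH] = (value & MASK64).to_bytes(GPR_WIDTH, "little")


def sv_addi_elwidth(sram, rt, ra, imm, elwidth, vl, big_endian):
    """Assumed sv.addi/elwidth=N RT,RA,imm (RA != 0): writes only elwidth bits per element."""
    elbytes = elwidth // 8
    mask = (1 << elwidth) - 1
    for i in range(vl):
        s, e = element_byte_range(ra, i, elbytes, big_endian)
        result = (int.from_bytes(sram[s:e], "little") + imm) & mask
        d, f = element_byte_range(rt, i, elbytes, big_endian)
        sram[d:f] = result.to_bytes(elbytes, "little")


for be in (False, True):
    sram = bytearray(GPR_WIDTH * NUM_GPRS)
    write_gpr(sram, 5, 0x1234)
    sv_addi_elwidth(sram, rt=5, ra=5, imm=0x1122, elwidth=16, vl=1, big_endian=be)
    print("BE" if be else "LE", hex(read_gpr(sram, 5)))
# prints:
# LE 0x2356
# BE 0x1122000000001234
```

With elwidth=16 the XOR maps element 0 to bytes 6..7 of the GPR (its most significant half) and element 3 to bytes 0..1, so in BE mode element 0 of a register holding 0x1234 starts out as zero.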

>          # now finally we know the element-offset start pos
>          ra_element_start = (gpr_num * gpr_width) + bytenum
>          ra_element_start += RA*gpr_width     # add vector start position
>          ra_element_end   = ra_element_start + (src_elbytes-1) # inclusive
>          ra_src_operand = GPR_sram[ra_element_start:ra_element_end+1]
> 
> 
> at which point i think you'd agree that trying to explain that to
> programmers, that this is the underlying model, would be a bit much :)
> 
> 
> > > now the same thing with *scalar* instructions:
> > > 
> > > * let us set (verilator or "addi 5,0,0x1234") the contents of GPR(5) = 0x1234
> > > * perform "addi 5,0,0x1122"
> > > * then inspect (verilator) GPR(5) and read its contents
> > > 
> > > is it *still* 0x2356 regardless of LE/BE?
> > 
> > It's 0x2356 regardless of LE/BE.
> 
> and that discrepancy is a violation of (one of the) Orthogonality rule(s).
> when MAXVL=VL=1 the behaviour *has* to be the same (elwidth
> notwithstanding)

The behaviour clearly does depend on elwidth (even in LE mode), because the
scalar instruction writes all 64 bits of the register but the vectorized
instruction with VL=1 only writes elwidth bits.
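The point can be made concrete with a minimal LE-mode sketch. It assumes, as described above, that the vectorized form with VL=1 writes only the low elwidth bits of RT and leaves the rest untouched. The function names are hypothetical, and SI is taken as an already sign-extended Python int. With GPR(5) = 0x1234 both forms give 0x2356, which is why the earlier example looked identical, but a value with upper bits set and a carry out of bit 15 exposes the difference.

```python
# Hypothetical LE-mode sketch contrasting scalar addi with sv.addi/elwidth=16 at VL=1.
MASK64 = (1 << 64) - 1


def scalar_addi(ra_value, si):
    """addi RT,RA,SI (RA != 0): the 64-bit sum replaces all 64 bits of RT."""
    return (ra_value + si) & MASK64


def sv_addi_vl1(rt_value, ra_value, si, elwidth):
    """Assumed LE model of sv.addi/elwidth=N with VL=1: only element 0
    (the low elwidth bits of RT) is written; the rest of RT is preserved."""
    mask = (1 << elwidth) - 1
    element = ((ra_value & mask) + si) & mask   # sum truncated to elwidth
    return (rt_value & ~mask & MASK64) | element


# Upper bits zero and no carry out of 16 bits: both forms agree.
print(hex(scalar_addi(0x1234, 0x1122)), hex(sv_addi_vl1(0x1234, 0x1234, 0x1122, 16)))
# Upper bits set and a carry out of bit 15: the results differ.
r = 0xABCD_0000_0000_F000
print(hex(scalar_addi(r, 0x1122)), hex(sv_addi_vl1(r, r, 0x1122, 16)))
# prints:
# 0x2356 0x2356
# 0xabcd000000010122 0xabcd000000000122
```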
