[Libre-soc-isa] [Bug 560] big-endian little-endian SV regfile layout idea

Thu Dec 31 06:27:00 GMT 2020

https://bugs.libre-soc.org/show_bug.cgi?id=560

--- Comment #30 from Jacob Lifshay <programmerjake at gmail.com> ---
(In reply to Luke Kenneth Casson Leighton from comment #28)
> (In reply to Jacob Lifshay from comment #23)
> > (In reply to Luke Kenneth Casson Leighton from comment #20)
> > > (In reply to Jacob Lifshay from comment #15)
> > > 
> > > > byte order *is* significant in registers precisely because we can treat them
> > > > as an indexed vector of bytes by using vector u8 instructions. having that
> > > > vector of bytes match the vector of bytes in memory is important for
> > > > performance and consistency, since otherwise we will have to insert tons of
> > > > byte-swap instructions for memory-order bitcasting that would otherwise be
> > > > totally unneeded.
> > > 
> > > no, you just use either ld-reverse or not.  the ld and st operation takes
> > > care of the bytereversing, that's why it was added to OpenPOWER.
> > 
> > no, you don't. bitcasting is a register reinterpret operation (register to
> > register), using load/store operations to implement bitcasting is slow and
> > wasteful (unless you needed to load/store anyway).
> 
> ah i was confused by the mention of "memory", i thought you were exclusively
> referring to memory-to-register transfers.
> 
> > on LLVM, bitcasting usually compiles to no instructions, or rarely a
> > register to register move instruction.
> 
> here, is there any reason why bitmanip would be insufficient?

bitmanip can be done, it's just extra instructions that wouldn't be needed if
endianness were consistent between registers and memory.
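
A rough sketch of the cost being described, in Python rather than real SV
code. It assumes (this is the very point under discussion, so treat it as an
assumption) that u8 element 0 of a 64-bit register is its least significant
byte. The helper names are made up for illustration.

```python
def reg_u8_elements(reg):
    """u8 elements of a 64-bit register, ASSUMING element 0 is the LSB."""
    return [(reg >> (8 * i)) & 0xFF for i in range(8)]

def memory_bytes(reg, byteorder):
    """Bytes a plain 64-bit store would place in memory, lowest address first."""
    return list(reg.to_bytes(8, byteorder))

def byteswap64(reg):
    """Stand-in for the extra register-to-register byte-reverse instruction."""
    return int.from_bytes(reg.to_bytes(8, "little"), "big")

reg = 0x0102030405060708

# Register element order matches memory order: the bitcast costs nothing.
assert reg_u8_elements(reg) == memory_bytes(reg, "little")

# Register element order differs from memory order: a byte swap is needed
# before the u8 vector view agrees with what memory would hold.
assert reg_u8_elements(reg) != memory_bytes(reg, "big")
assert reg_u8_elements(byteswap64(reg)) == memory_bytes(reg, "big")
```

The last assert is the "more instructions" case: the swap is only there to
make the register view agree with memory order.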
> 
> also: we need use-cases to justify the time drain spent doing a
> comprehensive evaluation.

yup, I know there are some, I'll look for concrete examples later when I'm not
braindead
> 
> 
> > > however if you override elwidth=32 then *even when VL=1* the top 32 bits
> > > WILL NOT be overwritten.
> > > 
> > > why?
> > > 
> > > because elwidth=32 is a SPECIFIC and direct command to the hardware to set
> > > the underlying regfile SRAM write-enable lines  to 0b00001111
> > 
> > Well, I interpret elwidth=32 as a specific command to mean we operate on
> > 32-bit values, so scalars are truncated/sign/zero-extended to/from 32-bits
> > when reading/writing registers.
> > 
> > scalar registers are a *totally different kind* of argument, they are *not
> > vectors*, 
> 
> they are: look at the pseudocode.  they're "degenerate vectors of length
> equal to one".
> 
> i think what you might be imagining to be the case is, "if VL==1 && SUBVL==1
> then SV is disabled, and a different codepath followed that goes exclusively
> to a scalar-only OpenPOWER v3.0B compliant codepath"
> 
> this categorically and fundamentally is NOT the case.

yup, totally agreed.

what I meant is that if you have a SVP64 instruction with scalar arguments:

add r10.v, r3.s, r20.v, subvl=1, elwidth=32, mask=r30

for r3 (but not r10 or r20) it reads the full register, independent of whatever
values VL and r30 have, and then truncates the read value to 32-bits then does
the adds.

add r3.s, r10.v, r20.v, subvl=1, elwidth=32, mask=r30

for r3 it writes the full register, independent of whatever values VL and r30
have (unless r30==0, then r3 is unmodified), sign/zero-extending the 32-bit sum
into the full 64-bit value that is written to r3.
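
As a sketch of the two cases above (Python model, not the spec pseudocode).
Assumptions made for illustration only: 32-bit elements are packed two per
64-bit register with element 0 in the low half, the scalar result is
zero-extended, and the scalar-destination form takes its result from the first
element enabled by the mask. All function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def read_elem32(regs, base, i):
    """Read 32-bit element i of the vector starting at register `base`."""
    return (regs[base + i // 2] >> (32 * (i % 2))) & MASK32

def write_elem32(regs, base, i, value):
    """Write 32-bit element i, leaving the other half of the register alone."""
    idx, shift = base + i // 2, 32 * (i % 2)
    keep = regs[idx] & ~(MASK32 << shift) & MASK64
    regs[idx] = keep | ((value & MASK32) << shift)

def add_vec_scalar_vec(regs, rd, ra_scalar, rb, vl, mask):
    """add rd.v, ra.s, rb.v, elwidth=32: scalar source read in full, then truncated."""
    a = regs[ra_scalar] & MASK32  # full 64-bit read, truncated to 32 bits
    for i in range(vl):
        if (mask >> i) & 1:
            write_elem32(regs, rd, i, a + read_elem32(regs, rb, i))

def add_scalar_vec_vec(regs, rd_scalar, ra, rb, vl, mask):
    """add rd.s, ra.v, rb.v, elwidth=32: scalar destination written in full."""
    for i in range(vl):
        if (mask >> i) & 1:
            total = (read_elem32(regs, ra, i) + read_elem32(regs, rb, i)) & MASK32
            regs[rd_scalar] = total  # zero-extended: top 32 bits become 0
            return  # no enabled element means rd_scalar is left unmodified

regs = [0] * 128
regs[3] = 0xDEADBEEF00000001   # scalar source, junk in the top half
regs[20] = 0x0000000300000002  # elements: 2, 3
regs[10] = MASK64
add_vec_scalar_vec(regs, 10, 3, 20, vl=1, mask=0b1)
assert regs[10] == 0xFFFFFFFF00000003  # vector dest: top half preserved

regs[3] = MASK64
add_scalar_vec_vec(regs, 3, 10, 20, vl=1, mask=0b1)
assert regs[3] == 0x0000000000000005   # scalar dest: whole register written
```

Note the asymmetry even at VL=1: the vector destination keeps its upper 32
bits, the scalar destination does not.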

this full register read/write is particularly important for f32 operations,
where the scalar representation is in full f64 format (because OpenPOWER's
weird):

fadds f10.v, f3.s, f20.v, subvl=1, elwidth=32, mask=r30

if f20 holds 0x3F800000 0x40000000 (1.0f32 2.0f32)
and f3 holds 0x3FF0000000000000 (1.0f64)
and VL == 2 and r30 == 0b11

then f10 will hold 0x40000000 0x40400000 (2.0f32 3.0f32)
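
The numbers above can be checked with a small Python sketch. It models the
vector registers as plain lists of 32-bit patterns, so it makes no claim about
how elements are packed in the regfile; the function names are hypothetical.

```python
import struct

def bits_to_f32(bits):
    """Interpret a 32-bit pattern as an IEEE 754 single."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def f32_to_bits(value):
    """Round a Python float to single precision and return its bit pattern."""
    return struct.unpack("<I", struct.pack("<f", value))[0]

def bits_to_f64(bits):
    """Interpret a 64-bit pattern as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def fadds_vec_scalar_vec(fa_scalar_bits, fb_elems, fd_elems, vl, mask):
    """fadds fd.v, fa.s, fb.v, elwidth=32, with the scalar held in f64 format."""
    scalar = bits_to_f64(fa_scalar_bits)  # full 64-bit read of the scalar
    result = list(fd_elems)
    for i in range(vl):
        if (mask >> i) & 1:
            result[i] = f32_to_bits(scalar + bits_to_f32(fb_elems[i]))
    return result

f20 = [0x3F800000, 0x40000000]  # 1.0f32, 2.0f32
f3 = 0x3FF0000000000000         # 1.0f64
f10 = fadds_vec_scalar_vec(f3, f20, [0, 0], vl=2, mask=0b11)
assert f10 == [0x40000000, 0x40400000]  # 2.0f32, 3.0f32
```

If f3 were instead read as a 32-bit element, its low half (0x00000000) would
be interpreted as 0.0f32 and the sums would come out wrong.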

basic summary: VL=1 is not special, mask with only 1 bit set is not special.
SUBVL=1 *and* reg set to scalar is special. SUBVL>1 and/or reg set to vector is
*not* special.
