[Libre-soc-dev] [OpenPOWER-HDL-Cores] XLEN in scalar spec pages

Sun Jun 20 23:58:07 BST 2021

Hi Luke,

I have been pondering on this for a while, hence the slow reply.

On Wed, Jun 09, 2021 at 02:11:11PM +0100, Luke Kenneth Casson Leighton wrote:
> Paul, Toshaan,
> 
> just while i think of it there is an additional reason why the
> pseudo-code of the spec should be altered to use a newly-defined
> variable, "XLEN":
> 
> the Scalar Fixed-Point Compliancy Subset.
> 
> that's the basic summary, it should be obvious from that alone, but
> for completeness / posterity i will write it out in full, below.
> 
> ---
> 
> somewhere in the looong history of Power ISA i imagine that the spec
> actually used to read, "all regs are 32 bit" and all pseudocode used
> to have RT[0:31] etc etc.

Well, the POWER1 and POWER2 implementations (early 1990s) were like
that, so yes.

> somewhere along the line that got "upgraded" to 64 bit with the likely
> assumption that nobody would be interested in going backwards in time,
> heck, 640k is enough for everybody.

When the "PowerPC" architecture was defined (1991 or so), it was a
64-bit architecture with a 32-bit subset, but for a long time many of
the implementations were 32-bit.

> 32 bit, i assume it was assumed, would be fine to run as "legacy"
> applications, and consequently the 64 bit results were "munged"
> (truncated) after computation.

I'm not sure what you mean by this.  A 32-bit implementation computes
a 32-bit result.  A 64-bit implementation computes a 64-bit result.
The 64-bit result on a 64-bit processor does NOT depend on whether the
processor is in 32-bit mode or 64-bit mode.  The only things that
are affected by 32-bit mode on a 64-bit processor are (a) effective
addresses are truncated to 32 bits, (b) the setting of CR0 for
dot-form instructions is based on the bottom 32 bits of the result,
and (c) the setting of XER[CA] and XER[OV] is based on the 32-bit
result.
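
To make the distinction concrete, here is a rough Python sketch of that behaviour. It is a toy model, not spec pseudocode: the function names and the `mode_64bit` flag are made up for illustration, the instruction modelled is a generic carrying, overflow-recording add, and the SO bit of CR0 is left out.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def to_signed(value, bits):
    """Interpret the low `bits` of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def add_record_model(ra, rb, mode_64bit):
    """Toy model of a recording add on a 64-bit processor (hypothetical helper).

    Returns (rt, ca, ov, cr0). Only ca, ov and cr0 depend on the mode.
    """
    bits = 64 if mode_64bit else 32
    mask = (1 << bits) - 1
    # The register result is the full 64-bit sum, whatever the mode.
    rt = (ra + rb) & MASK64
    # Carry out of the low `bits` of the operands.
    ca = ((ra & mask) + (rb & mask)) >> bits
    # Signed overflow: both operands differ in sign from the result at the top bit.
    ov = (((ra ^ rt) & (rb ^ rt)) >> (bits - 1)) & 1
    # CR0 LT/GT/EQ compare the low `bits` of the result with zero, as signed.
    signed = to_signed(rt, bits)
    cr0 = "LT" if signed < 0 else "GT" if signed > 0 else "EQ"
    return rt, ca, ov, cr0


def effective_address(base, offset, mode_64bit):
    """Effective addresses are truncated to 32 bits in 32-bit mode."""
    ea = (base + offset) & MASK64
    return ea if mode_64bit else ea & MASK32


if __name__ == "__main__":
    for mode in (True, False):
        rt, ca, ov, cr0 = add_record_model(0xFFFF_FFFF, 1, mode)
        print(f"64-bit mode={mode}: RT={rt:#x} CA={ca} OV={ov} CR0={cr0}")
```

With RA = 0xFFFFFFFF and RB = 1, RT is 0x100000000 in both modes; only the carry and the CR0 field change.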

> except... as has been seen with both Microwatt and LibreSOC, the
> resource utilisation of 64 bit on FPGAs is a whopping FIVE fold
> increase over the equivalent RV32IM (32 bit, integer and mul/div)
> resources.
> 
> this can be significantly reduced by halving the bitwidth of all ALUs,
> regfiles, and datapaths.
> 
> this means that the Lattice ECP5 45K is 50% full, where it could
> (might) be nearly 75% free.
> 
> the problem comes when a 32-bit implementor, with a *genuine
> compelling resource* reason to do only 32 bit, looks at the spec, and
> goes, "errr where's the 32 bit pseudocode"?

The architecture as it stands today does not have a 32-bit subset any
more.  That was removed at some point in the 2.x versions of the
architecture, after IBM, Freescale and others stopped making embedded
PowerPC chips.  As I read it today, registers have to be 64 bits wide.
(That does not necessarily mean that the implementation actually has
to have 64-bit datapaths, of course; it could be bit-serial for all
the architecture cares.)

> if the specification-compiler-simulator developed by LibreSOC can
> actually execute 32 bit unit tests at the actual 32 bit width (not: 64
> bit then truncate afterwards) then there is reasonable confidence that

With the Power ISA definition of 32-bit mode, there should be no
instances where the bottom 32 bits of the 64-bit result computed by a
64-bit implementation differ from the result computed by a 32-bit
implementation, provided you only use "word" form instructions (those
that would be implemented on a 32-bit implementation), and with a few
other obvious exceptions such as darn.  If you can find a code
sequence where the results do differ, let me know.
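
A quick way to go looking for such a case is a randomised comparison along these lines. This is only a sketch under my own assumptions: the two tables are hand-written Python models of a handful of instructions (add, subf, and, xor, mullw), not anything derived from the spec pseudocode or from an existing simulator, and it checks single operations rather than longer sequences.

```python
import random

MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def sext32(value):
    """Sign-extend the low 32 bits of value to a Python int."""
    value &= MASK32
    return value - (1 << 32) if value & 0x8000_0000 else value


# Hand-written models of a 64-bit implementation (illustrative only).
OPS_64 = {
    "add": lambda a, b: (a + b) & MASK64,
    "subf": lambda a, b: (b - a) & MASK64,
    "and": lambda a, b: a & b,
    "xor": lambda a, b: a ^ b,
    "mullw": lambda a, b: (sext32(a) * sext32(b)) & MASK64,
}

# Hand-written models of a 32-bit implementation (illustrative only).
OPS_32 = {
    "add": lambda a, b: (a + b) & MASK32,
    "subf": lambda a, b: (b - a) & MASK32,
    "and": lambda a, b: a & b,
    "xor": lambda a, b: a ^ b,
    "mullw": lambda a, b: (sext32(a) * sext32(b)) & MASK32,
}


def find_mismatches(trials=10_000, seed=0):
    """Return (op, a, b) cases where the low 32 bits of the results differ."""
    rng = random.Random(seed)
    mismatches = []
    for _ in range(trials):
        # The upper halves of the 64-bit registers hold arbitrary values.
        a64, b64 = rng.getrandbits(64), rng.getrandbits(64)
        a32, b32 = a64 & MASK32, b64 & MASK32
        for name in OPS_64:
            wide = OPS_64[name](a64, b64)
            narrow = OPS_32[name](a32, b32)
            if wide & MASK32 != narrow:
                mismatches.append((name, a64, b64))
    return mismatches


if __name__ == "__main__":
    print(find_mismatches())
```

An empty list is what the argument above predicts for these operations; anything else would be a case worth looking at.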

> the spec pseudocode is correct.
> 
> one last thing: it could be assumed that it is okay for 32 bit SFS
> implementors to go back in time looking for very early Power ISA
> specs, to find out what to do: this is not an option because the EULA
> explicitly says "v3.0B" and the older specs will not even have the
> newer opcodes (or other subtle changes / corrections).

Right.

> deep breath, then: the spec pseudocode, 150 or so instructions, all
> need to be updated to use XLEN and say XLEN=32 or XLEN=64.
> 
> with the simulator we have written being able to check that, it is not
> as bad as it would otherwise be.
> 
> it would be necessary to start with the fixed point scalar because we
> haven't yet completed scalar floating point.  this is partly underway
> for Lauri to do MP3 SVP64 CODEC demo, but nowhere near robust and very
> few unit tests.

I think that reinstating a 32-bit subset would be desirable, but it
will be up to the ISA working group.  And yes, it would result in
pervasive changes throughout the architecture spec.
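
To give a feel for the kind of per-instruction change involved, here is a rough sketch of a sign-extend-byte operation written against a width parameter instead of a hard-coded 64. It is purely illustrative: the `xlen` argument and the function itself are hypothetical, and this is Python, not the notation an updated spec would actually use.

```python
def extsb(rs, xlen):
    """Sign-extend the low byte of rs to xlen bits (hypothetical XLEN form)."""
    if xlen not in (32, 64):
        raise ValueError("xlen must be 32 or 64")
    byte = rs & 0xFF
    sign = byte >> 7
    # Replicate the sign bit into the upper xlen - 8 bits of the result.
    upper = ((1 << (xlen - 8)) - 1) << 8 if sign else 0
    return upper | byte


if __name__ == "__main__":
    for xlen in (32, 64):
        print(xlen, hex(extsb(0x80, xlen)), hex(extsb(0x7F, xlen)))
```

The low 32 bits of the xlen=64 result match the xlen=32 result, which is the property a checker would want to confirm for each instruction.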

Paul.