[Libre-soc-isa] [Bug 213] SimpleV Standard writeup needed

Fri Nov 20 17:41:15 GMT 2020

https://bugs.libre-soc.org/show_bug.cgi?id=213

--- Comment #107 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Jacob Lifshay from comment #106)
> (In reply to cand from comment #105)

> fp16 computations would be nice. While we're at it we could also add bf16
> which is much more suited to machine learning.

creating a non-uniform SV extension is... anomalous.  by that i mean that
applying SV Prefixes is either uniform and consistent or it is not called SV
Prefixing.  i.e. the prefixes and the whole SV concept applies uniformly as an
abstract independent for-loop with complete lack of knowledge of the
element-based instruction, or not at all.

therefore it applies to FP16 LD/STs *and* to FP16 computations... or not at
all.
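
as a rough illustration of the "abstract independent for-loop" idea, here is a minimal python sketch. all names (sv_loop, regs, vl) are made up for illustration and are not taken from the SV spec: the point is only that the loop is handed a scalar operation and never looks at which one it is.

```python
from typing import Callable, List

def sv_loop(op: Callable[[float, float], float],
            regs: List[float], rd: int, ra: int, rb: int, vl: int) -> None:
    """hypothetical model: repeat a scalar 2-operand op over vl elements.

    the loop has no knowledge of what `op` does, so it cannot treat
    one opcode differently from another.
    """
    for i in range(vl):
        regs[rd + i] = op(regs[ra + i], regs[rb + i])

# two unrelated scalar "instructions" (illustrative stand-ins)
def fadd(a: float, b: float) -> float:
    return a + b

def fmul(a: float, b: float) -> float:
    return a * b

regs = [0.0] * 16
regs[4:8] = [1.0, 2.0, 3.0, 4.0]
regs[8:12] = [10.0, 20.0, 30.0, 40.0]

sv_loop(fadd, regs, rd=0, ra=4, rb=8, vl=4)   # vectorised add
sv_loop(fmul, regs, rd=12, ra=4, rb=8, vl=4)  # same loop, different op
```

rejecting particular opcodes would mean adding an `if op is ...` test inside sv_loop, which is exactly the cross-layer dependence being objected to.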

(there are a very small number of circumstances where this is not true: vec4
normalisation and dotproduct are a couple of them.  i'm not comfortable with
this but we have to be pragmatic)

to do otherwise explicitly requires interfering hardware at the decoder
level to get it to reject certain opcodes from being vectorised, making
a hard and critical dependence between two layers of decoding that simply
should not exist (which is why i am not happy about norm or dotproduct)

this is going to be enough of a nuisance as it is (certain opcodes simply
cannot be vectorised, such as twi, sc and so on).

*underneath* the actual ALU may go "er i don't have FP16 HW so i will do this
as FP32 then chuck away some bits" however that's down to individual
implementors to make that decision and yet that decision still has absolutely
nothing to do with SV Compliance.
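
a small python sketch of that implementation choice, under stated assumptions: the helper names are hypothetical, and python's struct 'e' format rounds to nearest rather than literally truncating, so this models "compute wide, then narrow" rather than any specific hardware.

```python
import struct

def to_fp16(x: float) -> float:
    """narrow a python float to IEEE754 half precision and back."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

def to_fp32(x: float) -> float:
    """narrow a python float to IEEE754 single precision and back."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

def fp16_add_via_fp32(a: float, b: float) -> float:
    """hypothetical ALU with no FP16 hardware: do the add in FP32,
    then narrow the result to FP16."""
    wide = to_fp32(to_fp16(a) + to_fp16(b))
    return to_fp16(wide)

result = fp16_add_via_fp32(1.5, 0.25)
```

from the point of view of the instruction stream the operation is still "an FP16 add": how the result is produced underneath is invisible to the vector loop.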

regarding BF16, there is one free slot available in the 2-bit "elwidth"
encoding. hypothetically the four values could be:

* bf16
* fp16
* fp32
* default

which means that applying the elwidth override to 64 bit opcodes gives us "full
coverage" of available options, as long as an override can be applied to each
of src and dest.
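
to make the "full coverage" point concrete, here is a hedged python sketch of a 2-bit FP elwidth field with separate src and dest overrides. the numeric codes assigned to each width are an assumption made purely for illustration (the list above does not fix them), as are the names FPElWidth and decode_overrides.

```python
from enum import IntEnum
from typing import Tuple

class FPElWidth(IntEnum):
    # NOTE: these bit assignments are hypothetical, chosen for the example
    DEFAULT = 0b00  # use the width of the opcode itself
    FP32 = 0b01
    FP16 = 0b10
    BF16 = 0b11

_OVERRIDE_BITS = {FPElWidth.FP32: 32, FPElWidth.FP16: 16, FPElWidth.BF16: 16}

def element_bits(elwidth: FPElWidth, opcode_bits: int = 64) -> int:
    """element size in bits after applying the override."""
    return _OVERRIDE_BITS.get(elwidth, opcode_bits)

def decode_overrides(src_field: int, dest_field: int,
                     opcode_bits: int = 64) -> Tuple[int, int]:
    """decode independent 2-bit overrides for source and destination."""
    return (element_bits(FPElWidth(src_field), opcode_bits),
            element_bits(FPElWidth(dest_field), opcode_bits))

# fp16 source elements, results written at the opcode's own 64-bit width
widths = decode_overrides(0b10, 0b00)
```

with a 64-bit opcode, DEFAULT supplies the fourth width, which is why only three override codes are needed to reach bf16, fp16, fp32 and fp64.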

however one thing not in our favour is that OpenPOWER is designed to waste the
full 64 bit reg on storing FP numbers as if they were always FP64.

the bits of an FP32 are *NOT* kept together in the LSBs/MSBs of a 64 bit reg:
the mantissa is *automatically* re-encoded to be placed in the mantissa FP64
bits and likewise the exponent.
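
the re-encoding can be seen with plain IEEE754 bit patterns. this python sketch only compares the standard FP32 and FP64 layouts of the same value; it does not model any particular OpenPOWER implementation, and the helper names are illustrative.

```python
import struct

def fp32_bits(x: float) -> int:
    """raw 32-bit IEEE754 single-precision pattern of x."""
    return struct.unpack('<I', struct.pack('<f', x))[0]

def fp64_bits(x: float) -> int:
    """raw 64-bit IEEE754 double-precision pattern of x."""
    return struct.unpack('<Q', struct.pack('<d', x))[0]

b32 = fp32_bits(1.5)   # 0x3FC00000
b64 = fp64_bits(1.5)   # 0x3FF8000000000000

# split both patterns into exponent and mantissa fields
exp32, man32 = (b32 >> 23) & 0xFF, b32 & 0x7FFFFF
exp64, man64 = (b64 >> 52) & 0x7FF, b64 & ((1 << 52) - 1)

# for normal numbers: exponent is re-biased (127 -> 1023) and the
# 23-bit mantissa is moved to the top of the 52-bit mantissa field
rebias_ok = exp64 == exp32 - 127 + 1023
shift_ok = man64 == man32 << 29
```

neither the upper nor the lower 32 bits of the FP64 pattern equal the FP32 pattern, so the FP32 bits cannot simply be picked out of the register.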

this is rather inconvenient because instructions to convert between FP32 and
FP64 do not exist (because the conversion is a nop), which makes it difficult
to get multiple FP32/FP16 elements that are packed into one FP reg back out
when we want to convert them to FP32/FP64 vectors.

we may be forced to add conversion opcodes here.  needs thought.
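
purely as a thinking aid, here is a python sketch of what such a conversion would have to do. nothing here is a proposed opcode: the function names, and the choice of putting element 0 in the low 32 bits, are assumptions for illustration only.

```python
import struct

def pack_2xfp32(elem0: float, elem1: float) -> int:
    """pack two raw FP32 patterns into one 64-bit register image.
    assumption: element 0 occupies the low 32 bits."""
    lo = struct.unpack('<I', struct.pack('<f', elem0))[0]
    hi = struct.unpack('<I', struct.pack('<f', elem1))[0]
    return (hi << 32) | lo

def extract_fp32_as_fp64(reg: int, idx: int) -> int:
    """hypothetical conversion: pull raw FP32 element idx out of a
    packed register and return it re-encoded in FP64 layout."""
    bits32 = (reg >> (32 * idx)) & 0xFFFFFFFF
    value = struct.unpack('<f', struct.pack('<I', bits32))[0]
    return struct.unpack('<Q', struct.pack('<d', value))[0]

reg = pack_2xfp32(1.5, -2.0)
e0 = extract_fp32_as_fp64(reg, 0)
e1 = extract_fp32_as_fp64(reg, 1)
```

the extract step is real work (field select, re-bias, mantissa shift), which is why it cannot be treated as a nop once elements are stored packed.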
