[Libre-soc-dev] scalar instructions and SVP64

Wed Mar 10 20:47:10 GMT 2021

On Wed, Mar 10, 2021, 11:53 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
wrote:

> On Wednesday, March 10, 2021, Jacob Lifshay <programmerjake at gmail.com>
> wrote:
>
> >
> > I'd expect the additional gate cost to be on the order of 10 gates in the
> > decoder and 20-30 gates in the VL-loop stage (which I'm assuming comes
> > after the decoder).
>
>
> no!  please see the test issuer source code and discussions which have been
> ongoing for several weeks now.
>

Yes... that can be changed relatively easily, since you can (and have to
anyway) decode instructions before incrementing register numbers: the
decoder stage can go before the VL-loop FSM.

>
> the instruction decoder is *after* the VL identification because VL is part
> of the same state including SVSTATE MSR and PC.
>

VL and SVSTATE don't affect how the instruction decodes, so the instruction
can be decoded before the SV loop FSM (at least enough to determine its
SV/scheduling properties; the rest can be done in the execution pipelines
as usual). The SV loop FSM then only needs to increment register numbers
and count up to VL (only for vector ops), and it becomes a no-op
pass-through for scalar instructions, SVP64 or not.
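A minimal Python sketch of that ordering, assuming a decoder stage that has
already produced a hypothetical DecodedOp (opcode, dest and source registers,
each with its own scalar/vector flag). The loop FSM only adds the element
index to vector register numbers and counts up to VL; with no vector operand
it passes the op through once. The scalar-dest/vector-source case (the
write-once-then-finish behaviour) is deliberately left out. All names here
are illustrative, not taken from the Libre-SOC source.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Operand:
    """Hypothetical register operand: a register number plus its SV flag."""
    reg: int
    is_vector: bool


@dataclass(frozen=True)
class DecodedOp:
    """Hypothetical output of a decoder stage placed *before* the loop FSM."""
    opcode: str
    dest: Operand
    srcs: Tuple[Operand, ...]


def sv_loop(op: DecodedOp, vl: int) -> List[Tuple[str, int, Tuple[int, ...]]]:
    """Expand one decoded op into element ops of (opcode, dest_reg, src_regs)."""
    operands = (op.dest,) + op.srcs
    if not any(o.is_vector for o in operands):
        # Scalar instruction (SVP64-prefixed or not): pass through exactly once.
        return [(op.opcode, op.dest.reg, tuple(s.reg for s in op.srcs))]
    if not op.dest.is_vector:
        # Scalar dest with vector sources needs write-once-then-finish logic,
        # which this sketch does not model.
        raise NotImplementedError("scalar dest with vector sources")

    def step(o: Operand, i: int) -> int:
        # Vector operands advance by the element index; scalar ones stay fixed.
        return o.reg + i if o.is_vector else o.reg

    return [(op.opcode, step(op.dest, i), tuple(step(s, i) for s in op.srcs))
            for i in range(vl)]
```

For example, a vector add with a scalar second source and VL=3 expands to
three element ops (r8/r16, r9/r17, r10/r18, all with r3), while an all-scalar
add is emitted once regardless of VL.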

>
> the instruction is *not* decoded - at all - at the point where SVP64 has
> been identified.
>
> PowerDecoder2 is 4,000+ gates and is so big it has to be divided into 2
> separate pipeline stages (12 individual satellite decoders one per
> pipeline)
>
> that 4,000 gates is a massive long mux cascade where it is COMPLETELY
> unacceptable to make anything critically depend on it.
>
> and everything that you suggest due to this fundamental misunderstanding of
> what SV is categorically requires exactly that.
>

Well, I'm saying that SV should be changed if we want efficiency, rather
than piling more and more kludges onto the ISA to force scalar ops into
the vector-shaped hole that you seem to have conflated with SVP64.

Also, take another look at the old SVP48/SVP64 proposal for RISC-V and
note that it specifically includes scalar ops for all prefixed
instructions:
https://libre-soc.org/simple_v_extension/sv_prefix_proposal/
The scalar ops are those where all the vs#/vd fields are set to scalar.
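The "all vs#/vd fields scalar" test can be sketched as below. This is an
illustration under stated assumptions: the field names and the dict/set
representation are hypothetical, and the set of fields an instruction form
actually uses is assumed to come from the instruction decoder, which is why
this per-field check ties scalar detection to decode.

```python
def is_scalar_op(vector_bits: dict, used_fields: set) -> bool:
    """True if every register field the instruction uses is marked scalar.

    vector_bits: hypothetical map from field name ("vd", "vs1", ...) to its
        scalar (False) / vector (True) bit taken from the prefix.
    used_fields: the fields this instruction form uses; in this sketch that
        information has to be supplied by the instruction decoder.
    """
    return not any(vector_bits[f] for f in used_fields)


bits = {"vd": False, "vs1": False, "vs2": True}
two_operand = is_scalar_op(bits, {"vd", "vs1"})            # vs2 unused
three_operand = is_scalar_op(bits, {"vd", "vs1", "vs2"})   # vs2 is vector
```

The same prefix bits give a scalar op for a two-operand form and a vector op
for a three-operand form, so the answer depends on knowing the form.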

An alternative that achieves the same end goal without moving the decoder
is to use the scalar/vector bit for the first/dest reg as a
whole-instruction scalar/vector bit. That bit is always in the same spot:
instruction forms without a dest reg can have their SVP64 register fields
shifted over by one reg field to make space. The operations this removes
(those with a scalar dest but vector arguments, which are uncommon
instructions) can effectively be substituted with a scalar mv.x. Since the
bit is always in the same spot and every instruction has it, decoding it
from the SVP64 prefix becomes utterly trivial.
This also simplifies the SV loop FSM, since it no longer needs to
implement the write-once-then-finish logic, which I expect to be quite
complex.
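A rough sketch of why this is cheap, assuming a purely hypothetical bit
position (DEST_VECTOR_BIT = 10) for the dest-reg scalar/vector bit in the
32-bit prefix; the post does not give the real layout. Detecting
scalar/vector becomes a single fixed-position bit test with no decoder
involvement, and the loop count is either 1 or VL, with no
write-once-then-finish case left to handle.

```python
# Hypothetical position of the first/dest-reg scalar/vector bit in the
# 32-bit SVP64 prefix; chosen only for illustration.
DEST_VECTOR_BIT = 10


def whole_insn_is_vector(prefix: int) -> bool:
    """One fixed-position bit test: no dependence on the instruction decoder."""
    return bool((prefix >> DEST_VECTOR_BIT) & 1)


def loop_count(prefix: int, vl: int) -> int:
    """Number of element operations the SV loop FSM would issue."""
    # Scalar dest => whole instruction scalar => single pass-through op.
    # Vector dest => plain count up to VL; sources keep their own bits.
    return vl if whole_insn_is_vector(prefix) else 1
```

A former scalar-dest/vector-source op would instead be expressed as a
vector op followed by a scalar mv.x, which this sketch does not model.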

Jacob