[Libre-soc-bugs] [Bug 230] Video opcode development and discussion

bugzilla-daemon at libre-soc.org bugzilla-daemon at libre-soc.org
Mon Dec 14 17:35:54 GMT 2020


https://bugs.libre-soc.org/show_bug.cgi?id=230

--- Comment #41 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to cand from comment #39)
> Is the formulation in comment #27 not enough? 

reminder:

    rd = (rs >> 0 * 8) & (2^8 - 1)
    rd+1 = (rs >> 1 * 8) & (2^8 - 1)
    rd+2 = (rs >> 2 * 8) & (2^8 - 1)
    rd+3 = (rs >> 3 * 8) & (2^8 - 1)
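
as a rough python sketch of the same thing (the function name and the count/width parameters are made up for illustration, they are not part of any proposed opcode):

```python
def unpack_fields(rs, count=4, width=8):
    """Split rs into `count` fields of `width` bits each, lowest field first.

    Mirrors rd+i = (rs >> i * 8) & (2^8 - 1) for i = 0..3.
    Hypothetical helper, for illustration only.
    """
    mask = (1 << width) - 1          # 2^width - 1
    return [(rs >> (i * width)) & mask for i in range(count)]


# rd, rd+1, rd+2, rd+3 from one 32-bit source value
fields = unpack_fields(0x44332211)
```

here fields[0] is the least significant byte of rs, matching rd in the formulation above.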

answer: not entirely, i was thinking along the lines of a
shift-immediate-with-mask that could be specified anywhere from say 4 to 15
bits, or perhaps even encoded in fax machine format (huffman encoding) to say
which bits are extracted, which are skipped.

example: from LSB0 of input:

* skip N bits select M bits place in dest1
* skip O bits select P bits place in dest2
* ....

is this practical, useful, and general-purpose? i don't know.
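
a minimal sketch of the skip/select idea, assuming the spec is simply a list of (skip, select) pairs walked from the least significant bit upwards. the pair encoding and the function name are assumptions made for illustration, not a proposed instruction format:

```python
def extract_fields(rs, spec):
    """Walk rs from bit 0 upward: for each (skip, select) pair,
    skip `skip` bits, then take `select` bits as the next destination.

    Hypothetical model only; the real encoding is undecided.
    """
    dests = []
    pos = 0
    for skip, select in spec:
        pos += skip                               # bits that are ignored
        dests.append((rs >> pos) & ((1 << select) - 1))
        pos += select                             # bits that were consumed
    return dests


# a packed 5-6-5 pixel: no skipping, three fields of 5, 6 and 5 bits
pixel = extract_fields(0xFFE0, [(0, 5), (0, 6), (0, 5)])
```

the fixed 8-bit unpack is then just the special case where every skip is 0 and every select is 8.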


> To have it in BE, the mask
> needs to be shifted to the other end, in addition to adjusting the shift.
> 
> The example from 3.0b page 266 has 128 bits of 7-bit data, and their vslv
> has this weird byte-short focus. It's unlikely to find such a stream of
> data, but to unpack that in our scheme I would pre-process it to vec4 units,
> so that one 32bit reg has 4*7=28 bits, suitably aligned.
> 
> There's basically two types of variable bit width data: pixel formats like
> 5-6-5 and compressed data ala Huffman. The former is rare, and the latter
> would need sequential parsing to know their lengths, aka no point in
> vectorizing that.

well that's the beauty of having a large shift register to work on, within the
FSM.  we *could* conceivably do huffman encoding or other schemes, on input
from up to 12 (!) 64 bit registers.

bear in mind, it's going to be a BIG shift register, minimum 4x 64 bits and
possibly longer, because it needs to be able to take in inputs from up to 4x 64
bit registers at the same time.  it also needs to output a minimum of 4x 64 bit
operands per clock as well.

as long as we do not go completely mad, and also let the processing be "local"
it should be fine.  by "local" i mean that any kind of processing be limited to
a 64 bit "window" onto the 256+ bit shift register.  if that "local" processing
needs to get 128 bits from the SR it *must* wait for another clock cycle as the
data shifts along to it.
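
a toy software model of that constraint, to make the cost visible. everything here (class name, one shift of 64 bits per clock, the cycle counter) is an assumption for illustration and says nothing about how the actual FSM would be built:

```python
MASK64 = (1 << 64) - 1


class WindowedShiftRegister:
    """Toy model: a wide shift register seen only through a 64-bit window."""

    def __init__(self, regs):
        # regs[0] ends up at the low end, i.e. it reaches the window first
        self.bits = 0
        for i, r in enumerate(regs):
            self.bits |= (r & MASK64) << (64 * i)
        self.cycles = 0

    def window(self):
        return self.bits & MASK64    # all that "local" processing may see

    def shift(self):
        self.bits >>= 64             # assumed: one 64-bit step per clock
        self.cycles += 1


def read_bits(sr, nbits):
    """Collect nbits through the window, shifting once per 64 bits taken."""
    value, got = 0, 0
    while got < nbits:
        take = min(64, nbits - got)
        value |= (sr.window() & ((1 << take) - 1)) << got
        got += take
        sr.shift()
    return value


sr = WindowedShiftRegister([1, 2, 3, 4])   # 4x 64 bits = 256-bit register
wide = read_bits(sr, 128)                  # needs two clocks in this model
```

in this model a 128-bit read leaves sr.cycles at 2, which is the "wait for another clock cycle" cost described above.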

the alternative would be to allow the "local" processing to reach out, with
MUXes, to take in 128 bits or above, but if there are 4 of those local 64bit
units that's way too many gates.

hm just looking at this
https://en.m.wikipedia.org/wiki/Huffman_coding

it requires a lookup table.  hmm reluctant to include that. 
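
to show what that table looks like in practice, here is a minimal prefix-code decoder. the three-entry code table is invented purely for illustration and is not taken from any real format:

```python
# hypothetical prefix code: no code is a prefix of another
CODE_TABLE = {"0": "a", "10": "b", "11": "c"}


def huffman_decode(bits, table=CODE_TABLE):
    """Decode a string of '0'/'1' characters using a prefix-code table.

    Each symbol's length is only known once its code has matched,
    so the bits have to be consumed strictly in order.
    """
    out = []
    current = ""
    for b in bits:
        current += b
        if current in table:         # the lookup that hardware would need
            out.append(table[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(out)


decoded = huffman_decode("010110")
```

every step depends on a table lookup and on where the previous code ended, which is why this kind of data has to be parsed sequentially.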

> So I don't think variable shift/mask amounts per-element
> are necessary (but do point out if there's some use I missed).

i honestly don't know, i'm throwing ideas out at the moment and floundering a
bit :)
