[Libre-soc-dev] remap status?

Jacob Lifshay programmerjake at gmail.com
Fri May 28 15:12:31 BST 2021


On Fri, May 28, 2021, 07:04 Lauri Kasanen <cand at gmx.com> wrote:

> On Fri, 28 May 2021 14:14:50 +0100
> Luke Kenneth Casson Leighton <lkcl at lkcl.net> wrote:
>
> > the offset allowed is between 0 and 63.  zdimsz ydimsz and xdimsz may be
> > between 1 and 64.  do you need beyond that?
>
> Not sure. I need an offset of 64 floats, aka 256 bytes, between each
> load.
>
> > /mp3_0_apply_window_float.s is very unclear (arbitrary non-sequential
> > register allocations).
>
> Optimized gcc code ;)
>
> > could you outline in pseudo-assembler what you need?
>
> load 8 floats from arr[0], arr[64]...
> load 8 floats from another[0], another[64]...
>

Sounds like a good use case for a strided load (load base+0*stride,
base+1*stride, base+2*stride, ...). Strided load/store is very important
for GPU shader performance, since that access pattern is very common
there. It can also be implemented, less efficiently, with a gather load.
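
To make the distinction concrete, here is a rough scalar sketch in C of
what the two operations compute. The function names and signatures are
made up for illustration and are not Libre-SOC or SVP64 instructions; the
stride is counted in elements, so a stride of 64 floats corresponds to
the 256-byte offset mentioned above (assuming 4-byte floats).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: strided load.
 * Element i comes from base[i * stride]; one scalar stride describes
 * every address, so hardware can compute them all up front. */
static void strided_load(float *dst, const float *base,
                         size_t stride, size_t n)
{
    assert(dst != NULL && base != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = base[i * stride];
}

/* Hypothetical helper: gather load.
 * Element i comes from base[idx[i]]; the indices are arbitrary, so a
 * whole index vector must be supplied and read first. */
static void gather_load(float *dst, const float *base,
                        const size_t *idx, size_t n)
{
    assert(dst != NULL && base != NULL && idx != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = base[idx[i]];
}
```

Calling strided_load(dst, arr, 64, 8) would fetch arr[0], arr[64], ...,
arr[448]. The same result falls out of gather_load if idx[] is first
filled with 0, 64, ..., 448, which is the extra work (and extra register
space) that makes the gather form less efficient for this pattern.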

Jacob

