[Libre-soc-dev] remap status?
programmerjake at gmail.com
Fri May 28 15:12:31 BST 2021
On Fri, May 28, 2021, 07:04 Lauri Kasanen <cand at gmx.com> wrote:
> On Fri, 28 May 2021 14:14:50 +0100
> Luke Kenneth Casson Leighton <lkcl at lkcl.net> wrote:
> > the offset allowed is between 0 and 63. zdimsz ydimsz and xdimsz may be
> > between 1 and 64. do you need beyond that?
> Not sure. I need an offset of 64 floats, aka 256 bytes, between each
> > /mp3_0_apply_window_float.s is very unclear (arbitrary non-sequential
> > register allocations).
> Optimized gcc code ;)
> > could you outline in pseudo-assembler what you need?
> load 8 floats from arr, arr...
> load 8 floats from another, another...
Sounds like a good use case for strided load (load base+0*stride,
base+1*stride, base+2*stride, ...). Strided load/store is very important
for GPU shader performance, where it is very common. It can also be
implemented with gather-load, though less efficiently.
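
To make the distinction concrete, here is a minimal sketch in Python. It is only a model, not SVP64 or any real ISA: memory is a flat list of floats addressed by element index, and the names strided_load and gather_load are hypothetical. The stride of 64 floats is taken from the use case quoted above.

```python
# Illustrative model only: "memory" is a flat list of floats and
# addresses are element indices, not bytes. Function names are hypothetical.

def strided_load(mem, base, stride, count):
    """Load `count` elements from base + i*stride for i = 0..count-1."""
    return [mem[base + i * stride] for i in range(count)]

def gather_load(mem, indices):
    """Load one element per explicit index. The caller has to build the
    index vector first, which is the extra work compared to a strided load."""
    return [mem[i] for i in indices]

mem = [float(i) for i in range(512)]
STRIDE = 64  # 64 floats (256 bytes) between elements, as in the quoted use case

strided = strided_load(mem, 0, STRIDE, 8)

# The same access pattern via gather needs the addresses computed up front.
indices = [0 + i * STRIDE for i in range(8)]
gathered = gather_load(mem, indices)
```

Both calls return the same 8 elements. The difference is that the strided form only needs a base and a stride, while the gather form needs a full vector of indices to be computed and held in registers.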