[Libre-soc-dev] twin predication and svp64

Fri Dec 11 21:18:39 GMT 2020

On Fri, Dec 11, 2020, 12:54 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
wrote:

> briefly, as i think the same things are being said multiple ways, i
> get the mv.x vector thing finally, and the static memcpy.
>

yay!

> however... with there being so much to do i advocate leaving it for a
> separate incremental change once we have a stable base.
>

yup

>
> > I always meant that the augmented FUs would respect dependencies,
> > reading from result latches of preceding in-flight ops if necessary, reg
> > file otherwise. Perhaps that wasn't sufficiently clear.
>
> appreciated.  i am having difficulty sustaining an 18+ month
> architectural map in my head and mixing that with alternative designs.
>
> > that's all well and good for data-dependent things like strcpy, however
> > memcpy *isn't* data-dependent so fail-on-first actually is unnecessary
> > for it
>
> it is.  the end-of-string is a red herring.  when the sizeof block is
> 1, 2, 4 there is still the possibility that any given VL=16 (say) may
> produce a suite of LDs that crosses a page boundary or hits an end of
> memory point.
>

nobody should be keeping anything at end-of-memory anyway (it's kernel
address space), so I'd consider that part a non-issue.

Avoiding page-boundary crossings is important, however it's just as
important when the pages are already mapped, and in that case fail-first
won't stop at the page boundary. The other issue with fail-first is that it
lets you easily probe for unmapped pages, making exploits much easier.

maybe we should make a 3-argument setvl -- like normal setvl but where we
can give it an additional pointer and element size and it will pick the
best VL to align the pointer to whatever's most efficient. that will work
for more than just page alignment, it'll also work for cache-line alignment
and other internal alignment requirements.
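
a rough sketch of what i mean, as a python model rather than an actual
instruction definition. the name setvl_aligned, its argument order and the
default 4096-byte alignment are all made up for illustration, nothing here
is existing SVP64:

```python
def setvl_aligned(remaining, maxvl, ptr, elem_size, align=4096):
    """Hypothetical 3-argument setvl (illustrative only, not SVP64).

    Returns a VL such that an access of VL elements starting at ptr
    stops at the next align-byte boundary instead of crossing it.
    """
    vl = min(remaining, maxvl)           # what a normal setvl would pick
    if vl <= 0:
        return 0
    to_boundary = align - (ptr % align)  # bytes left before the boundary
    elems = to_boundary // elem_size     # whole elements that fit in them
    if elems == 0:
        # element 0 itself straddles the boundary: do it on its own
        return 1
    return min(vl, elems)


# 8 bytes short of a page with 2-byte elements: VL is cut from 16 to 4
print(setvl_aligned(100, 16, 0x1000 - 8, 2))
# already page-aligned: the full MAXVL is used
print(setvl_aligned(100, 16, 0x1000, 2))
```

once the first iteration has been shortened the pointer sits on the
boundary, so as long as MAXVL * elem_size divides the alignment, later
iterations never cross it. passing align=64 gives the cache-line case.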

>
> the page boundary crossover is considered unacceptably expensive, and
> the end of memory causes SIMD operations to catastrophically fail when
> they shouldn't even have been used.
>
> even for memcpy the 16x LDs @ 2byte may be chopped off by reducing VL
> to the point where the page fault doesn't occur.
>
> on the next loop the page fault *does* occur but it occurs on an
> entirely new page.
>
> i.e. by using fail-on-first the need to keep 2 pages in memory is
> gone, reducing VM working set maximum requirements.
>
> also, the ffirst happens to get VL aligned onto a page boundary, such
> that for really large memcpys *all* subsequent memcpy LD/STs will
> never hit a page fault.
>

that's not quite how that works -- if the page is unmapped or swapped out
then no matter how you try to access it, it will cause a page fault when
the memcpy gets to that point even if delayed by an iteration by ffirst.
every time you try to copy from/to an unmapped page, it has to fault and
the OS will map some pages then resume, or send a SIGSEGV. The OS generally
doesn't map everything at once since it has no idea how much you will
actually use and doesn't want to waste time/memory.
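
to illustrate, here's a toy python model of the load side of a memcpy
loop. it assumes 4096-byte pages, element-aligned addresses, that element 0
of each vector op traps normally, and that with ffirst any later element on
an unmapped page just truncates VL. copy_faults and its arguments are
invented for this sketch, it is not a description of the real hardware:

```python
PAGE = 4096  # assumed page size


def copy_faults(start, nbytes, elem_size, maxvl, mapped, ffirst):
    """Toy model: returns (page_faults, vl_truncations) for the loads."""
    if start % elem_size or nbytes % elem_size:
        raise ValueError("model assumes element-aligned start and length")
    mapped = set(mapped)          # page numbers that are currently mapped
    faults = truncations = 0
    addr, end = start, start + nbytes
    while addr < end:
        vl = min(maxvl, (end - addr) // elem_size)
        # element 0 always behaves normally: an unmapped page traps and
        # the OS maps it before the instruction resumes
        if addr // PAGE not in mapped:
            faults += 1
            mapped.add(addr // PAGE)
        for i in range(1, vl):
            page = (addr + i * elem_size) // PAGE
            if page in mapped:
                continue
            if ffirst:
                vl = i            # fail-first: shorten VL, no trap yet
                truncations += 1
                break
            faults += 1           # plain vector load: trap mid-vector
            mapped.add(page)
        addr += vl * elem_size
    return faults, truncations


# copy 3 pages starting 8 bytes before the end of the only mapped page
args = (PAGE - 8, 3 * PAGE, 2, 16, {0})
print(copy_faults(*args, ffirst=False))
print(copy_faults(*args, ffirst=True))
```

in this model both runs take one fault per unmapped page. ffirst only moves
the first fault to the following iteration (and page-aligns the later
accesses), it doesn't remove any faults.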

Jacob