[Libre-soc-dev] twin predication and svp64
Luke Kenneth Casson Leighton
lkcl at lkcl.net
Sat Dec 12 10:13:10 GMT 2020
On 12/11/20, Jacob Lifshay <programmerjake at gmail.com> wrote:
> On Fri, Dec 11, 2020, 12:54 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
>> briefly, as i think the same things are being said multiple ways, i
>> get the mv.x vector thing finally, and the static memcpy.
>> however... with there being so much to do i advocate leaving it for a
>> separate incremental change once we have a stable base.
>> > I always meant that the augmented FUs would respect dependencies,
>> > from result latches of preceding in-flight ops if necessary, reg file
>> > otherwise. Perhaps that wasn't sufficiently clear.
>> appreciated. i am having difficulty sustaining an 18+ month
>> architectural map in my head and mixing that with alternative designs.
>> > that's all well and good for data-dependent things like strcpy, however
>> > memcpy *isn't* data-dependent so fail-on-first actually is unnecessary
>> > for it
>> it is. the end-of-string is a red herring. when the sizeof block is
>> 1, 2, 4 there is still the possibility that any given VL=16 (say) may
>> produce a suite of LDs that crosses a page boundary or hits an end of
>> memory point.
> nobody should be keeping anything at end-of-memory anyway (it's kernel
> address space),
end of program memory (VM) not end of physical memory. which can be
resized. initial size will be small. memcpy is highly likely to be
the trigger that causes a new page allocation to occur.
if that is in the middle of a LD sequence it is wasteful.
> Avoiding crossing page boundaries is important, however it's just as
> important when the pages are already mapped, in which case fail-first will
> not stop at a page boundary. The other issue with fail-first is that it
> lets you easily probe for unmapped pages, making exploits much easier.
i initially thought this may be the case. any LD allows the same
probing (just slower and one at a time) and i think it's actually ok.
> maybe we should make a 3-argument setvl -- like normal setvl but where we
> can give it an additional pointer and element size and it will pick the
> best VL to align the pointer to whatever's most efficient. that will work
> for more than just page alignment, it'll also work for cache-line alignment
> and other internal alignment requirements.
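A rough sketch of the idea in the quoted proposal, in Python. The function name `setvl_aligned`, its parameters and the default values are all hypothetical: nothing like this exists in SVP64, and the real instruction (if it were ever added) could behave quite differently. It only shows the arithmetic of picking a VL that stops at the next alignment boundary.

```python
def setvl_aligned(ptr, elsize, remaining, maxvl=16, boundary=4096):
    """Pick a VL so that the vector access stops at the next `boundary`.

    Hypothetical helper for illustration only, not an SVP64 instruction.
    ptr       : byte address of the first element
    elsize    : element size in bytes
    remaining : number of elements still to process
    maxvl     : hardware maximum vector length (assumed value)
    boundary  : alignment in bytes (page size, cache line, ...)
    """
    # bytes left before the next boundary (a full block if already aligned)
    to_boundary = boundary - (ptr % boundary)
    # whole elements that fit before the boundary; always allow at least
    # one so that the loop makes forward progress
    fit = max(1, to_boundary // elsize)
    return min(maxvl, remaining, fit)
```

With `boundary=64` the same calculation would align to an (assumed) 64-byte cache line instead of a page.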
unfortunately the time for new ideas was 18 months ago when i was
analysing RVV and the SVE papers and adding ffirst.
the current context is now "how and if to fit the features in,
adapting them to the new format" rather than "what they are".
we don't have time, unfortunately. do make a note somewhere, it can
be discussed later if there turn out to be problems.
> that's not quite how that works -- if the page is not mapped/swapped out
> then no matter how you try to access it, it will cause a page fault when
> the memcpy gets to that point even if delayed by an iteration by ffirst.
yes i left out that if the very first LD/ST in the vector sequence has
an exception the truncation of VL is *not* done, the exception is
allowed to occur.
this is as if VL=1 and there was no ffirst.
the result is that it looks to all intents and purposes like a boring
slow single LD/ST except oh look! we are pleasantly surprised to find
some extra LDs got thrown in.
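The fail-first behaviour described above can be sketched as a toy Python model. Everything here is an assumption made for illustration: `ffirst_load`, `PageFault`, the dict-based `memory`, the `mapped_pages` set and the 4096-byte page size are hypothetical names, not part of SVP64 or of any simulator.

```python
class PageFault(Exception):
    """Stands in for the exception a real LD would raise (hypothetical)."""


def ffirst_load(memory, mapped_pages, addr, elsize, vl, page_size=4096):
    """Toy model of a fail-first vector load.

    memory       : dict mapping byte address -> value (missing reads as 0)
    mapped_pages : set of page numbers that are currently mapped
    Returns (values, new_vl).
    """
    values = []
    for i in range(vl):
        ea = addr + i * elsize
        # an element faults if any of its bytes lies in an unmapped page
        first_page = ea // page_size
        last_page = (ea + elsize - 1) // page_size
        faults = any(p not in mapped_pages
                     for p in range(first_page, last_page + 1))
        if faults:
            if i == 0:
                # first element: no truncation, the exception is allowed
                # to occur, as if VL=1 and there was no ffirst
                raise PageFault(ea)
            # later element: truncate VL to the elements already loaded
            return values, i
        values.append(memory.get(ea, 0))
    return values, vl
```

A memcpy loop built on this would advance by the returned `new_vl` each time. When the next iteration starts exactly at the unmapped page, the fault is taken on element 0 in the ordinary way.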