[Libre-soc-sim] uspike ran 64-thread OpenMP matrix multiply!

Peter Hsu peter.hsu at bsc.es
Fri Aug 20 11:34:25 BST 2021


Pat me on the back :)

This is the first time uspike has run a RISC-V OpenMP program
"thread-for-thread" with OMP_NUM_THREADS=64.

Lots of futex() calls, LR/SC and AMO instructions.
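
For readers who have not seen one, a minimal sketch of the kind of OpenMP
matrix multiply meant here. It is not the program that was actually run;
the function name and loop order are illustrative. The link to the line
above is an assumption: an OpenMP runtime on Linux typically parks and
wakes threads with futex() at the implicit barrier that ends the parallel
loop, and its locks and counters are a likely source of the LR/SC and AMO
instructions.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not from the original post: C = A * B for n x n
// row-major matrices. Built with OpenMP enabled (e.g. -fopenmp), the outer
// loop is split across OMP_NUM_THREADS threads; without it the pragma is
// ignored and the loop runs serially with the same result.
std::vector<double> matmul(const std::vector<double>& a,
                           const std::vector<double>& b, int n) {
    std::vector<double> c(static_cast<std::size_t>(n) * n, 0.0);
    // Each iteration of i writes only row i of c, so threads never
    // write to the same element.
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        for (int k = 0; k < n; ++k) {
            const double aik = a[i * n + k];
            for (int j = 0; j < n; ++j) {
                c[i * n + j] += aik * b[k * n + j];
            }
        }
    }
    return c;  // the parallel loop ends with an implicit barrier
}
```

Running a binary built this way with OMP_NUM_THREADS=64 in the environment
asks the runtime for 64 threads.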

-Peter

On 19/8/21 16:53, lkcl wrote:
>
> On August 19, 2021 4:40:39 AM UTC, Peter Hsu <peter.hsu at bsc.es> wrote:
>> Hi Luke, All,
>>
>> I migrated the fast interpretation code from the old Caveat to Uspike.
>>
>> Now by default uspike uses spike instruction semantics, but you can
>> override it with "fast" code:
>>
>>    "c.andi" : { "fast":"wrd(r1 & immed)" },
>>    "c.subw" : { "fast":"wrd(int32_t(r1) - int32_t(r2))" },
>>    "c.addw" : { "fast":"wrd(int32_t(r1) + int32_t(r2))" },
>>    "c.j"    : { "fast":"wpc(pc+immed); break" },
>>
>> Spike laboriously extracts register fields and constructs immediate
>> values every time, but fast code uses predecoded instructions.
> intriguing, what exactly is the difference? do you mean that the locations of the bits are pre-calculated (at compile time) for fast?
>
> but that spike will execute at runtime some if/then/else or deep nested switch statements that access bit locations in a runtime dynamic fashion?
>
> in the LibreSOC / Microwatt Power ISA decoder in HDL it is a hybrid, you select a micro-op row with the information about which bits are needed for immediates, but those end up as static ranges.
>
> other fields are again dynamically selected based on Form but are otherwise statically computed.
>
> or, is there something else going on?
>
> i'm not entirely sure what you mean.
>
> l.
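
One possible reading of "predecoded", as a sketch only: the register
numbers and the sign-extended immediate are extracted from the instruction
bits once and stored, so that executing the instruction later only reads
those stored fields. The struct and function names below are hypothetical
and are not taken from uspike or Spike. The bit positions follow the RVC
CB format for C.ANDI.

```cpp
#include <cstdint>

// Hypothetical predecoded form; field names are illustrative only.
struct Predecoded {
    std::uint8_t rd;     // destination register number
    std::uint8_t r1;     // first source register number
    std::int64_t immed;  // sign-extended immediate
};

// Extract the fields of a C.ANDI instruction (RVC CB format):
//   bits [9:7] = rd'/rs1' (maps to x8..x15),
//   bit  12    = imm[5], bits [6:2] = imm[4:0].
Predecoded decode_c_andi(std::uint16_t insn) {
    const std::uint8_t reg = static_cast<std::uint8_t>(8 + ((insn >> 7) & 0x7));
    const std::int64_t raw = (((insn >> 12) & 0x1) << 5) | ((insn >> 2) & 0x1f);
    const std::int64_t immed = (raw ^ 0x20) - 0x20;  // sign-extend 6 bits
    return {reg, reg, immed};
}

// "Decode every time": fields are pulled out of the raw bits on each
// execution of the instruction.
void exec_raw(std::uint16_t insn, std::int64_t regs[32]) {
    const Predecoded d = decode_c_andi(insn);
    regs[d.rd] = regs[d.r1] & d.immed;
}

// "Predecoded": decoding happened once, ahead of time; execution only
// reads the stored fields, mirroring the quoted wrd(r1 & immed).
void exec_predecoded(const Predecoded& d, std::int64_t regs[32]) {
    regs[d.rd] = regs[d.r1] & d.immed;
}
```

Under this reading, a simulator would call decode_c_andi() once when the
instruction is first fetched, cache the result, and call
exec_predecoded() on every later execution.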


