[Libre-soc-sim] uspike ran 64-thread OpenMP matrix multiply!

Pete Wilson peter.wilson at bsc.es
Fri Aug 20 15:03:52 BST 2021


Bloody eck!!

Eh, not bad…

P


> On Aug 20, 2021, at 12:34 PM, Peter Hsu <Peter.hsu at bsc.es> wrote:
> 
> Pat me on the back :)
> 
> First time "thread-for-thread" RISC-V OpenMP program with OMP_NUM_THREADS=64.
> 
> Lots of futex() calls, LR/SC and AMO instructions.
> 
> -Peter
> 
>> On 19/8/21 16:53, lkcl wrote:
>> 
>>> On August 19, 2021 4:40:39 AM UTC, Peter Hsu <peter.hsu at bsc.es> wrote:
>>> Hi Luke, All,
>>> 
>>> I migrated the fast interpretation code from the old Caveat to Uspike.
>>> 
>>> Now by default uspike uses spike instruction semantics, but you can
>>> override it with "fast" code:
>>> 
>>>   "c.andi"     : { "fast":"wrd(r1 & immed)" },
>>>   "c.subw"    : { "fast":"wrd(int32_t(r1) - int32_t(r2))" },
>>>   "c.addw"    : { "fast":"wrd(int32_t(r1) + int32_t(r2))" },
>>>   "c.j"           : { "fast":"wpc(pc+immed); break" },
>>> 
>>> Spike laboriously extracts register fields and constructs immediate
>>> values every time, but fast code uses predecoded instructions.
>> intriguing, what exactly is the difference? do you mean that the locations of the bits are pre-calculated (at compile time) for fast?
>> 
>> but that spike will execute at runtime some if/then/else or deep nested switch statements that access bit locations in a runtime dynamic fashion?
>> 
>> in the LibreSOC / Microwatt Power ISA decoder in HDL it is a hybrid, you select a micro-op row with the information about which bits are needed for immediates, but those end up as static ranges.
>> 
>> other fields are again dynamically selected based on Form but are otherwise statically computed.
>> 
>> or, is there something else going on?
>> 
>> i'm not entirely sure what you mean.
>> 
>> l.
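
For readers following the thread, the distinction Peter describes can be
sketched roughly as below. This is only an illustration under stated
assumptions, not uspike or Spike source: the Predecoded struct and every
function name here are hypothetical, and how uspike actually defines wrd(),
r1 and immed is not shown in the thread. The bit positions are those of the
RVC CB format used by c.andi. The first path pulls the register number and
immediate out of the raw instruction bits on every execution; the second
does that work once and then executes from stored fields.

```cpp
#include <cstdint>

// Hypothetical predecoded record: fields are extracted once, not per execution.
struct Predecoded {
    uint8_t rd;     // destination register number
    uint8_t rs1;    // source register number
    int32_t immed;  // sign-extended immediate
};

// c.andi (RVC CB format): bits [9:7] hold rd'/rs1', which maps to x8..x15.
inline uint8_t candi_reg(uint16_t insn) {
    return static_cast<uint8_t>(8 + ((insn >> 7) & 0x7));
}

// Immediate: bit 12 is imm[5], bits [6:2] are imm[4:0]; sign-extend from 6 bits.
inline int32_t candi_imm(uint16_t insn) {
    int32_t raw = (((insn >> 12) & 0x1) << 5) | ((insn >> 2) & 0x1f);
    return (raw ^ 0x20) - 0x20;
}

// Path 1: extract the fields from the raw bits every time the instruction runs.
inline void exec_candi_decode_each_time(int64_t* x, uint16_t insn) {
    uint8_t r = candi_reg(insn);
    x[r] = x[r] & candi_imm(insn);
}

// Path 2a: do the extraction once, for example when the code is first fetched.
inline Predecoded predecode_candi(uint16_t insn) {
    uint8_t r = candi_reg(insn);
    return Predecoded{r, r, candi_imm(insn)};
}

// Path 2b: the hot loop only reads stored fields, in the spirit of the
// "wrd(r1 & immed)" entry quoted above (assumed meaning: write rd).
inline void exec_candi_predecoded(int64_t* x, const Predecoded& p) {
    x[p.rd] = x[p.rs1] & p.immed;
}
```

Both paths compute the same result; the difference is only in how often the
shifting and masking happens. For example, 0x98F1 encodes c.andi x9, -4, so
either path turns x9 = 0x1237 into 0x1234.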



