[Libre-soc-sim] uspike ran 64-thread OpenMP matrix multiply!
Pete Wilson
peter.wilson at bsc.es
Fri Aug 20 15:03:52 BST 2021
Bloody eck!!
Eh, not bad…
P
> On Aug 20, 2021, at 12:34 PM, Peter Hsu <Peter.hsu at bsc.es> wrote:
>
> Pat me on the back :)
>
> First time "thread-for-thread" RISC-V OpenMP program with OMP_NUM_THREADS=64.
>
> Lots of futex() calls, LR/SC and AMO instructions.
>
> -Peter
>
>> On 19/8/21 16:53, lkcl wrote:
>>
>>> On August 19, 2021 4:40:39 AM UTC, Peter Hsu <peter.hsu at bsc.es> wrote:
>>> Hi Luke, All,
>>>
>>> I migrated the fast interpretation code from the old Caveat to uspike.
>>>
>>> Now by default uspike uses spike instruction semantics, but you can
>>> override it with "fast" code:
>>>
>>> "c.andi" : { "fast":"wrd(r1 & immed)" },
>>> "c.subw" : { "fast":"wrd(int32_t(r1) - int32_t(r2))" },
>>> "c.addw" : { "fast":"wrd(int32_t(r1) + int32_t(r2))" },
>>> "c.j" : { "fast":"wpc(pc+immed); break" },
>>>
>>> Spike laboriously extracts register fields and constructs immediate
>>> values every time, but fast code uses predecoded instructions.
>> intriguing, what exactly is the difference? do you mean that the locations of bits are pre-calculated (at compile time) for fast?
>>
>> but that spike will execute at runtime some if/then/else or deep nested switch statements that access bit locations in a runtime dynamic fashion?
>>
>> in the LibreSOC / Microwatt Power ISA decoder in HDL it is a hybrid: you select a micro-op row with the information about which bits are needed for immediates, but those end up as static ranges.
>>
>> other fields are again dynamically selected based on Form but are otherwise statically computed.
>>
>> or, is there something else going on?
>>
>> i'm not entirely sure what you mean.
>>
>> l.
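
One possible reading of the distinction asked about above, between extracting fields on every execution and using predecoded instructions, sketched in C++ for c.andi. This is an illustration only and does not reproduce Spike or uspike source: the Decoded struct and all function names are hypothetical, and only the c.andi bit layout is taken from the RISC-V compressed instruction encoding.

```cpp
#include <cstdint>

// Hypothetical predecoded record: fields are extracted once, when the
// instruction is first seen, and reused on every later execution.
struct Decoded {
    uint8_t rd;      // destination (and source) register number
    int64_t immed;   // sign-extended immediate
};

// c.andi immediate: imm[5] is bit 12, imm[4:0] are bits 6:2.
static int64_t candi_imm(uint16_t insn) {
    int64_t imm = ((insn >> 2) & 0x1f) | (((insn >> 12) & 1) << 5);
    return (imm ^ 0x20) - 0x20;          // sign-extend from bit 5
}

// c.andi register: 3-bit field in bits 9:7 selects x8..x15.
static unsigned candi_rd(uint16_t insn) {
    return 8 + ((insn >> 7) & 7);
}

// Style 1: pull the fields out of the raw bits on every execution.
void exec_decode_each_time(uint16_t insn, int64_t* reg) {
    unsigned rd = candi_rd(insn);
    reg[rd] = reg[rd] & candi_imm(insn);
}

// Style 2: decode once, then execute from the stored fields.
Decoded predecode(uint16_t insn) {
    return Decoded{static_cast<uint8_t>(candi_rd(insn)), candi_imm(insn)};
}

void exec_predecoded(const Decoded& d, int64_t* reg) {
    reg[d.rd] = reg[d.rd] & d.immed;     // comparable to "wrd(r1 & immed)"
}
```

In the second style the shifts and masks are paid once per static instruction rather than once per executed instruction, which is where an interpreter running hot loops would be expected to save time.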