[Libre-soc-bugs] [Bug 770] Discussion and Finalisation of Which Cryptographic Primitives to Implement

bugzilla-daemon at libre-soc.org
Sat Oct 15 11:27:48 BST 2022


https://bugs.libre-soc.org/show_bug.cgi?id=770

--- Comment #7 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
correction: i just found something interesting / intriguing:

  https://www.oryx-embedded.com/doc/chacha_8c_source.html

    //ChaCha runs 8, 12 or 20 rounds, alternating between column rounds and
    //diagonal rounds
    for(i = 0; i < context->nr; i += 2)
    {
       //The column rounds apply the quarter-round function to the four
       //columns, from left to right
       CHACHA_QUARTER_ROUND(w[0], w[4], w[8], w[12]);
       CHACHA_QUARTER_ROUND(w[1], w[5], w[9], w[13]);
       CHACHA_QUARTER_ROUND(w[2], w[6], w[10], w[14]);
       CHACHA_QUARTER_ROUND(w[3], w[7], w[11], w[15]);

       //The diagonal rounds apply the quarter-round function to the top-left,
       //bottom-right diagonal, followed by the pattern shifted one place to
       //the right, for three more quarter-rounds
       CHACHA_QUARTER_ROUND(w[0], w[5], w[10], w[15]);
       CHACHA_QUARTER_ROUND(w[1], w[6], w[11], w[12]);
       CHACHA_QUARTER_ROUND(w[2], w[7], w[8], w[13]);
       CHACHA_QUARTER_ROUND(w[3], w[4], w[9], w[14]);
    }

that's a rotated matrix multiply that *might* fit with a combination
of Matrix REMAP and Indexed REMAP.  it would almost certainly have to
be Vertical-First Mode as there is a dependency chain within each
CHACHA_QUARTER_ROUND.
whilst it would be tempting to make a double copy of w[] it would be
much more interesting to see what REMAP can do, here.

 //ChaCha quarter-round function
 #define CHACHA_QUARTER_ROUND(a, b, c, d) \
 { \
    a += b; \
    d ^= a; \
    d = ROL32(d, 16); \
    c += d; \
    b ^= c; \
    b = ROL32(b, 12); \
    a += b; \
    d ^= a; \
    d = ROL32(d, 8); \
    c += d; \
    b ^= c; \
    b = ROL32(b, 7); \
 }
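
for reference, a minimal sketch of the same quarter-round as a C function
that takes indices into w[] rather than lvalues, which is closer to how an
index-driven scheme would see it.  rol32() and chacha_quarter_round() are
names made up for this sketch; rol32() assumes that the ROL32 macro is a
plain 32-bit rotate-left.

```c
#include <stdint.h>

/* Assumed meaning of ROL32: rotate a 32-bit word left by n bits (0 < n < 32). */
static inline uint32_t rol32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32u - n));
}

/* Hypothetical helper: the quarter-round applied to w[a], w[b], w[c], w[d]. */
static void chacha_quarter_round(uint32_t w[16], int a, int b, int c, int d)
{
    w[a] += w[b]; w[d] ^= w[a]; w[d] = rol32(w[d], 16);
    w[c] += w[d]; w[b] ^= w[c]; w[b] = rol32(w[b], 12);
    w[a] += w[b]; w[d] ^= w[a]; w[d] = rol32(w[d], 8);
    w[c] += w[d]; w[b] ^= w[c]; w[b] = rol32(w[b], 7);
}
```

calling chacha_quarter_round(w, 0, 4, 8, 12) then corresponds to the first
column round in the loop quoted earlier.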

so there are interdependencies between those steps, but they are
pipelineable.  however the operations are in the exact same order in
every quarter-round, meaning that some REMAP Index offsets (QTY 32 per
operand) could be used (because there are 2 blocks of quarter-rounds
on w[]).

with elwidth=8 on Indexed REMAP that is QTY 4 64-bit regs to store 32
8-bit Indices.
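
the arithmetic is 32 x 8 bits = 256 bits = 4 x 64 bits.  a small sketch of
that packing in plain C follows.  pack_indices() and unpack_index() are
hypothetical names, and the byte order (element 0 in the least significant
byte of the first word) is an assumption made for illustration, not
something stated here about Indexed REMAP.

```c
#include <stdint.h>

/* Pack 32 8-bit indices into four 64-bit words.
 * Assumption: element 0 sits in the least significant byte of regs[0]. */
static void pack_indices(const uint8_t idx[32], uint64_t regs[4])
{
    for (int r = 0; r < 4; r++) {
        regs[r] = 0;
        for (int b = 0; b < 8; b++)
            regs[r] |= (uint64_t)idx[8 * r + b] << (8 * b);
    }
}

/* Fetch index i (0..31) back out of the packed words. */
static uint8_t unpack_index(const uint64_t regs[4], int i)
{
    return (uint8_t)(regs[i / 8] >> (8 * (i % 8)));
}
```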

*BUT*... 5 are needed (4 are a b c d above) because of the magic ROL32
constants. or wait, no, it is Vertical-First, so they can be selected
using scalar.

that's doable.
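
to make the idea concrete, here is a plain-C model (not SVP64 code) of one
possible reading of the "QTY 32" figure: each quarter-round is four
add/xor/rotate groups, so the 8 quarter-rounds of one loop iteration flatten
into 32 identical steps, each driven by three index tables and a rotate
amount picked by position.  QR, ROT, x_idx, y_idx, z_idx and both function
names are hypothetical, introduced only for this sketch, and the table
layout is an assumption rather than the actual REMAP encoding.

```c
#include <stdint.h>

/* Assumed meaning of ROL32: rotate a 32-bit word left by n bits (0 < n < 32). */
static inline uint32_t rol32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32u - n));
}

/* (a, b, c, d) indices of the eight quarter-rounds, copied from the loop body. */
static const uint8_t QR[8][4] = {
    {0, 4,  8, 12}, {1, 5,  9, 13}, {2, 6, 10, 14}, {3, 7, 11, 15},
    {0, 5, 10, 15}, {1, 6, 11, 12}, {2, 7,  8, 13}, {3, 4,  9, 14},
};

/* Rotate amounts of the four add/xor/rotate groups in one quarter-round. */
static const uint8_t ROT[4] = {16, 12, 8, 7};

/* Hypothetical flattened index tables: one entry per add/xor/rotate group. */
static uint8_t x_idx[32], y_idx[32], z_idx[32];

static void build_index_tables(void)
{
    for (int q = 0; q < 8; q++) {
        for (int g = 0; g < 4; g++) {
            int s = 4 * q + g;
            int odd = g & 1;
            /* even groups: a += b; d ^= a; rotate d
             * odd groups:  c += d; b ^= c; rotate b */
            x_idx[s] = QR[q][odd ? 2 : 0];
            y_idx[s] = QR[q][odd ? 3 : 1];
            z_idx[s] = QR[q][odd ? 1 : 3];
        }
    }
}

/* One column + diagonal pass over w[] as 32 identical three-operation steps. */
static void chacha_double_round_indexed(uint32_t w[16])
{
    for (int s = 0; s < 32; s++) {
        uint32_t *x = &w[x_idx[s]];
        uint32_t *y = &w[y_idx[s]];
        uint32_t *z = &w[z_idx[s]];
        *x += *y;                    /* add    */
        *z ^= *x;                    /* xor    */
        *z = rol32(*z, ROT[s & 3]);  /* rotate, amount chosen by step position */
    }
}
```

build_index_tables() has to be called once before
chacha_double_round_indexed().  the loop body is three operations plus index
lookups, which is the shape that would have to map onto a handful of
instructions.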

the entire inner loop crashes down to around... 8 instructions which
fits into a single Cache-Line.  holy cow.  yeah that's worth exploring.
