[Libre-soc-dev] SVP64 parallel map-reduce idea

Jacob Lifshay programmerjake at gmail.com
Fri Jun 11 22:28:45 BST 2021


On Fri, Jun 11, 2021, 11:06 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
wrote:

> after implementing the scalar map-reduce in ISACaller, it occurred to me
> that a fixed (predictable, useful, non-ambiguous, clear) algorithm for the
> Program Order in which the reductions has to take place might be a good
> idea.
>

Here's my proposal:
we say all reductions behave as-if the following tree-reduction algorithm
with handling for predication is executed, even if the reduction operation
is not associative and/or not commutative (e.g. we use the same algorithm
for reduction with subtraction or division or atan2, every instruction that
can encode reduction in the SVP64 prefix):

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=3e49f5f64a00edcfc47d8114e23c76ed

/// reference implementation of proposed SimpleV reduction semantics.
///
/// `temp_pred` is a non-user-visible register that can be stored in some
/// SPR if the reduction is interrupted, or we can just restart the
reduction
/// from the beginning since it will produce the same results.
///
/// all input arrays have length `vl`
pub fn reduce(
    vl: usize,
    input_vec: &[f32],
    temp_vec: &mut [f32],
    input_pred: &[bool],
    temp_pred: &mut [bool],
) -> f32 {
    assert_eq!(input_vec.len(), vl);
    assert_eq!(temp_vec.len(), vl);
    assert_eq!(input_pred.len(), vl);
    assert_eq!(temp_pred.len(), vl);
    for i in 0..vl {
        temp_pred[i] = input_pred[i];
        if temp_pred[i] {
            temp_vec[i] = input_vec[i];
        }
    }
    let mut step = 1;
    while step < vl {
        step *= 2;
        for i in (0..vl).step_by(step) {
            let other = i + step / 2;
            let other_pred = other < vl && temp_pred[other];
            if temp_pred[i] && other_pred {
                // reduction operation -- we still use this algorithm even
                // if the reduction operation isn't associative or
commutative.
                //
                // `f32` addition is used as the reduction operation
                // for ease of exposition.
                temp_vec[i] += temp_vec[other];
            } else if other_pred {
                temp_vec[i] = temp_vec[other];
            }
            temp_pred[i] |= other_pred;
        }
    }
    if vl != 0 && temp_pred[0] {
        // return the result
        temp_vec[0]
    } else {
        todo!("there weren't any enabled input elements, pick a default?")
    }
}

Jacob

>


More information about the Libre-soc-dev mailing list