--- Comment #14 from Jacob Lifshay <programmerjake at gmail.com> ---
https://lists.libre-soc.org/pipermail/libre-soc-dev/2022-April/004772.html
> a new 128/64 scalar divide instruction would be
>
>    dividend[0:(XLEN*2)-1] <- (RA) || (RS)
>
> where RS=RA+1 just like lq and stq.

divrem2du rt, ra, rb, rc
v = (ra << 64) | rb;
d = rc; // copy first in case rt is rc
rt = UDIV(v, d);
ra = UREM(v, d);

because that's needed for bigint by word division:
uint64_t n[n_size], d; // n has most-significant word in n[n_size - 1]
uint64_t carry = 0;
for(size_t i = n_size; i > 0; i--)
{
    uint128_t v = ((uint128_t)carry << 64) | n[i - 1];
    carry = v % d;
    n[i - 1] = v / d;
}
// n is the quotient, carry is the remainder
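
for anyone who wants to try the loop above as-is, here is a minimal self-contained sketch. assumptions: a compiler with the GCC/Clang `unsigned __int128` extension standing in for uint128_t, a nonzero d, and the name bigint_div_word, which is made up for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Divide the little-endian bigint n (most-significant word in
 * n[n_size - 1]) by the single word d, in place.
 * Returns the remainder; n holds the quotient afterwards.
 * bigint_div_word is a hypothetical name, not an existing API. */
static uint64_t bigint_div_word(uint64_t *n, size_t n_size, uint64_t d)
{
    assert(d != 0); /* division by zero is not handled in this sketch */
    uint64_t carry = 0;
    for (size_t i = n_size; i > 0; i--)
    {
        /* carry < d always holds here, so v / d fits in 64 bits */
        unsigned __int128 v = ((unsigned __int128)carry << 64) | n[i - 1];
        carry = (uint64_t)(v % d);
        n[i - 1] = (uint64_t)(v / d);
    }
    return carry;
}
```

the carry from each step is the high half of the next dividend, which is why the loop has to run from the most-significant word down.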

li carry, 0
set_remap_reverse
sv.divrem2du n.v, carry.s, n.v, d.s
// n is the quotient, carry is the remainder
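
a rough C model of what that sequence is meant to compute, as a sketch only. assumptions: XLEN = 64, `unsigned __int128` (GCC/Clang extension), the quotient always fits in 64 bits (overflow and divide-by-zero behaviour isn't specified here, so the model just asserts), and the reversed remap walks elements from the highest index down while the scalar carry is read and rewritten for every element. divrem2du_model and sv_divrem2du_reversed are made-up names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one divrem2du rt, ra, rb, rc.
 * rt and ra are pointers because the instruction writes both. */
static void divrem2du_model(uint64_t *rt, uint64_t *ra, uint64_t rb, uint64_t rc)
{
    assert(rc != 0 && *ra < rc); /* assumed: no overflow, no divide by zero */
    unsigned __int128 v = ((unsigned __int128)*ra << 64) | rb;
    uint64_t d = rc; /* mirrors the "copy first" step of the pseudocode */
    *rt = (uint64_t)(v / d);
    *ra = (uint64_t)(v % d);
}

/* Assumed behaviour of: li carry, 0; set_remap_reverse;
 * sv.divrem2du n.v, carry.s, n.v, d.s
 * Returns the final carry (the remainder); n becomes the quotient. */
static uint64_t sv_divrem2du_reversed(uint64_t *n, size_t vl, uint64_t d)
{
    uint64_t carry = 0;              /* li carry, 0 */
    for (size_t i = vl; i > 0; i--)  /* reversed element order */
        divrem2du_model(&n[i - 1], &carry, n[i - 1], d);
    return carry;
}
```

since carry is always a remainder modulo d, the ra < rc precondition holds on every element, so the per-element quotient never overflows 64 bits.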

we'd probably also want a divrem2d for signed div/rem since that's also needed
for faster 128-bit arithmetic (which is likely way more common than bigints)

