I'm trying to implement a fast primality test for Rust's u32 and u64 datatypes. As part of it, I need to compute (n*n)%d where n and d are u32 (or u64, respectively).
While the result can easily fit in the datatype, I'm at a loss for how to compute this. As far as I know there is no processor primitive for this.
For u32 we can fake it -- cast up to u64, so that the product won't overflow, then take the modulus, then cast back down to u32, knowing this won't overflow. However since I don't have a u128 datatype (as far as I know) this trick won't work for u64.
So for u64, the most obvious way I can think of to accomplish this is to somehow compute x*y to get a pair (carry, product) of u64, so we capture the amount of overflow instead of just losing it (or panicking, or whatever).
Is there a way to do this? Or another standard way to solve the problem?
Richard Rast pointed out that Wikipedia version works only with 63-bit integers. I extended the code provided by Boiethios to work with full range of 64-bit unsigned integers.