I'm trying to implement multiplication using the Karatsuba algorithm.
The usual bit masking and shifting approach implemented in mul_kar2() works.
In mul_kar2_au(), I'm struggling with combining the individual components into a "single number" using a union, but can be an array. When I combine them, like in math school, and cast on 64-bit integer - it works (line commented with //). When I 'put them together' it doesn't. I know I have to emulate mathematical operation, but don't know how.
I don't want to use 64-bit integer as if it was on 32-bit processor/microcontroller, and also I want to extend it into 128-bit multiplication which is not present in arm64, not in hardware, so I have nowhere to cast it on.
typedef union
{
uint32_t au32[2];
uint16_t au16[4];
} au64_u;
au64_u mul_kar2_au(au64_u aru64)
{
uint32_t ha, hb, la, lb, z2, z1, z0;
int m = sizeof(uint32_t)*CHAR_BIT, m_2 = m/2;
au64_u res;
uint32_t a = aru64.au32[1];
uint32_t b = aru64.au32[0];
ha = a >> m_2;
hb = b >> m_2;
la = (uint16_t)a;
lb = (uint16_t)b;
z2 = ha * hb;
z0 = la * lb;
z1 = (ha + la) * (hb + lb) - z2 - z0;
//uint64_t res = ((uint64_t)z2 << (m2 * 2)) + ((uint64_t)z1 << m2) + (uint64_t)z0;
aru64.au16[0] = z0;
aru64.au16[1] = (uint16_t)z1;
aru64.au16[2] = z1 >> m_2;
aru64.au16[3] = z2;
return aru64;
}
And this is a result:
m=32, m_2=16
a=1458354473 (0x56ecb929)
b=1458354473 (0x56ecb929)
y=a*b:
2126797768919107729 0x1d83e74d75844891
t=mul_kar2(a, b):
2126797768919107729 0x1d83e74d75844891
t=mul_kar2_au(a, b):
7606718021055826065 0x69907dbcef984891
As can be seen only the last (least significant) 16 bits are correct.
The whole program can be found here.
If it's math question I will put it in correct department.
In math school, you add properly aligned parts.
Using A,a and B,b for single character factor part designators:
z0:
abab ababz1:
(A+a)(B+b)z2:
ABAB ABABsumming:
ABAB ABAB abab abab+ (A+a)(B+b)ignores the upper half of z0 and the lower one of z2.
For one way to avoid problems with the size of
z1see the bottom of en.wikipedia's paragraph on the Implementation of Karatsuba's multiplication algorithm.