I am trying to check whether a large integer is a perfect square using gmpy2 in a JIT-decorated (optimized) routine using numba. The example here is for illustrative purposes only (from a theoretical point of view, such equations or elliptic curves can be treated differently/better). My code seems to overflow since it yields solutions that aren't really ones:
import numpy as np
from numba import jit
import gmpy2
from gmpy2 import mpz, xmpz
import time
import sys
@jit('void(uint64)')
def findIntegerSolutionsGmpy2(limit: np.uint64):
for x in np.arange(0, limit+1, dtype=np.uint64):
y = mpz(x**6-4*x**2+4)
if gmpy2.is_square(y):
print([x,gmpy2.sqrt(y),y])
def main() -> int:
limit = 100000000
start = time.time()
findIntegerSolutionsGmpy2(limit)
end = time.time()
print("Time elapsed: {0}".format(end - start))
return 0
if __name__ == '__main__':
sys.exit(main())
Using a limit = 1000000000 the routine finishes within approx. 4 seconds. The limit, which I am handing over to the decorated function, will not exceed an unsigned integer of 64 Bit (which seems not to be an issue here).
I read that big integers do not work in combination with numba's JIT optimization (see for example here).
My Question: Is there any possibility to use large integers in (GPU) optimized code?
Real reason of wrong results is simple, you forgot to convert
xtompz, so statementx ** 6 - 4 * x ** 2 + 4is promoted tonp.uint64type and computed with overflow (becausexin statement isnp.uint64). Fix is trivial, just addx = mpz(x):also in you may notice that I added
forceobj = True, this is to suppress Numba compilation warnings at start.After this fix everything works fine and you don't see wrong results.
If your task is to check if expression gives strict square then I decided to invent and implement another solution for you, code below.
It works as following. You may notice that if a number is square then it is also square modulus any number (taking modulus is
x % Noperation).We can take any number, for example product of some primes,
K = 2 * 2 * 3 * 5 * 7 * 11 * 13 * 17 * 19. Now we can make a simple filter, compute all squares modulo K, mark this squares inside bit vector and then check what numbers modulo K have ones in this filter bit vector.Filter K (product of primes), mentioned above, leaves only 1% of candidates for squares. We can also do a second stage, apply same filter with other primes, e.g.
K2 = 23 * 29 * 31 * 37 * 41. This will filter them even mor by 3%. In total we will have1% * 3% = 0.03%amount remaining of initial candidates.After two filterings only few numbers remain to be checked. They can be easily fast-checked with
gmpy2.is_square().Filtering stage can be easily wrapped into Numba function, as I did below, this function can have extra Numba param
parallel = True, this will tell Numba to automatically run all Numpy operations in parallel on all CPU cores.In code I use
limit = 1 << 30, this signifies limit of allxto be checked, and I useblock = 1 << 26, this signifies how many numbers to check at a time, in parallel Numba function. If you have enough memory you may setblockto be larger to occupy all CPU cores more efficiently. block of size1 << 26approximately uses around 1 GB of memory.After using my idea with filtering and using multi-core CPU my code solves same task as yours hundred times faster.
Try it online!
Output: