I am running a Python script on a high-performance computing (HPC) cluster. Executing the Python script via a batch script fails with the following output:
hallo.%j.%N.out:
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
Running ...
/var/tmp/slurmd_spool/job21059454/slurm_script: line 21: 17994 Segmentation fault (core dumped) python3 /scratch/username/linalgTest.py
Done
A (slightly more than) minimal working example of the Python script is as follows:
linalgTest.py:
import numpy as np
import scipy.sparse
import logging  # https://docs.python.org/3/howto/logging.html
from datetime import datetime

logging.basicConfig(
    filename = ("linalg test log "
                f"{datetime.now().strftime('%Y_%m_%d-%I_%M_%S_%p')}.log"),
    filemode = 'a',
    format = '%(asctime)s,%(msecs)d %(name)s %(levelname)s %(message)s',
    datefmt = '%H:%M:%S',
    level = logging.INFO
)

logging.info(">>> Let's go ...")
logging.info(f"NumPy version: {np.version.version}")
logging.info(f"SciPy version: {scipy.version.version}")

logging.info("Load A ...")
df = scipy.sparse.load_npz(
    "test_data.npz"
    # Source: https://gigamove.rwth-aachen.de/en/download/e8dad61113ba8004915aab5f3c3b98df
).toarray()
logging.info(f"Size of `df`: {df.nbytes}")

logging.info("Create I ...")
i = np.identity(df.shape[0])

logging.info("Calculate ...")
df_out = np.linalg.inv(i - df)

logging.info("Done")
I figured out that the script stops working when it executes np.linalg.inv() (the last entry written to the log is "Calculate ..."). The data set is available via the link in the code comment above. logging.info(f"Size of `df`: {df.nbytes}") logs 15454226432, i.e. the dense array is roughly 15.5 GB.
The logged package versions are NumPy 1.21.6 and SciPy 1.7.3.
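For context, here is my rough size estimate, derived only from the logged df.nbytes value (the one assumption I make is that the dense array is float64, i.e. 8 bytes per entry):

# Back-of-the-envelope sizes implied by df.nbytes = 15454226432
nbytes = 15_454_226_432
n = round((nbytes / 8) ** 0.5)        # assuming float64 -> 8 bytes per entry
print(n, n * n * 8 == nbytes)         # 43952 True

# np.linalg.inv(i - df) keeps several dense n x n float64 arrays alive at once
# (df, i, i - df and the result), each of this size:
print(f"{nbytes / 1e9:.2f} GB per dense matrix")   # 15.45 GB

So, if my arithmetic is right, the dense approach needs at least ~60 GB of RAM plus LAPACK workspace, which should still fit into the memory I request in the batch script below.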
The slurm script looks roughly as follows:
linalgTest.sh:
#!/bin/bash
#SBATCH -o hallo.%j.%N.out # Log
#SBATCH -J JOB NAME # Job name
#SBATCH --nodes=1 # Nodes
#SBATCH --ntasks=1 # CPU-Cores
#SBATCH --mem=250000M # MiB resident memory/node
#SBATCH --time=00:180:00 # Runtime
#SBATCH --partition=standard # Node
# Required modules
module load python/3.7.4
python3 linalgTest.py
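To rule out that the job actually gets less memory than requested, I am considering logging what the process sees at startup. A rough sketch of what I have in mind is below; whether RLIMIT_AS and /proc/meminfo are the relevant things to inspect on this particular cluster is my assumption:

import resource

# Address-space limit imposed on the process (resource.RLIM_INFINITY means unlimited)
soft, hard = resource.getrlimit(resource.RLIMIT_AS)
print("RLIMIT_AS (soft, hard):", soft, hard)

# Total and currently available memory as reported by the node
with open("/proc/meminfo") as fh:
    for line in fh:
        if line.startswith(("MemTotal", "MemAvailable")):
            print(line.strip())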
Does anyone have a suggestion on how to solve the segmentation fault error?
Update:
I tried using scipy.sparse.linalg.inv() instead of np.linalg.inv():
import numpy as np
import scipy.sparse
import scipy.sparse.linalg
import logging  # https://docs.python.org/3/howto/logging.html
from datetime import datetime

logging.basicConfig(
    filename = ("linalg test log "
                f"{datetime.now().strftime('%Y_%m_%d-%I_%M_%S_%p')}.log"),
    filemode = 'a',
    format = '%(asctime)s,%(msecs)d %(name)s %(levelname)s %(message)s',
    datefmt = '%H:%M:%S',
    level = logging.INFO
)

logging.info(">>> Let's go ...")
logging.info(f"NumPy version: {np.version.version}")
logging.info(f"SciPy version: {scipy.version.version}")

logging.info("Load A ...")
df = scipy.sparse.load_npz(
    "test_data.npz"
    # Source: https://gigamove.rwth-aachen.de/en/download/e8dad61113ba8004915aab5f3c3b98df
)

logging.info("Create I ...")
i = scipy.sparse.identity(
    df.shape[0],
    format = 'csc'
)

logging.info("Calculate ...")
df_out = scipy.sparse.linalg.inv(i - df)

logging.info("Done")
The new approach returned a memory error, even though I reserved a node with 500 GB of memory.
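For anyone who wants to poke at the two code paths without downloading the 15 GB data set, a scaled-down stand-in could look like the following; the shape and density here are made up by me and are not properties of the real matrix:

import numpy as np
import scipy.sparse
import scipy.sparse.linalg

# Small random sparse matrix as a stand-in for test_data.npz
# (size and density are guesses, not the real data)
n = 2000
df_small = scipy.sparse.random(n, n, density=1e-3, format='csc', random_state=0)

# Dense path, as in linalgTest.py (scaled down)
dense_out = np.linalg.inv(np.identity(n) - df_small.toarray())

# Sparse path, as in the update above (scaled down)
sparse_out = scipy.sparse.linalg.inv(scipy.sparse.identity(n, format='csc') - df_small)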
