Segmentation fault during matrix inversion


I am running a Python script on a high-performance computing (HPC) cluster. Executing the script via a Slurm batch script produces the following error:

hallo.%j.%N.out:

% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
Dload  Upload   Total   Spent    Left  Speed

0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
Running ...
/var/tmp/slurmd_spool/job21059454/slurm_script: line 21: 17994 Segmentation fault      (core dumped) python3 /scratch/username/linalgTest.py
Done

A (slightly more than) minimal working example of the Python script is as follows:

linalgTest.py:

import numpy as np
import scipy.sparse

import logging # https://docs.python.org/3/howto/logging.html

from datetime import datetime

logging.basicConfig(
    filename = ("linalg test log "
                f"{datetime.now().strftime('%Y_%m_%d-%I_%M_%S_%p')}.log"),
    filemode = 'a',
    format = '%(asctime)s,%(msecs)d %(name)s %(levelname)s %(message)s',
    datefmt = '%H:%M:%S',
    level = logging.INFO
    )

logging.info(">>> Let's go ...")
logging.info(f"NumPy version: {np.version.version}")
logging.info(f"SciPy version: {scipy.version.version}")

logging.info("Load A ...")

df = scipy.sparse.load_npz(
    "test_data.npz" 
    # Source: https://gigamove.rwth-aachen.de/en/download/e8dad61113ba8004915aab5f3c3b98df
    ).toarray()

logging.info(f"Size of `df`: {df.nbytes}")

logging.info("Create I ...")

i = np.identity(df.shape[0])

logging.info("Calculate ...")

df_out = np.linalg.inv(i-df)

logging.info("Done")

I have narrowed the problem down: the script crashes while executing np.linalg.inv().

The data set is available via the link in the code comment. logging.info(f"Size of df: {df.nbytes}") returns 15454226432, i.e. the dense matrix occupies roughly 15.5 GB.

The logged package versions are NumPy 1.21.6 and SciPy 1.7.3.
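For context, here is my own back-of-the-envelope estimate of the dense memory footprint, derived from the logged nbytes. The list of temporaries is an assumption about what the script allocates around np.linalg.inv(i-df); it suggests the dense path needs well under the reserved 250 GB, so plain memory exhaustion seems unlikely to be the cause of the segfault:

```python
import math

nbytes = 15_454_226_432          # logged size of the dense array `df`
n = math.isqrt(nbytes // 8)      # float64 -> 8 bytes per element
assert n * n * 8 == nbytes       # confirms the matrix is square: n x n

gib = nbytes / 2**30             # one dense matrix in GiB
# Dense arrays assumed alive around np.linalg.inv(i - df):
# df (dense copy), i, the temporary i - df, and the inverse itself.
total_gib = 4 * gib
print(n, round(gib, 1), round(total_gib, 1))  # -> 43952 14.4 57.6
```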

The Slurm batch script looks roughly as follows:

linalgTest.sh:

#!/bin/bash

#SBATCH -o hallo.%j.%N.out                          # Log
#SBATCH -J JOB_NAME                                 # Job name
#SBATCH --nodes=1                                   # Nodes     
#SBATCH --ntasks=1                                  # CPU-Cores
#SBATCH --mem=250000M                               # MiB resident memory/node
#SBATCH --time=00:180:00                            # Runtime 
#SBATCH --partition=standard                        # Node

# Required modules
module load python/3.7.4
python3 linalgTest.py
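For debugging I am considering adding the following to the batch script before the Python call. This is a guess on my part: a too-small stack limit is sometimes reported as a cause of segfaults with threaded BLAS libraries, so the sketch logs the limits the job actually received and tries to raise the stack limit:

```shell
# Log the resource limits the job actually received; a small stack
# limit is a commonly reported cause of segfaults with threaded BLAS.
ulimit -a

# Try to raise the stack limit; ignore failure if the site forbids it.
ulimit -s unlimited 2>/dev/null || true

python3 linalgTest.py
```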

Does anyone have a suggestion on how to solve the segmentation fault error?

Update:

I tried scipy.sparse.linalg.inv() instead of np.linalg.inv():

import numpy as np
import scipy.sparse
import scipy.sparse.linalg

import logging # https://docs.python.org/3/howto/logging.html

from datetime import datetime

logging.basicConfig(
    filename = ("linalg test log "
                f"{datetime.now().strftime('%Y_%m_%d-%I_%M_%S_%p')}.log"),
    filemode = 'a',
    format = '%(asctime)s,%(msecs)d %(name)s %(levelname)s %(message)s',
    datefmt = '%H:%M:%S',
    level = logging.INFO
    )

logging.info(">>> Let's go ...")
logging.info(f"NumPy version: {np.version.version}")
logging.info(f"SciPy version: {scipy.version.version}")

logging.info("Load A ...")

df = scipy.sparse.load_npz(
    "test_data.npz" 
    # Source: https://gigamove.rwth-aachen.de/en/download/e8dad61113ba8004915aab5f3c3b98df
    )

logging.info("Create I ...")

i = scipy.sparse.identity(
    df.shape[0],
    format = 'csc'
    )

logging.info("Calculate ...")

df_out = scipy.sparse.linalg.inv(i-df)

logging.info("Done")

The new approach raised a memory error, although I had reserved a node with 500 GB of memory.
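As a possible workaround I am now considering not forming the inverse at all: the inverse of a sparse matrix is generally dense, which would explain the memory error, whereas factorizing I - A once and solving only for the columns actually needed keeps everything sparse until the solve. A minimal sketch on small random data (not the real data set; the 0.1 scaling is only there to keep I - A well-conditioned in this toy example):

```python
import numpy as np
import scipy.sparse
import scipy.sparse.linalg

n = 200
# Small stand-in for the real matrix: sparse, with small entries.
a = scipy.sparse.random(n, n, density=0.01, format="csc", random_state=0) * 0.1
i = scipy.sparse.identity(n, format="csc")

# Factorize I - A once instead of inverting it.
lu = scipy.sparse.linalg.splu((i - a).tocsc())

# Solve for a single column of (I - A)^-1 on demand.
e0 = np.zeros(n)
e0[0] = 1.0
col0 = lu.solve(e0)

# Sanity check against the dense inverse (feasible only at this size).
assert np.allclose(col0, np.linalg.inv((i - a).toarray())[:, 0])
```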
