I've been trying to run some parallelised code in C++ from Python through cppyy but am facing an error.
The executable (compiled with GCC using -fopenmp -O2) runs without errors and shows the expected drop in runtime from parallelisation.
When the #pragma omp parallel for is commented out of the C++ code, cppyy doesn't raise any errors. However, when the pragma is part of the code I get the error below:
IncrementalExecutor::executeFunction: symbol '__kmpc_for_static_fini' unresolved while linking symbol '__cf_4'!
IncrementalExecutor::executeFunction: symbol '__kmpc_for_static_init_4' unresolved while linking symbol '__cf_4'!
IncrementalExecutor::executeFunction: symbol '__kmpc_fork_call' unresolved while linking symbol '__cf_4'!
IncrementalExecutor::executeFunction: symbol '__kmpc_global_thread_num' unresolved while linking symbol '__cf_4'!
Traceback (most recent call last):
File "...../SO_troubleshooting/example_pll_cppyy_code.py", line 8, in <module>
output = cppyy.gbl.pll_somelinalgeb()
ValueError: std::vector<std::vector<Eigen::Matrix<double,-1,-1,0,-1,-1> > > ::pll_somelinalgeb() =>
ValueError: nullptr result where temporary expected
Here is the short Python script:
import cppyy
cppyy.add_include_path('../np_vs_eigen/eigen/')
cppyy.include('easy_example.cpp')
vector = cppyy.gbl.std.vector
import datetime as dt
print('Starting the function call now ')
start = dt.datetime.now()
output = cppyy.gbl.pll_somelinalgeb()
stop = dt.datetime.now()
print((stop-start), 'seconds')
The C++ toy code is below. It generates a random matrix with Eigen, calculates its pseudo-inverse, and then sleeps for 1 ms.
#include <omp.h>
#include <iostream>
#include <Eigen/Dense>
#include <chrono>
#include <vector>
#include <thread>
using Eigen::VectorXd;
using Eigen::MatrixXd;
std::vector<MatrixXd> some_linearalgebra(){
    std::vector<MatrixXd> solutions;
    std::srand((unsigned int) time(0));//ensures a new random matrix each time
    MatrixXd arraygeom(5,3);
    arraygeom = MatrixXd::Random(5,3);
    VectorXd row1 = arraygeom.block(0,0,1,3).transpose();
    arraygeom.rowwise() -= row1.transpose();
    MatrixXd pinv_arraygeom(3,5);
    // calculate the pseudoinverse of arraygeom
    pinv_arraygeom = arraygeom.completeOrthogonalDecomposition().pseudoInverse();
    //std::cout << pinv_arraygeom << std::endl;
    solutions.push_back(pinv_arraygeom);
    solutions.push_back(pinv_arraygeom);
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
    return solutions;
}
std::vector<std::vector<MatrixXd>> pll_somelinalgeb(){
    int num_runs = 5000;
    std::vector<std::vector<MatrixXd>> all_solns(num_runs);
    #pragma omp parallel for
    for (int i=0; i<num_runs; i++){
        all_solns[i] = some_linearalgebra();
    }
    return all_solns;
}
int main(){
    std::vector<MatrixXd> main_out;
    main_out = some_linearalgebra();
    auto start = std::chrono::system_clock::now();
    std::vector<std::vector<MatrixXd>> main2_out;
    main2_out = pll_somelinalgeb();
    auto end = std::chrono::system_clock::now();
    std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << " ms" << std::endl;
    return 0;
}
System + OS specs:
- Ubuntu 18.04.2 LTS, Intel® Core™ i7-10700 CPU @ 2.90GHz × 16, 64-bit
- Python 3.9.0
- cppyy 2.4.0 (pip install)
- Eigen 3.4.0
Non-default precompiled cppyy header used:
As per the linked instructions, I ran export EXTRA_CLING_ARGS='-fopenmp' in the terminal, then ran the cppyy_backend.loader code to build the precompiled header, and finally set the CLING_STANDARD_PCH environment variable with another export.
C++ Executable compiled with g++-11 easy_example.cpp -fopenmp -O2 -I <path_to_Eigen_library here>
The problem was linking the OpenMP library file to the Cling compiler that runs cppyy. I encountered the same problem on Linux Mint and Windows 11 too, which made me realise it is not an OS-specific problem. What ended up working was the following:
- Add the -fopenmp flag to your EXTRA_CLING_ARGS environment variable (export EXTRA_CLING_ARGS='-fopenmp' in Unix; in Windows, go through the Start menu and add a new environment variable). Compile a new precompiled header with the code in the cppyy docs, and define the path to the precompiled header with the CLING_STANDARD_PCH environment variable.
- Find the libiomp5 library file (run $ locate libiomp5 in Unix, or search for libiomp5 in the File Explorer in Windows). The Unix file should be libiomp5.so and the Windows version is libiomp5md.dll.
- Load the library in your Python script with cppyy.load_library(<insert path to .so or .dll file here>).
You should now have parallelised code!
This gets you to working code, but is admittedly a rather manual approach - would be happy to hear a more automated approach.
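As a starting point for automating this, here is a minimal Python sketch of the flag-and-load part. The helper names find_openmp_runtime and load_openmp_into_cppyy are hypothetical (they are not part of cppyy), and the directories you pass in are your own guess at where libiomp5 lives. It assumes the precompiled header has already been rebuilt and CLING_STANDARD_PCH is already set; the only cppyy-specific pieces it relies on are the EXTRA_CLING_ARGS variable and cppyy.load_library mentioned above.

```python
import os


def find_openmp_runtime(search_dirs, names=("libiomp5.so", "libiomp5md.dll")):
    """Hypothetical helper: return the first libiomp5 file found, or None."""
    for directory in search_dirs:
        for name in names:
            candidate = os.path.join(directory, name)
            if os.path.isfile(candidate):
                return candidate
    return None


def load_openmp_into_cppyy(search_dirs):
    """Hypothetical helper: add -fopenmp for Cling, then load libiomp5."""
    # Assumption: this runs before cppyy is imported anywhere in the process,
    # since the flag needs to be in place when Cling starts up.
    args = os.environ.get("EXTRA_CLING_ARGS", "")
    if "-fopenmp" not in args.split():
        os.environ["EXTRA_CLING_ARGS"] = (args + " -fopenmp").strip()

    path = find_openmp_runtime(search_dirs)
    if path is None:
        raise FileNotFoundError("libiomp5 not found in the given directories")

    import cppyy  # imported late so the environment variable is already set

    cppyy.load_library(path)
    return path
```

You would call load_openmp_into_cppyy([...]) with your own list of library directories before cppyy.include('easy_example.cpp'). It still leaves the precompiled header step manual, so it is only a partial answer to the automation question.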