My program loops through a vector of strings and runs a program for each entry; each entry has its own associated program. The child processes are created with fork() and execv() inside the loop. The parent process waits with waitpid() until each child has returned before continuing the loop. In my test environment (for now), each child program prints a message, calls sleep(), and prints another message.
The code works perfectly fine as long as no execv() call returns -1 (for example, because the file wasn't found).
    std::vector<std::string> files{ "foo", "bar", "foobar" };
    for (size_t i = 0; i < files.size(); i++)
    {
        pid_t pid_fork = fork();
        if (pid_fork == -1)
        {
            std::cout << "error: could not fork process" << std::endl;
        } else if (pid_fork > 0)
        {
            // parent: block until this particular child has terminated
            std::cout << "this is the parent" << std::endl;
            int pid_status;
            pid_t child_ret = waitpid(pid_fork, &pid_status, 0);
            std::cout << "child_ret: " << child_ret << std::endl;
            if (child_ret == -1)
            {
                std::cout << "error waiting for child " << pid_fork << std::endl;
            } else
            {
                if (WIFEXITED(pid_status))
                {
                    std::cout << "child process exit status: " << WEXITSTATUS(pid_status) << std::endl;
                    if (WEXITSTATUS(pid_status) == 0)
                    {
                        std::cout << "updating db that file has been loaded: " << files[i] << std::endl;
                        /* some code to update a DB table */
                    } else
                    {
                        std::cout << "exit status = FAILED" << std::endl;
                    }
                }
            }
        } else
        {
            // child: replace the process image with the ETL program
            std::cout << "this is the child" << std::endl;
            char *args[] = {NULL};
            const std::string path = "./etl/etl_" + files[i];
            // execv() only returns if it failed
            if (execv(path.c_str(), args) == -1)
            {
                std::cout << "could not load " << path << std::endl;
                /* DB insert of failed "load" here */
                return EXIT_FAILURE;
            }
        }
    }
    /* some more code here writing stuff to a database before cleanup and returning from main */
Output:
this is the parent
this is the child
hello from etl_foo
etl_foo is done
child_ret: 77388
child process exit status: 0
this is the parent
this is the child
hello from etl_bar
etl_bar is done
child_ret: 77389
child process exit status: 0
this is the parent
this is the child
hello from etl_foobar
etl_foobar is done
child_ret: 77390
child process exit status: 0
If, however, I cause execv() to return -1 by deleting etl_foobar, the parent process no longer seems to wait for the child process to return:
this is the child
hello from etl_foo
etl_foo is done
child_ret: 77620
child process exit status: 0
this is the parent
this is the child
hello from etl_bar
etl_bar is done
child_ret: 77621
child process exit status: 0
this is the parent
this is the child
could not load ./etl_foobar
-> here the end of the parent code is reached, the DB is updated and the parent returns (?)
-> I expect the program to be done at this stage, however... this happens
child_ret: 77622
terminate called after throwing an instance of 'sql::SQLException'
what(): Lost connection to MySQL server during query
Aborted (core dumped)
It seems that the code after pid_t child_ret = waitpid(pid_fork, &pid_status, 0); is still executed, which I don't understand. The parent appears to have returned already, yet part of the parent's code still runs and fails, because the DB connection object is deleted just before the parent returns.
The desired behavior is that when execv() returns -1, the child process exits and hands control back to the waiting parent, which then finishes the remaining code and returns in an orderly manner, just as it does when execv() succeeds. Thank you!
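To make the desired flow concrete, here is a minimal sketch that is separate from the real program. The helper name run_and_wait and the exit code 127 (borrowed from the shell convention for "command not found") are assumptions made for illustration only. It also ends a failed child with _exit() rather than returning from main, which is an assumption about how to keep the child from running any of the parent's cleanup code, not something confirmed for the program above.

```cpp
#include <cstdlib>
#include <iostream>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: runs one program, waits for it, and returns its exit
// status. Returns -1 if fork()/waitpid() failed or the child was killed by a
// signal.
int run_and_wait(const std::string &path)
{
    std::cout.flush(); // avoid duplicating buffered output in the child

    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0)
    {
        // child: argv[0] is the program path by convention
        char *args[] = {const_cast<char *>(path.c_str()), nullptr};
        execv(path.c_str(), args);
        // only reached if execv() failed; terminate the child immediately
        // without running atexit handlers or destructors (assumed code 127)
        _exit(127);
    }

    // parent: block until this particular child has terminated
    int status = 0;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In the loop, the parent would call run_and_wait("./etl/etl_" + files[i]) and treat any nonzero result as a failed load before moving on to the next entry.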
Edit: User Sneftel pointed out that the child process did not actually return when execv() failed, which I have now changed. The parent process therefore now waits for all children to return, including those where execv() fails.
Nevertheless, I still have the issue that whenever a child returns with EXIT_FAILURE, the next loop iteration runs until the next DB insert is attempted, at which point I still get the "Lost connection to MySQL server" error and a core dump. I'm not sure what causes this.
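One possible origin, stated as an assumption rather than a confirmed diagnosis: a child that leaves main() with return (or calls exit()) runs the exit-time cleanup it inherited from the parent across fork(), and it also shares the parent's open sockets. If that cleanup touches the DB connection, the parent's session could be shut down from the child. _exit() skips that cleanup. The sketch below involves no MySQL at all; it only shows the cleanup difference, using a hypothetical atexit handler that writes one byte to a pipe as a stand-in for "traffic on a shared connection". All names in it are made up for illustration.

```cpp
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for a socket shared between parent and child.
static int g_shared_fd = -1;

// Stand-in for cleanup that a connection object might perform at exit.
static void cleanup_handler()
{
    if (g_shared_fd != -1)
    {
        const char byte = 'x';
        ssize_t ignored = write(g_shared_fd, &byte, 1);
        (void)ignored;
    }
}

// Forks a child that terminates via exit() (use_exit == true) or _exit(),
// and returns how many bytes the child's exit-time cleanup wrote.
// Returns -1 if pipe() or fork() failed.
int cleanup_bytes_from_child(bool use_exit)
{
    // registered once in the parent, so every forked child inherits it
    static const bool registered = (std::atexit(cleanup_handler) == 0);
    (void)registered;

    int fds[2];
    if (pipe(fds) == -1)
        return -1;
    g_shared_fd = fds[1];

    pid_t pid = fork();
    if (pid == -1)
    {
        g_shared_fd = -1;
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0)
    {
        close(fds[0]);
        if (use_exit)
            std::exit(EXIT_FAILURE); // runs the inherited atexit handler
        _exit(EXIT_FAILURE);         // skips atexit handlers and destructors
    }

    // parent: stop using the write end, then wait for the child
    g_shared_fd = -1;
    close(fds[1]);
    int status = 0;
    waitpid(pid, &status, 0);

    char buf[8];
    ssize_t n = read(fds[0], buf, sizeof buf); // 0 means end-of-file
    close(fds[0]);
    return static_cast<int>(n);
}
```

With exit() the child's inherited handler fires and one byte arrives in the parent; with _exit() nothing arrives. Whether this is what breaks the MySQL connection depends on how the connection object is owned and destroyed, which the snippet above does not show.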