I am trying to write a cross-platform globbing method in C++. For this, I am trying to use the std::filesystem library to recurse through directories and compare paths against a provided path regex.
The code I attempt to use is:
std::vector<std::string> glob(const std::string& regexPattern) {
std::vector<std::string> matches;
fs::path currentPath = fs::current_path();
fs::path dirPath = regexPattern;
if (dirPath.is_relative()) {
dirPath = currentPath / dirPath;
}
size_t pos = 0;
std::string path_upto_wildcard = "/";
std::string str_dirPath = dirPath.string();
while ((pos = str_dirPath.find("/")) != std::string::npos) {
std::string token =str_dirPath.substr(0, pos);
str_dirPath.erase(0, pos + 1);
if (token == "" ) continue;
if (token.find("*") != std::string::npos) continue;
path_upto_wildcard += token + "/";
}
dirPath = path_upto_wildcard;
std::regex regEx(regexPattern);
fs::recursive_directory_iterator endIterator;
for (fs::recursive_directory_iterator it(dirPath); it != endIterator; ++it) {
if ( std::regex_match(it->path().string(), regEx)) {
matches.push_back(it->path().string());
}
}
return matches;
}
which runs just fine when I compile and run the executable that uses this method. However, I use this code in an executable that I want to run 100 times concurrently (with different arguments) on the same machine, in the background. When I do this, I end up with some of the processes finishing just fine, but many others that throw the following error:
terminate called after throwing an instance of 'std::filesystem::__cxx11::filesystem_error'
what(): filesystem error: cannot increment recursive directory iterator: No such file or directory
(core dumped)
This happens whether I use std::filesystem or boost::filesystem. However, if I write this glob method using the Unix-only glob.h header, things run as expected, with no problems. The glob.h code is:
std::vector<std::string> glob(const std::string& pattern) {
glob_t g;
glob(pattern.c_str(), GLOB_TILDE, nullptr, &g); // one should ensure glob returns 0!
std::vector<std::string> filelist;
filelist.reserve(g.gl_pathc);
for (size_t i = 0; i < g.gl_pathc; ++i) {
filelist.emplace_back(g.gl_pathv[i]);
}
globfree(&g);
return filelist;
}
I am unsure where the instability comes from. Is there anything I can do about it?
Here is a minimal working example:
#include <regex>
#include <filesystem>
#include <glob.h>
#include <iostream>
#include <string>
#include <vector>
namespace fs = std::filesystem;
std::vector<std::string> glob(const std::string& regexPattern) {
std::vector<std::string> matches;
fs::path currentPath = fs::current_path();
fs::path dirPath = regexPattern;
if (dirPath.is_relative()) {
dirPath = currentPath / dirPath;
}
size_t pos = 0;
std::string path_upto_wildcard = "/";
std::string str_dirPath = dirPath.string();
while ((pos = str_dirPath.find("/")) != std::string::npos) {
std::string token =str_dirPath.substr(0, pos);
str_dirPath.erase(0, pos + 1);
if (token == "" ) continue;
if (token.find("*") != std::string::npos) continue;
path_upto_wildcard += token + "/";
}
dirPath = path_upto_wildcard;
std::regex regEx(regexPattern);
fs::recursive_directory_iterator endIterator;
for (fs::recursive_directory_iterator it(dirPath); it != endIterator; ++it) {
if ( std::regex_match(it->path().string(), regEx)) {
matches.push_back(it->path().string());
}
}
return matches;
}
int main() {
std::vector dirs{"/mypath/a/.*XYZ*/.*",
"/mypath/b/.*XYZ*/.*",
"/mypath/c/.*XYZ*/.*",
};
for (auto& dir : dirs) {
std::vector<std::string> matches = glob(dir);
for (auto& match : matches) {
std::cout << match << std::endl;
}
}
return 0;
}
which can be compiled with
g++ test_globbing.cxx -o test_globbing -std=c++17
and the effect can be tested by creating some text file test_glob.txt containing
./test_globbing
100 times (one per line), then running a run_in_bkg.sh script that contains:
#!/bin/bash
n=0
while read arg; do echo $n && ((n+=1)); eval ' $arg &> job_$(echo $n).log &'; sleep .2; done < $1
The effect is not always there and I can't get it to "always" happen, but every few runs you will see some jobs failing as the number of running processes increases.
Unfortunately, the standard does not specify how concurrent filesystem operations work, and in particular it makes file system races undefined behavior.
Fortunately, there is another way to increment a directory iterator: std::filesystem::recursive_directory_iterator::increment. It takes a std::error_code argument and sets it instead of throwing a filesystem exception. So, instead of the ++it in your for loop, I believe you can call it.increment(ec) and check ec after each step.
I cannot reproduce your issue even with your minimal working example (which I tried to translate to Windows as best I could), so I can't guarantee anything. I'm also unsure whether it.increment still advances it if it fails, so you might need to copy the iterator before incrementing and fall back to the copy on error.
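A minimal sketch of the error_code-based loop, under a few assumptions: the helper name collect_matches and its parameters are hypothetical, and the loop simply stops when increment reports an error rather than trying to resume from a saved copy (recursive_directory_iterator is an input iterator, so a copy is not guaranteed to stay usable after the original has been incremented). It has not been tested against the concurrent scenario in the question.

```cpp
#include <filesystem>
#include <iostream>
#include <regex>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: collect every path below `root` whose full string
// matches `re`. Errors are reported through std::error_code, so neither the
// constructor nor the increment throws filesystem_error.
std::vector<std::string> collect_matches(const fs::path& root, const std::regex& re) {
    std::vector<std::string> matches;
    std::error_code ec;

    // Non-throwing constructor: `ec` is set if `root` cannot be opened.
    fs::recursive_directory_iterator it(root, ec);
    if (ec) {
        std::cerr << "cannot open " << root << ": " << ec.message() << '\n';
        return matches;
    }

    const fs::recursive_directory_iterator end;
    while (it != end) {
        const std::string current = it->path().string();
        if (std::regex_match(current, re)) {
            matches.push_back(current);
        }

        it.increment(ec);  // non-throwing counterpart of ++it
        if (ec) {
            // Assumption: after an error the traversal is abandoned and the
            // matches found so far are returned.
            std::cerr << "stopped after " << current << ": " << ec.message() << '\n';
            break;
        }
    }
    return matches;
}
```

In the glob function from the question, the for loop would be replaced by a call such as collect_matches(dirPath, regEx). Whether returning partial results is acceptable depends on the use case; retrying the whole traversal after an error is another option.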