Subsample a large Armadillo matrix or vector

I've been skimming through the Armadillo documentation and examples, but there doesn't seem to be an efficient built-in way to subsample (or resample) a large vector or matrix so that N original elements are reduced to N / k elements. There are a few methods to shuffle and shift, but that's about it.

So I'm just looping over all elements sequentially, but surely there has to be a better way, other than spreading the work across the available cores?

bool subsample(config& cfg, arma::mat& data, int skippCount)
{
    const auto processor_count = 1; // currently not using threading because 'inplace'

    const size_t cols = data.n_cols;
    const size_t period = skippCount + 1 ;
    size_t newCols = cols / period;
    newCols += (0 == (cols % period)) ? 0 : 1;
       
    const size_t blockSize = 256;
    std::vector<std::thread> workers;

    // iterate over every block, including a final partial one
    for (size_t blockID = 0; blockID * blockSize < newCols; ++blockID) {
        workers.push_back(std::thread([&data, blockID, newCols, period]() { 
            // copy up to blockSize columns in place (overwrites earlier columns)
            size_t c = blockID * blockSize;
            for (size_t b = 0; (c < newCols) && (b < blockSize); c++, b++) {
                arma::vec v = data.col(period * c); 
                data.col(c) = v;
            }
        }));

        if (workers.size()==processor_count) {
            for (auto& thread : workers) thread.join();
            workers.clear();
        }
    }
    for (auto& thread : workers) thread.join(); // make sure all threads finish
    data.resize(data.n_rows, newCols);
    return true;
}

If you have any suggestions to improve on this, they would be greatly appreciated. Ideally this would also work in place to save memory.
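To make the in-place idea concrete, here is a minimal sketch that avoids Armadillo entirely and works on a plain column-major buffer (the layout Armadillo uses for `arma::mat`). The helper name `subsample_columns_inplace` and its parameters are hypothetical, chosen only for illustration. It relies on the fact that the source column `period * c` always lies at or beyond the destination column `c`, so a single forward pass never overwrites a column that still has to be read.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Armadillo): keep every `period`-th column
// of a column-major matrix with `rows` rows stored in `buf`, compacting the
// kept columns to the front and shrinking the buffer.
// Returns the number of columns that remain.
std::size_t subsample_columns_inplace(std::vector<double>& buf,
                                      std::size_t rows,
                                      std::size_t period)
{
    if (rows == 0) return 0;
    assert(buf.size() % rows == 0);            // buffer must hold whole columns
    const std::size_t cols = buf.size() / rows;
    if (period <= 1) return cols;              // nothing to drop

    // ceil(cols / period): columns 0, period, 2*period, ... are kept
    const std::size_t newCols = (cols + period - 1) / period;

    // Column 0 is already in place. For c >= 1 and period >= 2 the source
    // column starts past the end of the destination column, so the ranges
    // never overlap and a forward pass is safe.
    for (std::size_t c = 1; c < newCols; ++c) {
        const double* src = buf.data() + period * c * rows;
        double*       dst = buf.data() + c * rows;
        std::copy_n(src, rows, dst);
    }

    buf.resize(rows * newCols);                // drop the now-unused tail
    return newCols;
}
```

The same forward-pass ordering is what makes the single-threaded loop in the question safe. With several threads writing into the same matrix the ordering guarantee is lost, because one block's destination columns can be another block's source columns. If a temporary copy is acceptable, Armadillo can also select the columns directly, for example `data.cols(arma::regspace<arma::uvec>(0, period, data.n_cols - 1))`, although that is not an in-place operation.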
