Without omp I have tested the code below up to n = 350000, when I compile it with g++ -fopenmp it dies somwhere between n = 5000 and n = 10000. I’m an openmp noob so any suggestions would be appreciated. Some of the preliminary research I’ve done suggests to me it may have something to do with std::vector allocation issues. I’ve also implemented this with a couple of different ways of setting up a 2-dim table and row vector combo and the behavior seemed the same, worked up to 5000, but code died at test runs of 10000. Of course the problems as is usually the case wherever I’m involved may be more deeply systemic. The entire class definitions and implementations can be found at stats.hpp
and stats.cpp
// std::vector<float> table[n-1];
// std::vector< std::vector<float> > table(n-1);
// std::vector<float> row(8);
float table[n-1][8];
// without omp I have tested this code up to n = 350000
// with omp it dies somewhere between n = 5000 and n = 10000
#pragma omp parallel for private(sample)
for (int i = 2; i <= n; i++) {
sample.randomSample(x, i);
table[i-2][0] = sample.computeMean();
table[i-2][1] = sample.computeMedian();
table[i-2][2] = sample.computeVar();
table[i-2][3] = sample.computeStd();
table[i-2][4] = sample.computeMeanDev();
table[i-2][5] = sample.computeMedianDev();
table[i-2][6] = sample.computeSkewness();
table[i-2][7] = sample.computeMedianSkew();
}
Below are the sample class and the random sample method code called from within the for loop.
// sample derived class
class Sample : public virtual Stats {
public:
void randomSample(const std::vector<float>&, int);
private:
float var() const;
float std() const;
};
// pop vector is invariant, return it m doesn't make sense
void Sample::randomSample(const std::vector<float>& pop, int m)
{
if (m <= 0) { // error trap
n = 0;
return;
}
n = m;
int pop_size = pop.size();
// use an index vector - preserve passed vector
std::vector<int> idx;
for (int i = 0; i < pop_size; i++) idx.push_back(i);
// perform pop_size random swaps
int seed = clock();
if (seed % 2 == 0) ++seed;
srand(seed);
for (int i = 0; i < pop_size; i++) {
int rndidx = (int)(pop_size*(rand()/32767.0F)); // DOS
int tmp = idx[i];
idx[i] = idx[rndidx];
idx[rndidx] = tmp;
}
x.clear(); // clear x in case something lives there
for (int i = 0; i < n; i++) x.push_back(pop[idx[i]]);
}
Thanks, somewhere along the line I’ve transposed leading with a double underscore for trailing with one
The rand functions are not threadsafe.
I’ll try a simpler sample where I just grab the first m population items (my original easy version) and see if that makes a diffrence. Thanks.
All I know is when I don’t use openmp I only see one processor core spike in usage on my conky desktop graph, when I do I see them all spike, and more importanly the wall clock time it takes things to run can decrease by a factor of 4 even on an underpowered dual core machine. Admittedly this is all just with toy numerically oriented C++ or Fortran projects. I know even less about (nothing) about lambdas than openmp, I’ll have to look into it.
I got things to work without segfaults, and with a nice blank output from
diff outputfilewithoutopenmp outputfilewithopenmp
On g++ on Linux anyway, haven’t tried the mingw32 variant on Windows yet. I broke up the part where I pick up the random samples and the parallel processing section like so.
// random samples processing
std::vector<float> ransams[n-1];
for (int i = 2; i <= n; i++) {
std::vector<float> tmp;
random_sample(x, tmp, i);
ransams[i-2] = tmp;
}
std::vector<float> table[n-1];
#pragma omp parallel for private(sample)
for (int i = 2; i <= n; i++) {
int idx = i-2;
sample.load(ransams[idx]);
table[idx].push_back(sample.computeMean());
table[idx].push_back(sample.computeMedian());
table[idx].push_back(sample.computeVar());
table[idx].push_back(sample.computeStd());
table[idx].push_back(sample.computeMeanDev());
table[idx].push_back(sample.computeMedianDev());
table[idx].push_back(sample.computeSkewness());
table[idx].push_back(sample.computeMedianSkew());
}
P. S. on windows to get to work on my largest test runs of n
// random samples processing
std::vector<float>* ransams = new std::vector<float>[n-1];
for (int i = 2; i <= n; i++) {
std::vector<float> tmp;
random_sample(x, tmp, i);
ransams[i-2] = tmp;
}
std::vector<float> table[n-1];
#pragma omp parallel for private(sample)
for (int i = 2; i <= n; i++) {
int idx = i-2;
sample.load(ransams[idx]);
table[idx].push_back(sample.computeMean());
. . .
table[idx].push_back(sample.computeMedianSkew());
}
delete [] ransams;