C++ & OpenMP

Necron_99 · November 30, 2017, 1:49am

Without omp I have tested the code below up to n = 350000, when I compile it with g++ -fopenmp it dies somwhere between n = 5000 and n = 10000. I’m an openmp noob so any suggestions would be appreciated. Some of the preliminary research I’ve done suggests to me it may have something to do with std::vector allocation issues. I’ve also implemented this with a couple of different ways of setting up a 2-dim table and row vector combo and the behavior seemed the same, worked up to 5000, but code died at test runs of 10000. Of course the problems as is usually the case wherever I’m involved may be more deeply systemic. The entire class definitions and implementations can be found at
stats.hpp
and
stats.cpp

   // std::vector<float> table[n-1];
   // std::vector< std::vector<float> > table(n-1);
   // std::vector<float> row(8);
   float table[n-1][8];
   // without omp I have tested this code up to n = 350000
   // with omp it dies somewhere between n = 5000 and n = 10000
   #pragma omp parallel for private(sample)
   for (int i = 2; i <= n; i++) {
      sample.randomSample(x, i);
      table[i-2][0] = sample.computeMean();
      table[i-2][1] = sample.computeMedian();
      table[i-2][2] = sample.computeVar();
      table[i-2][3] = sample.computeStd();
      table[i-2][4] = sample.computeMeanDev();
      table[i-2][5] = sample.computeMedianDev();
      table[i-2][6] = sample.computeSkewness();
      table[i-2][7] = sample.computeMedianSkew();
   }

Below are the sample class and the random sample method code called from within the for loop.

// sample derived class
class Sample : public virtual Stats {
   public:
      void randomSample(const std::vector<float>&, int);
   private:
      float var() const;
      float std() const;
};

// pop vector is invariant, return it m doesn't make sense
void Sample::randomSample(const std::vector<float>& pop, int m)
{
   if (m <= 0) {	// error trap
      n = 0;
      return;
   }
   n = m;
   int pop_size = pop.size();
   // use an index vector - preserve passed vector
   std::vector<int> idx;
   for (int i = 0; i < pop_size; i++) idx.push_back(i);
   // perform pop_size random swaps
   int seed = clock();
   if (seed % 2 == 0) ++seed;
   srand(seed);
   for (int i = 0; i < pop_size; i++) {
      int rndidx = (int)(pop_size*(rand()/32767.0F)); // DOS
      int tmp = idx[i];
      idx[i] = idx[rndidx];
      idx[rndidx] = tmp;
   }
   x.clear();  // clear x in case something lives there
   for (int i = 0; i < n; i++) x.push_back(pop[idx[i]]);
}

risk · November 30, 2017, 10:10am

Never used openmp, but does it take care of putting a mutex around x?

Is it even worth using openmp with modern c++ with lambdas and what not in stdlib?

pFtpr · November 30, 2017, 10:17am

Though unlikely to be the culprits a few things stand out to me

The rand functions are not threadsafe.

Don’t do this. Appending to a vector in a loop has quadratic time complexity!

#ifndef _STATS_HPP__
#define _STATS_HPP__

Symbols starting with _ are reserved for use by the compiler.

catsay · November 30, 2017, 10:22am

I’ll leave this here. Hope it helps.

Necron_99 · November 30, 2017, 2:48pm

#ifndef _STATS_HPP__
#define _STATS_HPP__

Thanks, somewhere along the line I’ve transposed leading with a double underscore for trailing with one

The rand functions are not threadsafe.

I’ll try a simpler sample where I just grab the first m population items (my original easy version) and see if that makes a diffrence. Thanks.

All I know is when I don’t use openmp I only see one processor core spike in usage on my conky desktop graph, when I do I see them all spike, and more importanly the wall clock time it takes things to run can decrease by a factor of 4 even on an underpowered dual core machine. Admittedly this is all just with toy numerically oriented C++ or Fortran projects. I know even less about (nothing) about lambdas than openmp, I’ll have to look into it.

Thanks all.

Necron_99 · December 2, 2017, 1:43am

I got things to work without segfaults, and with a nice blank output from
diff outputfilewithoutopenmp outputfilewithopenmp
On g++ on Linux anyway, haven’t tried the mingw32 variant on Windows yet. I broke up the part where I pick up the random samples and the parallel processing section like so.

   // random samples processing
   std::vector<float> ransams[n-1];
   for (int i = 2; i <= n; i++) {
      std::vector<float> tmp;
      random_sample(x, tmp, i);
      ransams[i-2] = tmp;
   }
   std::vector<float> table[n-1];
   #pragma omp parallel for private(sample)
   for (int i = 2; i <= n; i++) {
      int idx = i-2;
      sample.load(ransams[idx]);
      table[idx].push_back(sample.computeMean());
      table[idx].push_back(sample.computeMedian());
      table[idx].push_back(sample.computeVar());
      table[idx].push_back(sample.computeStd());
      table[idx].push_back(sample.computeMeanDev());
      table[idx].push_back(sample.computeMedianDev());
      table[idx].push_back(sample.computeSkewness());
      table[idx].push_back(sample.computeMedianSkew());
   }

P. S. on windows to get to work on my largest test runs of n

   // random samples processing
   std::vector<float>* ransams = new std::vector<float>[n-1];
   for (int i = 2; i <= n; i++) {
      std::vector<float> tmp;
      random_sample(x, tmp, i);
      ransams[i-2] = tmp;
   }
   std::vector<float> table[n-1];
   #pragma omp parallel for private(sample)
   for (int i = 2; i <= n; i++) {
      int idx = i-2;
      sample.load(ransams[idx]);
      table[idx].push_back(sample.computeMean());
      . . .
      table[idx].push_back(sample.computeMedianSkew());
   }
   delete [] ransams;