Welcome to Vigges Developer Community - Open, Learning, Share
Welcome To Ask or Share your Answers For Others

c++ - CUDA: Triple nested loop with reduction inside

I need to convert the following code from C++ with OpenMP to C++ with CUDA. According to the answer to the question "CUDA access matrix stored in RAM and possibility of being implemented", the OpenMP portion can be written in CUDA. My first problem is that I don't know how to handle the sums inside the kernel function.

Legacy Code:

/* definition of variables */
for (int l = 0; l < N_mesh_points_x; l++){
  for (int m = 0; m < N_mesh_points_y; m++){
    for (int p = 0; p < N_mesh_points_z; p++){
      sum_1 = 0;
      sum_2 = 0;
      // Only the i loop is split across threads; j and k run inside each thread.
      #pragma omp parallel for schedule(dynamic) private(phir) reduction(+: sum_1,sum_2)
      for (int i = 0; i < N_mesh_points_x; i++){
        for (int j = 0; j < N_mesh_points_y; j++){
          for (int k = 0; k < N_mesh_points_z; k++){
            // Skip the target point (l,m,p) itself.
            if (i != l || j != m || k != p){
              phir = weights_x[i]*weights_y[j]*weights_z[k]*kern_1(i,j,k,l,m,p);
              sum_1 += phir * matrix1[position(i,j,k)];
              sum_2 += phir;
            }
          }
        }
      }
      (*K2)[position(l,m,p)] = sum_1 + (5 - 2*sum_2) * matrix1[position(l,m,p)];
    }
  }
}

I have read some articles about reduction, but I don't have an array, only a series of sums. Should I create an array to store the phir values and then run a reduction on that array? Is there an existing function that does this?
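One possible way to handle the two sums without storing every phir value is sketched below. It is only a sketch under several assumptions, not a tested port of the code above. Each CUDA block computes one output point (l,m,p); each thread accumulates private partial values of sum_1 and sum_2 over a strided share of the (i,j,k) points, and the per-thread partials are then combined with a tree reduction in shared memory. Assumed and not taken from the question: `kern_1` here is a hypothetical placeholder, `position(i,j,k)` is taken to be row-major `(i*Ny + j)*Nz + k`, all data are `double`, the total number of mesh points fits in an `int`, and the names `k2_kernel`, `compute_K2`, and `THREADS` are made up for illustration. Error checking of the CUDA calls is left out for brevity.

```cuda
#include <cassert>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// HYPOTHETICAL stand-in for the asker's kern_1; replace with the real function.
__host__ __device__ inline double kern_1(int i, int j, int k, int l, int m, int p) {
    double dx = i - l, dy = j - m, dz = k - p;
    return 1.0 / (1.0 + sqrt(dx * dx + dy * dy + dz * dz));
}

constexpr int THREADS = 256;  // power of two, required by the tree reduction below

// One block per output point (l,m,p); ASSUMES position(i,j,k) = (i*Ny + j)*Nz + k.
__global__ void k2_kernel(const double* wx, const double* wy, const double* wz,
                          const double* matrix1, double* K2, int Nx, int Ny, int Nz) {
    __shared__ double s1[THREADS];
    __shared__ double s2[THREADS];
    const int total = Nx * Ny * Nz;
    const int out = blockIdx.x;  // flattened index of (l,m,p)
    const int l = out / (Ny * Nz), m = (out / Nz) % Ny, p = out % Nz;

    // Each thread keeps its own partial sums over a strided subset of (i,j,k).
    double sum_1 = 0.0, sum_2 = 0.0;
    for (int idx = threadIdx.x; idx < total; idx += blockDim.x) {
        if (idx == out) continue;  // same test as !(i==l) || !(j==m) || !(k==p)
        const int i = idx / (Ny * Nz), j = (idx / Nz) % Ny, k = idx % Nz;
        const double phir = wx[i] * wy[j] * wz[k] * kern_1(i, j, k, l, m, p);
        sum_1 += phir * matrix1[idx];
        sum_2 += phir;
    }
    s1[threadIdx.x] = sum_1;
    s2[threadIdx.x] = sum_2;
    __syncthreads();

    // Tree reduction in shared memory: halve the number of active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride) {
            s1[threadIdx.x] += s1[threadIdx.x + stride];
            s2[threadIdx.x] += s2[threadIdx.x + stride];
        }
        __syncthreads();
    }
    if (threadIdx.x == 0)
        K2[out] = s1[0] + (5.0 - 2.0 * s2[0]) * matrix1[out];
}

// Host wrapper (illustrative): copies inputs to the GPU, launches, copies K2 back.
std::vector<double> compute_K2(const std::vector<double>& wx, const std::vector<double>& wy,
                               const std::vector<double>& wz, const std::vector<double>& matrix1) {
    const int Nx = (int)wx.size(), Ny = (int)wy.size(), Nz = (int)wz.size();
    const size_t total = (size_t)Nx * Ny * Nz;
    assert(matrix1.size() == total);
    double *d_wx, *d_wy, *d_wz, *d_m, *d_K2;
    cudaMalloc(&d_wx, Nx * sizeof(double));
    cudaMalloc(&d_wy, Ny * sizeof(double));
    cudaMalloc(&d_wz, Nz * sizeof(double));
    cudaMalloc(&d_m, total * sizeof(double));
    cudaMalloc(&d_K2, total * sizeof(double));
    cudaMemcpy(d_wx, wx.data(), Nx * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(d_wy, wy.data(), Ny * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(d_wz, wz.data(), Nz * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(d_m, matrix1.data(), total * sizeof(double), cudaMemcpyHostToDevice);
    k2_kernel<<<(unsigned)total, THREADS>>>(d_wx, d_wy, d_wz, d_m, d_K2, Nx, Ny, Nz);
    std::vector<double> K2(total);
    cudaMemcpy(K2.data(), d_K2, total * sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(d_wx); cudaFree(d_wy); cudaFree(d_wz); cudaFree(d_m); cudaFree(d_K2);
    return K2;
}
```

With this layout the three outer loops over l, m, p become the grid, so no intermediate array of phir values is needed. As for ready-made functions, CUB's `cub::BlockReduce` could replace the hand-written shared-memory loop, and Thrust's `thrust::transform_reduce` is another option if each output point is reduced separately; whether either is faster for a given mesh size would have to be measured.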



1 Answer

Waiting for an expert to answer.
