Welcome to Vigges Developer Community - Open, Learning, Share
Welcome To Ask or Share your Answers For Others

mpi - data exchange between multiple ranks with MPI_Bsend

I have a few questions (mainly question 2) about the code below. Its purpose is to send data to an arbitrary number of 'target' ranks and then receive other data (of a different length) from each of those targets, i.e. to exchange data with all targets. I do not know the order of the send and receive calls on each rank! Each message is fairly small (at most about 1 MB), and there might be up to about 10 targets. The application is high-performance computing on a supercomputer.

My specific questions, to be sure I fully understand what is going on (I could not find this specifically addressed anywhere online; it might seem obvious, but I want to be 100% sure it always works):

  1. Could you please confirm that the order of the Bsend and receive calls can never lead to a deadlock? I am assuming that once all the messages to send are in the buffer, MPI can match any receive call posted by a target and start sending the associated buffered data, regardless of the order of the targets' receive calls. Is that correct?

  2. If every rank has two target neighbours (a 'left' and a 'right' one), could this code lead to cascaded waiting, with each rank waiting on its 'left' neighbour while that neighbour is itself waiting to exchange data with its own 'left' neighbour (this would lead to terribly poor performance)? Or will the buffered data be sent even while the rank is waiting in a receive call? In other words: once Bsend has buffered the data and returned, is the actual network transfer performed by another process/thread created by MPI? (Probably a stupid question; that is the only way I can see it happening after Bsend returns.)

  3. Is there a better way to do the data exchange, or does the code seem good in terms of speed? The data copying (i.e. buffering) itself should not lead to a visible overhead in my case.

  4. Would you recommend using the same code to share the size of each message (a single integer) with each target, or is there a faster way? Or is there any way at all to avoid sending the message size (which is unknown at the time of the call)?

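#include <mpi.h>
#include <vector>

// Sends sendlens[i] ints to each target rank, then receives receivelens[i] ints from each.
void slmpi::exchange(std::vector<int> targetranks, std::vector<int> sendlens, std::vector<int*> sendbuffers, std::vector<int> receivelens, std::vector<int*> receivebuffers)
{
    int numtargets = targetranks.size();

    if (numtargets == 0)
        return;

    // The attached buffer must hold every outgoing message plus the per-message overhead.
    int totbytelen = 0;
    for (int i = 0; i < numtargets; i++)
        totbytelen += sendlens[i]*sizeof(int) + MPI_BSEND_OVERHEAD;

    std::vector<char> sendbuffer(totbytelen); // a char is one byte long
    MPI_Buffer_attach(sendbuffer.data(), totbytelen);

    // MPI_Bsend returns once the message has been copied into the attached buffer.
    for (int i = 0; i < numtargets; i++)
        MPI_Bsend(sendbuffers[i], sendlens[i], MPI_INT, targetranks[i], 0, MPI_COMM_WORLD);

    for (int i = 0; i < numtargets; i++)
        MPI_Recv(receivebuffers[i], receivelens[i], MPI_INT, targetranks[i], 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    // MPI_Buffer_detach blocks until all buffered messages have been transmitted.
    // Its first argument is the address of a pointer that receives the buffer address,
    // so a separate pointer variable is passed rather than the buffer itself.
    void* detachedbuffer = nullptr;
    int detachedlen = 0;
    MPI_Buffer_detach(&detachedbuffer, &detachedlen);
}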

Question from: https://stackoverflow.com/questions/65868520/data-exchange-between-multiple-ranks-with-mpi-bsend


