MPI is implemented as a library of functions, a C, C++ or Fortran programmer doesn't have to learn a new language to use MPI; if a sequential program already exists, much of that program can be used for an MPI version. An MPI program complies just like a regular program. At link time, the MPI library must be accessed.
On clusters, each node has a separate memory space, i.e. distributed memory, as a result, there is no way to get at another's except via message passing over the network. Messages may be used for:
- Sending data
- Performing operations on data
- Synchronization between tasks
For efficiency, MPI is usually run on clusters of many computers of the same type, with high-speed connection. MPI could also be run (with less reliable speedup) on collections of several types of computers with relative slow connections. Due to distributed memory between processors, the data for the problem has been divided up among these processors. This means that, from time to time, one processor may need to "ask" another processor for a copy of certain data or result.
Another attractive approach to parallel programming is called OpenMP. This system is used when a shared memorymachine is available, this is when several processors share a single physical memory, or a number of processors can address the same "logical" memory. An existing sequential program can be turned into an OpenMP program simply by inserting special comment statements.
MPI has a wide range of capabilities. There are three subsets of functionality:
- Basic (about 6 functions)
- Advanced (up to 125 functions)
All processes must initialize and finalize MPI
- MPI_Init(): Starts up the MPI runtime environment
- MPI_Finalize(): Shuts down the MPI runtime environment
MPI uses MPI_Comm objects to define subsets of processors (i.e. communicators) that may communicate with one another. By default MPI_COMM_WORLD includes all your processors.
Some examples of simple MPI programs are provided in the Examples section.
Single Program, Multiple Data (SPMD) means every process gets a copy of the executable. Each process looks at its own rank (unique ID within a communicator) to determine which part of the problem to work on. Except when communicating, each process works completely independently.
Point-to-Point (P2P) Communication sends data from one point to another, one task sends while another receives.
- Basic P2P involves two functions, MPI_Send() and MPI_MPI_Recv(). An alternative is MPI_SendRecv with different tags to indicate send and receive stages. It's especially useful when each node both sends and receives messages (two-way communication).
- Synchronous Communication: Messages are sent and received via MPI_Ssend() and MPI_Srecv(). Handshaking occurs between send and receive tasks to confirm a safe send. As a result, in some case, it blocks send/receive.
- Buffered Communication: Messages are sent and received via MPI_Bsend() and MPI_Brecv(). The content of the message is copied into a system-controlled block of memory (system buffer). The sender continues executing other tasks; when the receiver is ready to receive, the system simply copies the buffered message into the appropriate memory location.
MPI communications can be blocking or non-blocking:
- A Blocking (MPI_Send() and MPI_MPI_Recv()) send routing will only return after it is safe to modify the buffer. "Safe" in this case means that modification will not affect the data to be sent. Safe does not imply that the data was actually received.
- Non-blocking (MPI_Isend() and MPI_Irecv()) send/receive routines return immediately. Non-blocking operations request that the MPI library perform the operation "when possible". It is unsafe to modify the buffer until the requested operation has been performed. Non-blocking communications are primarily used to overlap computation with communication, optimizing performance.
Improper use of blocking receive/send will result in deadlock, where two processors cannot progress because each of them is waiting on the other. (This can also happen for many processors.) For example, the following code contains a deadlock:
IF (rank==0) THEN CALL MPI_RECV(recvbuf,count,MPI_REAL,1,tag,MPI_COMM_WORLD,status,ierr) CALL MPI_SEND(sendbuf,count,MPI_REAL,1,tag,MPI_COMM_WORLD,ierr) ELSEIF (rank==1) THEN CALL MPI_RECV(recvbuf,count,MPI_REAL,0,tag,MPI_COMM_WORLD,status,ierr) CALL MPI_SEND(sendbuf,count,MPI_REAL,0,tag,MPI_COMM_WORLD,ierr) ENDIF
This can be solved with the following minor modification:
IF (rank==0) THEN CALL MPI_SEND(sendbuf,count,MPI_REAL,1,tag,MPI_COMM_WORLD,ierr) CALL MPI_RECV(recvbuf,count,MPI_REAL,1,tag,MPI_COMM_WORLD,status,ierr) ELSEIF (rank==1) THEN CALL MPI_RECV(recvbuf,count,MPI_REAL,0,tag,MPI_COMM_WORLD,status,ierr) CALL MPI_SEND(sendbuf,count,MPI_REAL,0,tag,MPI_COMM_WORLD,ierr) ENDIF
As the name suggests, collective communication is defined as communication between more than two (usually many) processors. They come in a few forms:
A collective operation requires that all processes within the communicator group call the same collective communication function with matching arguments. A few points to note:
- Collective communicates are blocking operations.
- Collective operations on subsets of processes require separate grouping/new communicators.
- The size of data sent must exactly match the size of the data received.
Some of the most important functions in collective communication are listed below. The actions of these functions are summarized in Figure 1.
- MPI_BCAST(): Broadcast a message from the root process
- MPI_SCATTER(): Each process receive a segment from the root
- MPI_GATHER(): Each process sends contents to the root (opposite of MPI_SCATTER())
- MPI_ALLGATHER(): An MPI_GATHER() whose result ends up on all processors
- MPI_REDUCE(): Applies a reduction operation on all tasks and places the result in the receive buffer on the root process
- MPI_ALLREDUCE(): Applies a reduction and place the result in all tasks in the group (MPI_REDUCE()+MPI_BCAST())
The following section gives an example of compiling MPI programs using MPICH:
- C: mpicc -o foo foo.c
- Fortran: mpif90 -o foo foo.f
And the command for running MPI programs using MPICH is: mpirun –np 2 <progname>
These two simple examples do no more than display the text "Hello World!" from multiple processors:
These examples calculate an integral quadrature: