MPI - Message Passing Interface - is a library specification built around a set of functions, callable from either Fortran or C, for parallel programming. These functions let processes on multiple compute nodes communicate with one another by exchanging messages.
The MPICH2 implementation comes from Argonne National Laboratory and conforms to the MPI-2 standard. The mpiexec script, developed at the Ohio Supercomputing Center, lets you run MPI programs without complex configuration and processing steps, a marked improvement over the previous method.
Make two changes: one to the PATH variable and one to LD_LIBRARY_PATH. The former is mandatory; the latter is optional, depending on the compiler you use. If your login shell is the C shell, make the PATH change in the .cshrc file. If, on the other hand, you use bash, alter the PATH variable in the .bash_profile file. To see which shell is your default, enter:

echo $SHELL
If the output is /bin/bash, you are using the bash shell; if /bin/csh, then the C shell.
For a bash shell user, add the following to the .bash_profile file in your home directory:
export MPICH2_HOME=/usr/common/mpich2-1.0.7
export PATH=$MPICH2_HOME/bin:$PATH
export LD_LIBRARY_PATH=/usr/common/mpich2-1.0.7/lib:$LD_LIBRARY_PATH
For a C shell user, add the following to the .cshrc file in your home directory:
setenv MPICH2_HOME /usr/common/mpich2-1.0.7
setenv PATH /usr/common/mpich2-1.0.7/bin:$PATH
setenv LD_LIBRARY_PATH /usr/common/mpich2-1.0.7/lib:$LD_LIBRARY_PATH
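After logging in again (or sourcing the startup file), the MPICH2 bin directory should be first on your search path. Here is a minimal sketch of the effect of the bash lines above, using the same install path as this guide:

```shell
# Reproduce the PATH change from .bash_profile.
export MPICH2_HOME=/usr/common/mpich2-1.0.7
export PATH=$MPICH2_HOME/bin:$PATH

# The first entry on the search path is now the MPICH2 bin directory,
# so the shell finds this installation's mpiexec before any other.
echo "$PATH" | cut -d: -f1
# prints /usr/common/mpich2-1.0.7/bin
```

Because the new directory is prepended rather than appended, it takes precedence over any other MPI installation already on the path.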
Note: If you do not use the GNU compilers (you use the supported Portland Group compilers instead), then the change to LD_LIBRARY_PATH is unnecessary, because you will specify the appropriate library path with the -L option in the compile statement.
Be sure to use mpiexec to run your programs
The following scripts are available to compile and link your MPI programs:
|mpif77|GNU Fortran 77|
|mpif90|Portland Group Fortran 90|
Each script will invoke the appropriate compiler.
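For example, a compile step might look like the following. The source file names and output names here are placeholders, not files from this guide:

```shell
# mpif77 wraps the GNU Fortran 77 compiler and supplies the MPI
# include and library flags automatically:
mpif77 -o mpi-program-name.out hello.f

# With the Portland Group compiler, supply the library path yourself
# via -L, as noted above:
mpif90 -o mpi-program-name.out hello.f90 -L$MPICH2_HOME/lib
```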
Executing an MPICH2 job is now relatively straightforward.
mpiexec should not be run from the command line; instead, put it in a script file. Below is an example script used to run a simple GNU Fortran 77 program.
#PBS -l walltime=5:00
#PBS -l cput=5:00
#PBS -l nodes=4
#PBS -q dedicated
cd ~NetID/a_directory_here
mpiexec mpi-program-name.out > output_from_program_here 2>&1
This script does several things: it sets the wall-clock and CPU time limits, requests four nodes in the dedicated queue, changes to the working directory, and runs the program, redirecting its output (and any error messages) to a file.
Note that the program is run using mpiexec, which automatically selects the execution nodes based on the resources currently available in the selected (or default) queue.
Do not specify the nodes by name; specify only the number of nodes and processors:
WRONG: #PBS -l nodes=argo1-1+argo1-2+argo1-3+argo1-4
RIGHT: #PBS -l nodes=4
Submit the script from the command line with: qsub script_name
The IMPORTANT thing to remember is that the number of nodes available is based on the queue you selected for execution.
Once your script has completed, you can look at the output file(s); while it is running, you can check its progress with the qstat command.
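Putting the submission and monitoring steps together (the script name below is a placeholder for your own):

```shell
# Submit the job script; qsub prints the job id on success.
qsub my_mpi_job.pbs

# Check the progress of your jobs (replace NetID with your own):
qstat -u NetID
```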
The example script uses only one processor per compute node. What follows is a graphical representation of how that job looks, assuming the automatic scheduler decided to run the job on the four compute nodes argo1-1 through argo1-4.
However, argo is not a homogeneous environment. Some nodes have one CPU while others have two CPUs.
There is no restriction that each node in the script use only a single processor. Changing the third line in the above script to read

#PBS -l nodes=4:ppn=2

directs each node to use two processors. Once again, the resource manager and the scheduler in combination will decide the optimal choice of nodes.
For example purposes only, the processes and the nodes were ordered:
The ordering is not guaranteed; you cannot dictate which process goes to which node. It does not follow that the first process MUST go to the first CPU on the first node, the second process to the second CPU on the first node, and so on. The processes could just as easily have been:
If you receive an error reading "Lamnodes Failed! Check if you had booted lam before calling mpiexec else use -machinefile to pass host file to mpiexec", you have not properly set the path, as shown in the Configuring your environment section above.
September 20, 2016