MPI on the GRID: the batching environment - typical scripts
From EGEE-see Wiki
This page is part of the SEE-GRID Gridification Guide
This topic was contributed by the Center for Scientific Research of SASA and University of Kragujevac, Serbia
Batching systems
Every large system, including large Linux clusters and GRID environments, comes with its own specific issues, but one thing they all have in common is batch job execution. The term denotes the execution of jobs without any user interaction (i.e. through scripts). The benefits are the following:
- It allows sharing of computer resources among many users
- It shifts the time of job processing to when the computing resources are less busy
- It keeps the computing resources from sitting idle, since no minute-by-minute human interaction and supervision is needed
- It is used on expensive classes of computers to help amortize the cost by keeping high rates of utilization of those expensive resources.
SEE-GRID environment issues
The points above apply to single-processor jobs as well as to multi-processor MPI jobs. MPI jobs, however, raise a specific issue on the PBS-TORQUE batch system commonly used with the gLite middleware: the use of mpiexec in place of the original mpirun script supplied with the MPICH package. The standard mpirun can still be used, but mpiexec is recommended because of the following benefits:
- Starting tasks with the Torque interface is much faster than invoking a separate rsh once for each process.
- Resources used by the spawned processes are accounted correctly with mpiexec, and reported in the PBS logs, because all the processes of a parallel job remain under the control of PBS, unlike when using mpirun-like scripts.
- Tasks which exceed their assigned limits of CPU time, wall clock time, memory usage, or disk space are killed cleanly by PBS. It is quite hard for processes to escape control of the resource manager when using mpiexec.
- You can use mpiexec to enforce a security policy. If all jobs are forced to spawn using mpiexec and the PBS execution environment, it is not necessary to enable rsh or ssh access to the compute nodes in the cluster.
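To make the difference between the two launchers concrete, the following sketch builds the command line each of them would need for the same PBS allocation. It only prints the commands and does not run them. The launch_cmd helper, the host names and the sample node file are hypothetical; the -np and -machinefile options are those of the MPICH mpirun script.

```shell
#!/bin/sh
# Hypothetical helper: print the command line used to launch a binary
# with either launcher. Nothing is executed here.
launch_cmd() {
    launcher=$1   # "mpiexec" or "mpirun"
    nodefile=$2   # file with one line per allocated processor
    exe=$3
    case "$launcher" in
        mpiexec)
            # mpiexec obtains the node list and process count from PBS itself
            echo "mpiexec $exe"
            ;;
        mpirun)
            # mpirun has to be told the process count and the machine file
            np=$(wc -l < "$nodefile" | tr -d ' ')
            echo "mpirun -np $np -machinefile $nodefile $exe"
            ;;
        *)
            echo "unknown launcher: $launcher" >&2
            return 1
            ;;
    esac
}

# Sample node file standing in for $PBS_NODEFILE (assumed content)
SAMPLE_NODEFILE=$(mktemp)
printf 'wn01\nwn01\nwn02\nwn02\n' > "$SAMPLE_NODEFILE"

launch_cmd mpiexec "$SAMPLE_NODEFILE" ./example1
launch_cmd mpirun  "$SAMPLE_NODEFILE" ./example1
```

With mpirun the script has to pass the allocation on by hand, whereas mpiexec takes it from the batch system, which is why the mpiexec line carries no extra options.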
The mpiexec (or mpirun) call is never issued by the user directly; it is always made from within a wrapper script that the grid job runs. A typical script, which compiles the integration example and executes it on the number of processors requested in the JDL (Job Description Language) file, looks like this:
#!/bin/sh -x
# the binary to execute
EXE=$1
echo "***********************************************************************"
echo "Running on: $HOSTNAME"
echo "As: " `whoami`
echo "***********************************************************************"
echo "***********************************************************************"
echo "Compiling binary: $EXE"
echo mpicc -o ${EXE} ${EXE}.c
mpicc -o ${EXE} ${EXE}.c || { echo "Compilation failed. Exiting..."; exit 1; }
echo "*************************************"
if [ "x$PBS_NODEFILE" != "x" ] ; then
echo "PBS Nodefile: $PBS_NODEFILE"
HOST_NODEFILE=$PBS_NODEFILE
fi
if [ "x$HOST_NODEFILE" = "x" ]; then
echo "No hosts file defined. Exiting..."
exit 1
fi
echo "***********************************************************************"
CPU_NEEDED=`cat $HOST_NODEFILE | wc -l`
echo "Node count: $CPU_NEEDED"
echo "Nodes in $HOST_NODEFILE: "
cat $HOST_NODEFILE
echo "***********************************************************************"
echo "***********************************************************************"
echo "Checking ssh for each node:"
NODES=`cat $HOST_NODEFILE`
for host in ${NODES}
do
echo "Checking $host..."
ssh $host hostname
done
echo "***********************************************************************"
echo "***********************************************************************"
echo "Executing $EXE with mpiexec"
chmod 755 $EXE
mpiexec `pwd`/$EXE > mpiexec.out 2>&1
echo "***********************************************************************"
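The script derives CPU_NEEDED from the number of lines in the node file. Under PBS/Torque the node file normally carries one line per allocated processor, so a host with several processors appears several times, and the ssh loop visits it once per line. A minimal sketch that separates processor slots from distinct hosts, using a hypothetical sample file in place of $PBS_NODEFILE:

```shell
#!/bin/sh
# Sample node file standing in for $PBS_NODEFILE (assumed content:
# two worker nodes with two processors each)
NODEFILE=$(mktemp)
printf 'wn01\nwn01\nwn02\nwn02\n' > "$NODEFILE"

# One line per allocated processor
SLOTS=$(wc -l < "$NODEFILE" | tr -d ' ')
# Distinct host names
HOSTS=$(sort -u "$NODEFILE" | wc -l | tr -d ' ')

echo "Processor slots: $SLOTS"
echo "Distinct hosts:  $HOSTS"

# Visit each host only once, e.g. for a connectivity check
for host in $(sort -u "$NODEFILE"); do
    echo "Would check $host"
done
rm -f "$NODEFILE"
```

Whether to loop over slots or over distinct hosts is a design choice: the slot count is what the MPI launcher needs, while per-host checks only need each name once.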
The JDL file must set the JobType attribute to MPICH and give the number of required logical processors in the NodeNumber attribute. The argument to the script above is the name of the C file (without the extension) to be compiled and executed. It is also possible to supply a prebuilt executable instead of the source file, but then the user must ensure that it runs correctly on an SL3 (Scientific Linux 3) system, especially with regard to the required libraries. The usual approach is to build the executable on the UI machine prior to job submission. This is a typical JDL file for an MPICH job:
Type = "Job";
JobType = "MPICH";
NodeNumber = 4;
Executable = "MPItest.sh";
Arguments = "example1";
StdOutput = "test.out";
StdError = "test.err";
InputSandbox = {"MPItest.sh","example1.c"};
OutputSandbox = {"test.err","test.out","mpiexec.out"};
Requirements = Member("MPICH",other.GlueHostApplicationSoftwareRunTimeEnvironment) && (other.GlueCEInfoTotalCPUs>=4);
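Before submission it can be useful to check that the JDL is self-consistent, for instance that the wrapper script and the source file named in Arguments are both listed in InputSandbox. The sketch below does this with sed on a copy of the sample JDL. It is a hand-rolled, line-based check that only understands simple one-line attributes, not a JDL parser, and the variable names are illustrative.

```shell
#!/bin/sh
# Write part of the sample JDL from the text to a temporary file
JDL=$(mktemp)
cat > "$JDL" <<'EOF'
Type = "Job";
JobType = "MPICH";
NodeNumber = 4;
Executable = "MPItest.sh";
Arguments = "example1";
InputSandbox = {"MPItest.sh","example1.c"};
EOF

# Pull single values out of "Name = value;" lines
NODES=$(sed -n 's/^NodeNumber *= *\([0-9][0-9]*\) *;.*/\1/p' "$JDL")
EXE=$(sed -n 's/^Executable *= *"\([^"]*\)" *;.*/\1/p' "$JDL")
ARG=$(sed -n 's/^Arguments *= *"\([^"]*\)" *;.*/\1/p' "$JDL")
SANDBOX=$(sed -n 's/^InputSandbox *= *{\(.*\)} *;.*/\1/p' "$JDL")

STATUS=ok
# The wrapper script and the C source must both travel in the input sandbox
for f in "$EXE" "$ARG.c"; do
    case "$SANDBOX" in
        *"\"$f\""*) echo "in sandbox: $f" ;;
        *) echo "MISSING from InputSandbox: $f"; STATUS=bad ;;
    esac
done
echo "Requested processors: $NODES"
echo "Check result: $STATUS"
rm -f "$JDL"
```

The same NODES value can be compared by eye with the GlueCEInfoTotalCPUs bound in the Requirements expression, which should not be smaller than NodeNumber.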
