Template Based Submission and Benchmarking of MPI Jobs, Varying Jobs and Job Sets
From EGEE-see Wiki
This page is part of the SEE-GRID Gridification Guide.
This topic was contributed by the Center for Scientific Research of SASA and the University of Kragujevac, Serbia.
Introduction
This page gives an example of how to measure the performance of any application that can run on the grid. The procedure shown here, together with the scripts written for this purpose, was used to measure the performance and scalability of PBFS (Parallel Blood Flow Simulation), developed at the Center for Scientific Research of SASA and the University of Kragujevac, Serbia. The results are available in the SEE-GRID-2 PBFS Progress Reports: 1, 2.
The steps below were performed to reach the objective: measuring the execution time of a probe job (counted from the invocation of the job executable, not from the submission time) on various CEs with various numbers of CPUs. The methodology can serve many benchmarking tasks on the grid, which is why the scripts were written to be as generic as possible. Besides benchmarking applications, the procedure can also be applied whenever a series of related jobs has to be submitted and controlled.
Prerequisites
Template JDL file
To run the scripts, one must provide a template JDL file, which is used to generate the actual JDL files for job submission. This is an ordinary JDL file used to submit the job, except that a placeholder variable stands in place of the JDL parameter (or any text string) we wish to vary. In our example, "NodeNumber = nn" has been replaced with "NodeNumber = _NODENUMBER_". The first (mandatory) argument of submit_probe_jobs.sh is the name of this placeholder; the user can either supply the JDL template file name as the second argument or rely on the default file name "job_jdl_template". The template used for the PBFS performance test is:
Type = "Job";
InputSandbox = {…};
StdOutput = "job.out";
StdError = "job.err";
Executable = "…";
Arguments = "…";
OutputSandbox = {…,"job.err"};
#for MPI jobs
JobType = "MPICH";
NodeNumber = _NODENUMBER_;
Requirements = Member("MPICH",other.GlueHostApplicationSoftwareRunTimeEnvironment);
Another use would be a variable input file name in the Arguments and/or InputSandbox attributes. Conversely, to run identical jobs on several CEs, simply pass a string that does not occur in the template as the first argument of submit_probe_jobs.sh.
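The substitution that submit_probe_jobs.sh applies to the template for each input line can be previewed locally before anything is submitted. The following is only a sketch: the function name render_jdl is hypothetical, and it uses sed rather than the perl one-liner of the original script, assuming that neither the placeholder nor the value contains sed regular-expression metacharacters or the | character.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original scripts): print the JDL that
# would be produced from TEMPLATE by replacing every occurrence of VARIABLE
# with VALUE. Assumes VARIABLE and VALUE contain no sed metacharacters and no "|".
render_jdl() {
  local template=$1 variable=$2 value=$3
  sed "s|$variable|$value|g" "$template"
}

# Example: inspect the 4-CPU variant of the template without submitting it
# render_jdl job_jdl_template _NODENUMBER_ 4 > probe.jdl
```

If the placeholder does not occur in the template, the output is the template unchanged, which is exactly the "identical jobs on several CEs" case described above.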
File containing the combinations of CE and varying parameter
The submit script also reads, from standard input, a table of combinations of CE identifier and value of the varying parameter, one combination per line, for example (file CEs.txt):
cluster1.csk.kg.ac.yu:2119/jobmanager-pbs-seegrid 8
cluster1.csk.kg.ac.yu:2119/jobmanager-pbs-seegrid 4
cluster1.csk.kg.ac.yu:2119/jobmanager-pbs-seegrid 2
ce02.grid.acad.bg:2119/jobmanager-pbs-seegrid 8
ce02.grid.acad.bg:2119/jobmanager-pbs-seegrid 4
ce02.grid.acad.bg:2119/jobmanager-pbs-seegrid 2
Depending on what CEs.txt contains, this approach can be applied to different benchmarking and practical tasks:
- multiple runs on a single CE with a varying parameter, to find out how that parameter affects the execution time,
- identical jobs run on different CEs, to compare their performance,
- combined measurements, to explore how different sites behave when the parameter varies,
- multiple independent partial runs using different inputs.
The easiest way to obtain the initial list of CEs is to filter the output of glite-wms-job-list-match, called with a JDL file that specifies the appropriate requirements (for MPI jobs, one of them should be Member("MPICH", other.GlueHostApplicationSoftwareRunTimeEnvironment)):
glite-wms-job-list-match -a requirements.jdl | grep seegrid | gawk '{print $2}'
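To turn such a CE list into CEs.txt for a scalability run, each CE can be paired with every parameter value to be tested. A minimal sketch, assuming the filtered list has been saved one CE per line to a file; the function make_ce_table and the file name ce_list.txt are hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: print one "CE VALUE" line for every combination of a CE
# read from the file given as $1 and each remaining argument (e.g. CPU counts).
make_ce_table() {
  local celist=$1 ce value
  shift
  while read -r ce; do
    [ -z "$ce" ] && continue          # ignore blank lines
    for value in "$@"; do
      echo "$ce $value"
    done
  done < "$celist"
}

# Example:
# glite-wms-job-list-match -a requirements.jdl | grep seegrid | gawk '{print $2}' > ce_list.txt
# make_ce_table ce_list.txt 8 4 2 > CEs.txt
```

With the two CEs used on this page and the values 8 4 2, this produces the same six lines as the CEs.txt example above.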
Command “time” in main job script
The time command, whose report is written to stderr, is the instrument output_probe_jobs.sh relies on to obtain the real, user and system time spent on job execution. The call to time should be placed right before the call to the executable in the main job script, for example:
time -p mpiexec `pwd`/pbfs.exe
instead of
mpiexec `pwd`/pbfs.exe
The file to which stderr is written must also be listed in the OutputSandbox attribute of the JDL template (here job.err, as in the example template above).
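Before submitting, the instrument can be checked with a local dry run. This sketch assumes bash, where time is a reserved word that reports on the shell's own stderr: a redirection written after the command (time -p cmd 2> job.err) captures only the command's stderr, not the timing report, so the command is grouped instead. On the grid no redirection is needed, since StdError = "job.err" in the JDL already captures the job's stderr. Here sleep merely stands in for the real executable.

```shell
#!/bin/bash
# Local dry run of the timing instrument (illustration only).
# Grouping with { ...; } makes the redirection apply to the whole group,
# so the report of bash's "time" keyword also ends up in job.err.
{ time -p sleep 1 ; } 2> job.err

# job.err now holds the POSIX-format report: lines starting with
# "real", "user" and "sys", each followed by a time in seconds
cat job.err
```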
Job submission
The first script automates job submission. It reads its standard input (here redirected from CEs.txt) and submits one job per input line, to the given CE and with the given parameter value (here the number of CPUs). For example:
./submit_probe_jobs.sh _NODENUMBER_ < CEs.txt > jobs.txt
means that, according to the content of CEs.txt above, 8-, 4- and 2-CPU jobs are submitted to each CE in the list, and the script output is redirected to jobs.txt. This is what jobs.txt looks like (columns ordered as {CE, JDLPARAMETER, JOBID}):
cluster1.csk.kg.ac.yu:2119/jobmanager-pbs-seegrid 8 https://wms.phy.bg.ac.yu:9000/2MSDgpCsXUZvlXobKSrEbg
cluster1.csk.kg.ac.yu:2119/jobmanager-pbs-seegrid 4 https://wms.phy.bg.ac.yu:9000/lMVIZQWioLCJ21dDSVmU1g
cluster1.csk.kg.ac.yu:2119/jobmanager-pbs-seegrid 2 https://wms.phy.bg.ac.yu:9000/IzKF74tCJzoPLNnhZD9rBA
ce02.grid.acad.bg:2119/jobmanager-pbs-seegrid 8 https://wms.phy.bg.ac.yu:9000/odHB17uuaNrFz_qD1Tz1Ig
ce02.grid.acad.bg:2119/jobmanager-pbs-seegrid 4 https://wms.phy.bg.ac.yu:9000/vOyy2YjnVPewn-L6eZmH0g
ce02.grid.acad.bg:2119/jobmanager-pbs-seegrid 2 https://wms.phy.bg.ac.yu:9000/OUV3EUCl4G3OSO6gaoH_NQ
One can check the status of these submitted jobs or cancel all of them with:
glite-wms-job-status `gawk '{ print $3 }' jobs.txt`
glite-wms-job-cancel `gawk '{ print $3 }' jobs.txt`
Once the jobs have finished, their output can be retrieved from the WMS to the UI. The analysis script assumes that the output is stored in its default location, /tmp/glite/glite-ui, so the command to collect the output of all submitted jobs is:
glite-wms-job-output `gawk '{ print $3 }' jobs.txt`
This command can be invoked repeatedly until all submitted jobs have finished or the user decides to stop waiting. At that point, all remaining jobs must be cancelled with the glite-wms-job-cancel command shown above.
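When output retrieval has to be repeated, it can help to know which jobs still lack output on the UI. A sketch under stated assumptions: the function pending_jobs and the variable OUTPUT_BASE are hypothetical, and the directory layout /tmp/glite/glite-ui/<user>_<last part of the job ID> is the one output_probe_jobs.sh relies on.

```shell
#!/bin/bash
# Hypothetical helper: print the job IDs from a jobs table {CE, JDLPARAMETER, JOBID}
# whose output directory does not exist yet on the UI.
# OUTPUT_BASE (hypothetical) defaults to the default location used on this page.
OUTPUT_BASE=${OUTPUT_BASE:-/tmp/glite/glite-ui}

pending_jobs() {
  local ce value jobid user=${USER:-$(id -un)}
  while read -r ce value jobid; do
    [ -z "$jobid" ] && continue       # skip rows of failed submissions
    # ${jobid##*/} keeps the part after the last "/" (the unique job identifier)
    [ -d "$OUTPUT_BASE/${user}_${jobid##*/}" ] || echo "$jobid"
  done < "$1"
}
```

For example, after PENDING=`pending_jobs jobs.txt`, running glite-wms-job-output $PENDING (only if $PENDING is non-empty) retries just the outstanding jobs, and glite-wms-job-cancel $PENDING cancels them once waiting is abandoned.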
Gathering the results and their analysis
At this point, the output of every job that finished has been retrieved and stored in its default location on the UI. output_probe_jobs.sh then collects the reports of the time commands and prints them in a user-friendly table:
./output_probe_jobs.sh _NODENUMBER_ job.err < jobs.txt
where job.err is the name of the file to which stderr is redirected. Jobs whose output was not retrieved are reported as NOINFO. The resulting table looks like this:
*************************************************************************
CE _NODENUMBER_ REAL USER SYS
*************************************************************************
cluster1.csk.kg.ac.yu:2119/jobmanager-pbs-seegrid 8 412.17 115.54 151.13
cluster1.csk.kg.ac.yu:2119/jobmanager-pbs-seegrid 4 681.47 112.53 148.53
cluster1.csk.kg.ac.yu:2119/jobmanager-pbs-seegrid 2 871.69 106.50 140.06
ce02.grid.acad.bg:2119/jobmanager-pbs-seegrid 8 412.17 115.54 151.13
ce02.grid.acad.bg:2119/jobmanager-pbs-seegrid 4 681.47 112.53 148.53
ce02.grid.acad.bg:2119/jobmanager-pbs-seegrid 2 871.69 106.50 140.06
*************************************************************************
Total real time: 3930.66
Total user time: 669.14
Total system time: 879.44
The totals are useful when the individual jobs are segments (independent partial runs) of a larger task defined in CEs.txt. Real time is the wall-clock time, while user + sys is the CPU time consumed by the timed command and its child processes on the node where the job script runs.
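Since the aim is a scalability measurement, the table can be post-processed further. A minimal sketch, not part of the original scripts: speedup is taken as the baseline real time divided by the real time of each run, where the baseline is the run with the smallest parameter value on the same CE, and efficiency as speedup multiplied by (baseline CPUs / CPUs). This assumes the varying parameter is a CPU count, as with _NODENUMBER_; the function name speedup_table is hypothetical.

```shell
#!/bin/bash
# Hypothetical post-processing step: read the table printed by output_probe_jobs.sh
# and print "CE CPUS SPEEDUP EFFICIENCY" for every data row.
# Header, separator, "Total ..." and NOINFO rows are skipped by the pattern below.
speedup_table() {
  awk '
    # data rows: 5 fields, integer parameter, positive decimal real time
    NF == 5 && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+(\.[0-9]+)?$/ && $3 + 0 > 0 {
      n++
      ce[n] = $1; p[n] = $2 + 0; t[n] = $3 + 0
      # remember the smallest parameter value per CE and its real time
      if (!($1 in pmin) || p[n] < pmin[$1]) { pmin[$1] = p[n]; tmin[$1] = t[n] }
    }
    END {
      for (i = 1; i <= n; i++) {
        s = tmin[ce[i]] / t[i]           # speedup relative to the baseline run
        e = s * pmin[ce[i]] / p[i]       # efficiency relative to the baseline run
        printf "%s %d %.2f %.2f\n", ce[i], p[i], s, e
      }
    }'
}
```

Usage: ./output_probe_jobs.sh _NODENUMBER_ job.err < jobs.txt | speedup_table. For the example table above, the 8-CPU row of each CE prints a speedup of 2.11 and an efficiency of 0.53 relative to the 2-CPU run.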
Script sources
The sources of the scripts submit_probe_jobs.sh and output_probe_jobs.sh are given below:
submit_probe_jobs.sh
#!/bin/bash
# The script for submitting the series of probe jobs with varying parameter
#
# Example: ./submit_probe_jobs.sh _NUMBEROFCPUS_ my_program_jdl_template < CEs.txt
# Submits the job from file my_program_jdl_template to each CE listed in the first column of
# CEs.txt replacing the variable _NUMBEROFCPUS_ with real parameters listed in the
# second column of CEs.txt
#
# IMPORTANT: JDLTEMPLATE file should be present in the working directory.
# The default file name is "job_jdl_template" if the parameter is empty
# This is an ordinary JDL used to submit the job, but with a variable located
# in the place of any program argument or JDL parameter we wish to vary, in the example:
# "NodeNumber = nn" has been replaced with "NodeNumber = _NODENUMBER_"
#
# Written by Milos Ivanovic, AEGIS04-KG, January 2007,
# for PBFS application performance and scalability test
#
# Test the number of arguments (at least the variable name is required).
# Messages go to stderr, because stdout is usually redirected to jobs.txt
if [ $# -lt 1 ]
then
echo "Usage: `basename $0` NAMEOFJDLVARIABLE [JDLTEMPLATEFILE] < CELISTFILE" >&2
exit 1
fi
VARIABLE=$1
# Default JDL template file is ./job_jdl_template
JDLTEMPLATEFILE=${2:-./job_jdl_template}
# Test if the JDLTEMPLATEFILE exists
if [ ! -e "$JDLTEMPLATEFILE" ]
then
echo "Error: $JDLTEMPLATEFILE does not exist." >&2
exit 1
fi
# Main loop, each input line holds "CE VALUE" (extra columns are ignored)
while read -r CE REALVALUE REST
do
# Skip empty lines
test -z "$CE" && continue
# Copy template to working jdl file "probe.jdl"
cp "$JDLTEMPLATEFILE" probe.jdl
# Replace $VARIABLE text with real value
perl -p -i -e "s/$VARIABLE/$REALVALUE/g" probe.jdl
# Get JOBID filtering the output of glite-wms-job-submit
# (the job identifier is the line containing port 9000, as in the jobs.txt example)
JOBID=`glite-wms-job-submit -r "$CE" -a probe.jdl | grep 9000`
# Print the job table with columns $CE, $REALVALUE, $JOBID
echo $CE $REALVALUE $JOBID
done
output_probe_jobs.sh
#!/bin/bash
# The purpose of this script is to analyse the output of the probe jobs and print timing info
# obtained from the "time -p" bash command performed in the main job script
# The script takes the file generated as output of submit_probe_jobs.sh as input
# which should contain the table with columns in the following order: {CE, JDLVARIABLE, JOBID}
#
# Example of usage: ./output_probe_jobs.sh _NODENUMBER_ job.err < jobs.txt
#
# 1. First script argument is a name of JDL variable (look at submit_probe_jobs.sh)
# 2. Second argument is the name of the file where job stderr goes (the same file specified in the JDL template)
# 3. The job script should contain the call to "time -p" bash command, something like "time -p mpiexec pbfs.exe"
# 4. The script presumes that glite-wms-job-output has been performed for each JOBID
#
# Written by Milos Ivanovic, AEGIS04-KG, January 2007,
# for PBFS application performance and scalability test
#
# Test the number of arguments, should be 2
# (messages go to stderr so they never mix with the table on stdout)
if [ $# -ne 2 ]
then
echo "Usage: `basename $0` NAMEOFJDLVARIABLE STDERRFILENAME < JOBFILE" >&2
exit 1
fi
NAMEOFJDLVARIABLE=$1
STDERRFILENAME=$2
SUMREAL=0
SUMUSER=0
SUMSYS=0
echo "*************************************************************************"
echo "CE $1 REAL USER SYS"
echo "*************************************************************************"
# For each line in input formatted as "CE JDLVARIABLE JOBID"
while read -r CE VALUE JOBID
do
# Skip empty lines
test -z "$CE" && continue
# Location of the job output on UI, for example
# /tmp/glite/glite-ui/milos_hcxOzsk7eQYrVPmFlQZx_A/pbfs.err
# (user name, "_", then the part of JOBID after its last "/")
OUTPUTFILE="/tmp/glite/glite-ui/${USER}_${JOBID##*/}/$STDERRFILENAME"
REALTIME=""
USERTIME=""
SYSTIME=""
# In case the output has been taken to UI, read the last "time -p" report;
# a decimal comma (locale dependent) is turned into a dot so that bc can add it
if [ -e "$OUTPUTFILE" ]
then
REALTIME=`gawk '/^real [0-9]/ { sub(",", ".", $2); t = $2 } END { print t }' "$OUTPUTFILE"`
USERTIME=`gawk '/^user [0-9]/ { sub(",", ".", $2); t = $2 } END { print t }' "$OUTPUTFILE"`
SYSTIME=`gawk '/^sys [0-9]/ { sub(",", ".", $2); t = $2 } END { print t }' "$OUTPUTFILE"`
fi
# Add to the totals only when all three values were found
if [ -n "$REALTIME" ] && [ -n "$USERTIME" ] && [ -n "$SYSTIME" ]
then
SUMREAL=`echo "$SUMREAL+$REALTIME" | bc`
SUMUSER=`echo "$SUMUSER+$USERTIME" | bc`
SUMSYS=`echo "$SUMSYS+$SYSTIME" | bc`
else
REALTIME="NOINFO"
USERTIME="NOINFO"
SYSTIME="NOINFO"
fi
# Print the table row "CE JDLVARIABLE REALTIME USERTIME SYSTIME"
echo $CE $VALUE $REALTIME $USERTIME $SYSTIME
done
# Print totals
echo "*************************************************************************"
echo "Total real time: $SUMREAL"
echo "Total user time: $SUMUSER"
echo "Total system time: $SUMSYS"
