Parametric Jobs

From EGEE-see WIki

For a basic quick guide on submitting jobs, refer to this page.

Parametric job - general description

When you want to run similar jobs that differ only in their arguments or input/output files, it is best to use the parametric job type, because submitting them separately can take a lot of your time. The parametric job type lets you submit a batch of jobs as a single job. The WMS then takes over, breaks your parametric job into many single jobs, and submits them separately to CEs on your behalf.

Upon submission, every sub-job is associated with an individual identifier (job ID), and a common job ID is assigned to the whole set of jobs. The common ID is printed on screen or written to the specified file when you submit the parametric job, and it is used to list the status or retrieve the output of all jobs at once. Individual job IDs are shown when you ask for the status of your parametric job (glite-wms-job-status <common_jobID>).

A parametric job is described by a single JDL file, in which attribute values may contain the placeholder _PARAM_. The placeholder is replaced by the current value of the running parameter. An example of a JDL for a parametric job follows:

[
JobType = "Parametric";
Executable = "myjob.exe";
StdInput = "input_PARAM_.txt";
StdOutput = "output_PARAM_.txt";
StdError = "error_PARAM_.txt";
Parameters = 100;
ParameterStart = 1;
ParameterStep = 1;
InputSandbox = {"myjob.exe", "input_PARAM_.txt"};
OutputSandbox = {"output_PARAM_.txt", "error_PARAM_.txt"};
]

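The variable _PARAM_ takes values starting at ParameterStart and increasing by ParameterStep, with Parameters as the upper limit. In the recipe further down this page, Parameters = 5 with ParameterStart = 1 yields the values 1 to 4, so the limit itself is not used.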
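The expansion of _PARAM_ can be imitated locally to preview which file names a JDL will refer to. The following is only a sketch in plain bash, not something the WMS runs: expand_params is a hypothetical helper, and it assumes (as the wrapper recipe on this page suggests, where Parameters = 5 gives the values 1 to 4) that the Parameters value itself is excluded.

```shell
#!/bin/bash
# expand_params START STEP LIMIT TEMPLATE
# Hypothetical helper (not part of gLite): prints TEMPLATE once per parameter
# value, with every occurrence of _PARAM_ replaced by that value.
# Assumption: LIMIT (the Parameters attribute) is an exclusive upper bound.
expand_params() {
    local start=$1 step=$2 limit=$3 template=$4
    local p
    for (( p = start; p < limit; p += step )); do
        echo "${template//_PARAM_/$p}"
    done
}

# Preview the input file names for ParameterStart = 1, ParameterStep = 1, Parameters = 5
expand_params 1 1 5 "input_PARAM_.txt"
```

Under that assumption the call lists input1.txt through input4.txt, one per line.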

The Parameters attribute can also be a list of items (typically strings not enclosed in double quotes). The variable _PARAM_ then takes its values from this list, and the attributes ParameterStart and ParameterStep should not be set.

For example, if Parameters is set as follows:

Parameters = {red, green, blue};

three jobs will be submitted with input files inputred.txt, inputgreen.txt and inputblue.txt respectively.

Accessing output of individual jobs

The status of a parametric job becomes Done only when every individual job has terminated, and only then can you retrieve the output of all jobs at once. You can, however, access the output of a particular sub-job by using its individual ID. Unlike with regular single jobs, retrieving the output of an individual job that belongs to a parametric job does not set its status to Cleared, so you can download that output again. Job status is set to Cleared only after all outputs have been retrieved using the common ID.

Recipe for submitting a large number of jobs with different input arguments

If your executable takes more than one argument and you need to submit a large number of jobs that take different arguments, here is a small recipe for doing it.

For example, suppose the program my_run takes two arguments of type double, startpoint and endpoint. You want to run it for startpoint equal to 0 and 0.5, and endpoint equal to 1 and 1.5. Put all your arguments in one text file (arguments.txt) so that each line contains the arguments for a single run, like this:

0 1
0 1.5
0.5 1
0.5 1.5
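For longer argument lists, writing the file by hand becomes error-prone. As a minimal sketch, the same four lines can be generated with two nested loops; the values are the ones from the example above, and the script overwrites any existing arguments.txt in the current directory.

```shell
#!/bin/bash
# Write one "startpoint endpoint" pair per line, with no blank lines.
: > arguments.txt                  # create the file, or empty it if it exists
for startpoint in 0 0.5; do
    for endpoint in 1 1.5; do
        echo "$startpoint $endpoint" >> arguments.txt
    done
done

cat arguments.txt                  # show the generated file
```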

Make a small job wrapper (mywrapper.sh) like this:

#!/bin/bash
# Usage: mywrapper.sh <executable> <arguments_file> <line_number>

exec_file=$1
arg_file=$2
N=$3
arguments=$(head -n "$N" "$arg_file" | tail -n 1) # extracts the $N-th line from file $arg_file
chmod +x "$exec_file" # sets executable permission on file $exec_file
# executes $exec_file with the arguments from the $N-th line;
# $arguments is deliberately unquoted so that it splits into separate arguments
./"$exec_file" $arguments

Prepare your parametric JDL (this is just an example; customize it to your needs):

[
JobType = "Parametric";
Executable = "mywrapper.sh";
Arguments = "my_run arguments.txt _PARAM_";
StdOutput = "stdout._PARAM_.txt";
StdError = "stderr._PARAM_.txt";

Parameters = 5;
ParameterStart = 1;
ParameterStep = 1;

InputSandbox = {"mywrapper.sh", "my_run", "arguments.txt"};
OutputSandbox = {"stdout._PARAM_.txt", "stderr._PARAM_.txt"};
]

Because Parameters = 5 serves as the upper limit, this will submit four jobs, and the variable _PARAM_ will take the values 1 to 4. For each value of _PARAM_, the script mywrapper.sh is executed and does the following:

  • extracts line number _PARAM_ from the file arguments.txt,
  • executes my_run using the extracted line as arguments.
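The line extraction step can be tried locally before submitting anything. The sketch below uses a hypothetical helper, nth_line, that mirrors the head/tail idiom of the wrapper, and runs it on a temporary four-line file with the same content as the arguments.txt example.

```shell
#!/bin/bash
# nth_line N FILE: hypothetical helper mirroring the wrapper's head/tail idiom
nth_line() {
    head -n "$1" "$2" | tail -n 1
}

# Throwaway file with the same four lines as the arguments.txt example
demo_file=$(mktemp)
printf '0 1\n0 1.5\n0.5 1\n0.5 1.5\n' > "$demo_file"

nth_line 2 "$demo_file"   # prints "0 1.5"
nth_line 7 "$demo_file"   # prints "0.5 1.5": head returns the whole file, so tail yields the last line
```

The second call shows what happens when the line number exceeds the length of the file: the last line is silently reused instead of an error being raised.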

Attention: Make sure that the number you set for Parameters equals the total number of lines in arguments.txt plus one, and that the file contains no blank lines. Otherwise some jobs might fail because of missing arguments, some argument lines might not be processed, or the last line might be processed several times.
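Both conditions can be checked before submission. The following is a minimal sketch: check_args is a hypothetical helper, not a gLite tool, and it assumes ParameterStart = 1 and ParameterStep = 1 as in the JDL above.

```shell
#!/bin/bash
# check_args FILE
# Hypothetical helper: fails if FILE contains blank lines, otherwise prints
# the Parameters line to put in the JDL (number of lines plus one).
# Assumes ParameterStart = 1 and ParameterStep = 1.
check_args() {
    local file=$1
    if grep -q '^[[:space:]]*$' "$file"; then
        echo "error: $file contains blank lines" >&2
        return 1
    fi
    local lines
    lines=$(grep -c '' "$file")   # counts a final line even if it lacks a trailing newline
    echo "Parameters = $((lines + 1));"
}
```

Calling check_args arguments.txt on the four-line file from the example prints Parameters = 5; and a file with an empty line makes the function return a non-zero status.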
