SG Running Jobs NS CLI
From EGEE-SEE Wiki
INSTRUCTIONS ON THIS PAGE ARE OBSOLETED AND NO LONGER SUPPORTED
Submission and monitoring of jobs via Network Server using the command line interface
This page describes the sequence of steps needed to submit a job to the Network Server and to monitor the submitted job from the command line interface. Two families of UI commands exist for this purpose: one for submission via the Network Server (presented here) and one for submission via the WMProxy service. This guide was prepared by the AEGIS01-PHY-SCL site admins at the Institute of Physics Belgrade.
The Network Server is the component of the Workload Management System (WMS) that accepts incoming requests from the User Interface (WMS-UI), such as job submission and job removal, and, if they are valid, passes them on to the other WMS components. It supports job control for users of either the LCG Resource Broker (LCG-RB) or the gLite WMS. This guide describes how to submit and monitor jobs with both services.
Before using any of the job-related commands, you need a valid proxy credential on the UI machine. You can create one with the voms-proxy-init command or, alternatively, with grid-proxy-init. Make sure that your certificate/key pair is in the directory $HOME/.globus:
-rw-r--r-- 1 neda neda 5015 Jun 7 13:10 usercert.pem
-r-------- 1 neda neda 963 Jun 7 13:10 userkey.pem
Note that the file permissions are important: as in the listing above, the private key (userkey.pem) must be readable only by its owner.
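The permissions in the listing can also be checked from a script. The following Python sketch is only an illustration and is not part of the grid tooling; the helper name key_is_private is hypothetical, and it assumes that the relevant requirement is that the private key grants no access to group or others, as the listing shows.

```python
import os
import stat


def key_is_private(path):
    """Return True if the file grants no permissions to group or others."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # Any group or other permission bit makes the key too permissive.
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0
```

On a UI machine this could be called as key_is_private(os.path.expanduser("~/.globus/userkey.pem")).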
Then you can issue the VOMS client command. You will be prompted for the pass-phrase.
[neda@ce neda]$ voms-proxy-init -voms aegis
Enter GRID pass phrase:
Your identity: /C=RS/O=AEGIS/OU=Institute of Physics Belgrade/CN=Neda Svraka
Cannot find file or dir: /home/neda/.glite/vomses
Creating temporary proxy ................................ Done
Contacting se.phy.bg.ac.yu:15001 [/DC=ORG/DC=SEE-GRID/O=Hosts/O=Institute of Physics Belgrade/CN=host/se.phy.bg.ac.yu] "aegis" Done
Creating proxy ...................................................................... Done
Your proxy is valid until Thu Jul 26 01:36:52 2007
A proxy created in this way contains attributes retrieved from a VOMS server.
By default, the validity of such a proxy is 12 hours. A longer validity must be requested explicitly when generating the proxy, but it is limited to a maximum of 24 hours for security reasons. If a job will take longer than that to complete (waiting time included!), MyProxy must be used so that the proxy can be renewed by the RB/WMS.
Information about an existing valid proxy certificate can be obtained with the voms-proxy-info command. Two useful options are -all, which prints everything, and -fqan, which prints the groups and roles in FQAN format. A simple invocation looks like this:
[neda@ce neda]$ voms-proxy-info
subject : /C=RS/O=AEGIS/OU=Institute of Physics Belgrade/CN=Neda Svraka/CN=proxy
issuer : /C=RS/O=AEGIS/OU=Institute of Physics Belgrade/CN=Neda Svraka
identity : /C=RS/O=AEGIS/OU=Institute of Physics Belgrade/CN=Neda Svraka
type : proxy
strength : 512 bits
path : /tmp/x509up_u35003
timeleft : 11:56:26
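When scripting around voms-proxy-info, the timeleft field is usually the value of interest, since it tells you whether the proxy will outlive the job. A minimal Python sketch, assuming the "name : value" layout shown above with one field per line; the function names are hypothetical and not part of any gLite library.

```python
def parse_proxy_info(text):
    """Turn 'name : value' lines into a dict, splitting on the first colon."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def timeleft_seconds(timeleft):
    """Convert an HH:MM:SS string such as '11:56:26' to seconds."""
    hours, minutes, seconds = (int(part) for part in timeleft.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def proxy_covers(timeleft, expected_seconds):
    """True if the proxy is expected to outlive the job (waiting time included)."""
    return timeleft_seconds(timeleft) >= expected_seconds
```

If proxy_covers returns False for the expected duration of a job, that is the situation in which the text recommends MyProxy.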
Alternatively, you can use the grid-proxy-init command, which is equivalent to voms-proxy-init without the -voms option and therefore creates a proxy without attributes. In this case a user cannot effectively be a member of two VOs, because she/he is always mapped to the one that appears first in the grid-mapfile. Using voms-proxy-init allows a user to be a member of more than one VO and to be mapped accurately to the chosen one.
[neda@ce neda]$ grid-proxy-init
Your identity: /C=RS/O=AEGIS/OU=Institute of Physics Belgrade/CN=Neda Svraka
Enter GRID pass phrase for this identity:
Creating proxy ..................................... Done
Your proxy is valid until: Thu Jul 26 02:21:27 2007
With grid-proxy-info you can obtain information about the proxy you created:
[neda@ce neda]$ grid-proxy-info
subject : /C=RS/O=AEGIS/OU=Institute of Physics Belgrade/CN=Neda Svraka/CN=proxy
issuer : /C=RS/O=AEGIS/OU=Institute of Physics Belgrade/CN=Neda Svraka
identity : /C=RS/O=AEGIS/OU=Institute of Physics Belgrade/CN=Neda Svraka
type : full legacy globus proxy
strength : 512 bits
path : /tmp/x509up_u35003
timeleft : 11:57:48
An important difference between the two is that a VOMS proxy contains VOMS attributes specifying which Virtual Organization (VO) you belong to, so you will not be asked for this information when issuing job management commands. With a plain grid proxy, however, you have to specify the VO explicitly (usually through the --vo option).
Now you can use WMS-UI commands.
The key element in the search for resources matching your job requirements is the JDL (Job Description Language) file. It contains attributes describing the job requirements, which the Workload Management System (RB/WMS) components take into account in order to schedule and submit the job (or all sub-jobs in the case of a complex request, such as a DAG job). A simple JDL file, hello.jdl, is used here. The executable is the hostname command, which returns the name of the worker node (WN) on which the job is executed. The output of the command is stored in the generated file message.txt, and any errors are written to the generated file stderror. The attributes StdOutput and StdError define the files to which the standard output and standard error of your job are redirected. The OutputSandbox attribute defines which files are returned to your UI when the job has completed and its results are retrieved. Another possible attribute is InputSandbox, which lists the files that must be transferred from the UI to the WN because they are needed for the execution of the job. For example, if the executable is not a command of the operating system (usually it is a custom script), then that script and all accompanying files needed for its execution must be listed in the InputSandbox attribute.
[
Type = "Job";
Executable = "/bin/hostname";
Arguments = "";
StdOutput = "message.txt";
StdError = "stderror";
OutputSandbox = {"message.txt","stderror"};
]
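For jobs that ship their own script, the InputSandbox attribute has to be added to the description. As a rough sketch, a JDL file in the same layout as hello.jdl can be generated from Python; the function names and the script name run.sh are hypothetical, and the renderer handles only plain strings and lists of strings (no escaping of embedded quotes).

```python
def jdl_value(value):
    """Render a string as "text" and a list of strings as {"a","b"}."""
    if isinstance(value, (list, tuple)):
        return "{" + ",".join('"%s"' % item for item in value) + "}"
    return '"%s"' % value


def render_jdl(attributes):
    """Render a dict of attributes in the bracketed layout used by hello.jdl."""
    lines = ["["]
    for name, value in attributes.items():
        lines.append("%s = %s;" % (name, jdl_value(value)))
    lines.append("]")
    return "\n".join(lines)


# Hypothetical job that ships its own script, run.sh, to the worker node.
script_job = render_jdl({
    "Type": "Job",
    "Executable": "run.sh",
    "StdOutput": "message.txt",
    "StdError": "stderror",
    "InputSandbox": ["run.sh"],
    "OutputSandbox": ["message.txt", "stderror"],
})
```

Writing script_job to a file gives a description that can be passed to the same commands as hello.jdl.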
As mentioned above, the Network Server supports job control for users of either the LCG Resource Broker or the gLite WMS. The commands for job submission, monitoring and control are shown below. Those with the glite- prefix are used with the gLite WMS, while those with the edg- prefix are used with the LCG Resource Broker.
Before actually submitting a job, it is often useful to list the resources that are suitable for running it, i.e. those that fulfill the requirements specified in the JDL file. This also serves as a check of the syntactical correctness of the JDL file, and of whether the job can actually be executed on the given infrastructure. The list of resources satisfying your JDL requirements can be obtained with the glite-job-list-match command, with the JDL file as its input argument. Resources are listed by rank, with the highest-ranked one first.
[neda@ce neda]$ glite-job-list-match hello.jdl
Selected Virtual Organisation name (from proxy certificate extension): aegis
Connecting to host wms.phy.bg.ac.yu, port 7772
**********************************************************************
COMPUTING ELEMENT IDs LIST
The following CE(s) matching your job requirements have been found:
*CEId*
ce.phy.bg.ac.yu:2119/jobmanager-pbs-aegis
cluster1.csk.kg.ac.yu:2119/jobmanager-pbs-aegis
g02.phy.bg.ac.yu:2119/blah-pbs-aegis
grid01.elfak.ni.ac.yu:2119/blah-pbs-aegis
grid01.rcub.bg.ac.yu:2119/jobmanager-pbs-aegis
gw01.rogrid.pub.ro:2119/jobmanager-lcgpbs-aegis
rti29.etf.bg.ac.yu:2119/jobmanager-pbs-aegis
seegrid2.fie.upt.al:2119/jobmanager-pbs-aegis
grid01.elfak.ni.ac.yu:2119/jobmanager-pbs-aegis
**********************************************************************
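The CE identifiers in this listing have the form host:port/queue. A short Python sketch for extracting them from captured command output, assuming one identifier per line as shown above; the function name and the regular expression are illustrative only.

```python
import re

# host:port/queue, e.g. ce.phy.bg.ac.yu:2119/jobmanager-pbs-aegis
CE_ID = re.compile(r"^(?P<host>[\w.-]+):(?P<port>\d+)/(?P<queue>\S+)$")


def parse_ce_ids(output):
    """Return (host, port, queue) tuples in the order they are listed."""
    ces = []
    for line in output.splitlines():
        match = CE_ID.match(line.strip())
        if match:
            ces.append((match.group("host"), int(match.group("port")), match.group("queue")))
    return ces
```

Because the listing is ordered by rank, the first tuple returned corresponds to the highest-ranked CE.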
For the LCG RB you can use the edg-job-list-match command in the same way. If you use a plain grid proxy (i.e. your proxy carries no reference to the Virtual Organization you belong to), the --vo option must be present (for both glite-job-list-match and edg-job-list-match), and it must give the name of your Virtual Organization (aegis in this case).
[neda@ce neda]$ edg-job-list-match --vo aegis hello.jdl
Selected Virtual Organisation name (from --vo option): aegis
Connecting to host rb.phy.bg.ac.yu, port 7772
***************************************************************************
COMPUTING ELEMENT IDs LIST
The following CE(s) matching your job requirements have been found:
*CEId*
cluster1.csk.kg.ac.yu:2119/jobmanager-pbs-aegis
grid01.elfak.ni.ac.yu:2119/jobmanager-pbs-aegis
grid01.rcub.bg.ac.yu:2119/jobmanager-pbs-aegis
rti29.etf.bg.ac.yu:2119/jobmanager-pbs-aegis
ce.phy.bg.ac.yu:2119/jobmanager-pbs-aegis
***************************************************************************
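The rule for when --vo is needed can be captured in a small helper that builds the argument list. This is a sketch under the assumptions stated in the text (the --vo option is only required with a plain grid proxy); the function name and its parameters are hypothetical.

```python
def list_match_command(jdl_file, vo=None, lcg_rb=False):
    """Build the list-match command line as a list of arguments.

    Pass vo only when the proxy carries no VOMS attributes.
    """
    command = ["edg-job-list-match" if lcg_rb else "glite-job-list-match"]
    if vo is not None:
        command += ["--vo", vo]
    command.append(jdl_file)
    return command
```

On a UI machine where the commands are installed, the resulting list could be passed to subprocess.run.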
If you are satisfied with the list of available Computing Elements (CEs), you can submit your job with the glite-job-submit command (edg-job-submit when the LCG RB is used), which requires a JDL file as input:
[neda@ce neda]$ glite-job-submit -o jid hello.jdl
Selected Virtual Organisation name (from proxy certificate extension): aegis
Connecting to host wms.phy.bg.ac.yu, port 7772
Logging to host wms.phy.bg.ac.yu, port 9002
===============================glite-job-submit Success=========================
The job has been successfully submitted to the Network Server.
Use glite-job-status command to check job current status. Your job identifier is:
- https://wms.phy.bg.ac.yu:9000/TOPdK-V6gh28DP8gHPpR5A
The job identifier has been saved in the following file:
/home/neda/jid
=============================================================================
or
[neda@ce neda]$ edg-job-submit --vo aegis hello.jdl
Selected Virtual Organisation name (from --vo option): aegis
Connecting to host rb.phy.bg.ac.yu, port 7772
Logging to host rb.phy.bg.ac.yu, port 9002
*********************************************************************************************
JOB SUBMIT OUTCOME
The job has been successfully submitted to the Network Server.
Use edg-job-status command to check job current status. Your job identifier (edg_jobId) is:
- https://rb.phy.bg.ac.yu:9000/RqSaPgnFMC0sYizcQ9PdUg
*********************************************************************************************
If the command executes successfully, it returns the unique job identifier assigned to the job by the WMS:
https://wms.phy.bg.ac.yu:9000/TOPdK-V6gh28DP8gHPpR5A
or, in the LCG RB case:
https://rb.phy.bg.ac.yu:9000/RqSaPgnFMC0sYizcQ9PdUg
The job identifier (JID) is used as the input argument for the commands that monitor job execution and collect the job's results. A JID consists of two parts. The first part is the endpoint URL of the LB (Logging and Bookkeeping) server that holds the logging and bookkeeping information of the job (https://wms.phy.bg.ac.yu:9000), and the second part, generated by the WMS-UI (TOPdK-V6gh28DP8gHPpR5A), uniquely identifies the job in the grid environment. In the WMS example above, the -o option was used to store the JID in a chosen file, here the file jid in the directory /home/neda.
You can monitor the job status by querying the LB service. This is done with the glite-job-status command (edg-job-status when the LCG RB is used), which takes a JID as input. The -i option reads JIDs from a file and, when the file holds more than one JID (as the file jid in /home/neda does here), lets you choose from a menu which jobs to query.
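Since a JID is an ordinary URL, its two parts can be separated with standard URL parsing. A minimal Python sketch; the function name split_jid is hypothetical.

```python
from urllib.parse import urlsplit


def split_jid(jid):
    """Split a JID into (LB endpoint URL, unique job part)."""
    parts = urlsplit(jid)
    endpoint = "%s://%s" % (parts.scheme, parts.netloc)
    unique_part = parts.path.lstrip("/")
    return endpoint, unique_part
```

For the WMS example this yields the endpoint https://wms.phy.bg.ac.yu:9000 and the unique part TOPdK-V6gh28DP8gHPpR5A.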
[neda@ce neda]$ glite-job-status -i jid
------------------------------------------------------------------
1 : https://wms.phy.bg.ac.yu:9000/TOPdK-V6gh28DP8gHPpR5A
2 : https://wms.phy.bg.ac.yu:9000/w-sbZQIvNl1tQLVVwgqW_Q
a : all
q : quit
------------------------------------------------------------------
Choose one or more jobId(s) in the list – [1-2]all:1
*************************************************************
BOOKKEEPING INFORMATION:
Status info for the Job : https://wms.phy.bg.ac.yu:9000/TOPdK-V6gh28DP8gHPpR5A
Current Status: Running
Status Reason: unavailable
Destination: rti29.etf.bg.ac.yu:2119/jobmanager-pbs-aegis
Submitted: Wed Dec 20 16:58:13 2006 CET
*************************************************************
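A script that wants to skip the interactive menu can read the JIDs from the file itself. The exact layout of the file is not described on this page, so the following Python sketch assumes one JID per line and ignores every line that does not start with https:// (for example a header or comment line); the function name is hypothetical.

```python
def read_jids(text):
    """Return the JIDs found in the contents of a file written with -o."""
    jids = []
    for line in text.splitlines():
        line = line.strip()
        # Assumption: every JID is an https URL on a line of its own.
        if line.startswith("https://"):
            jids.append(line)
    return jids
```

Each returned JID can then be passed directly to the status command instead of using -i.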
or
[neda@ce neda]$ edg-job-status https://rb.phy.bg.ac.yu:9000/RqSaPgnFMC0sYizcQ9PdUg
*************************************************************
BOOKKEEPING INFORMATION:
Status info for the Job : https://rb.phy.bg.ac.yu:9000/RqSaPgnFMC0sYizcQ9PdUg
Current Status: Done (Success)
Exit code: 0
Status Reason: Job terminated successfully
Destination: cluster1.csk.kg.ac.yu:2119/jobmanager-pbs-aegis
reached on: Wed Jul 25 13:10:30 2007
*************************************************************
The states that make up the normal job life span are Submitted, Waiting, Ready, Scheduled, Running and Done. The Aborted state indicates an irregularity in job processing. After the job has executed successfully (state Done), you can retrieve its output (i.e. the files specified in the OutputSandbox attribute of the JDL file) to the UI machine by passing the JID to the glite-job-output command (edg-job-get-output when the LCG RB is used):
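Both status reports above contain a "Current Status:" line, which is the piece a monitoring script would look for. A minimal Python sketch that extracts the state name from captured output; the function names are hypothetical, and only the states named in this guide are considered.

```python
import re


def current_status(output):
    """Return the first word after 'Current Status:', or None if absent."""
    match = re.search(r"Current Status:\s*(\w+)", output)
    return match.group(1) if match else None


def output_ready(status):
    """Per this guide, output can be retrieved once the job is Done."""
    return status == "Done"


def needs_attention(status):
    """Aborted indicates an irregularity in job processing."""
    return status == "Aborted"
```

A value such as "Done (Success)" is reduced to "Done", so the qualifier in parentheses is dropped by this sketch.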
[neda@ce neda]$ glite-job-output -i jid
------------------------------------------------------------------
1 : https://wms.phy.bg.ac.yu:9000/TOPdK-V6gh28DP8gHPpR5A
2 : https://wms.phy.bg.ac.yu:9000/w-sbZQIvNl1tQLVVwgqW_Q
a : all
q : quit
------------------------------------------------------------------
Choose one or more jobId(s) in the list – [1-2]all:1
Retrieving files from host: wms.phy.bg.ac.yu ( for https://wms.phy.bg.ac.yu:9000/TOPdK-V6gh28DP8gHPpR5A )
**************************************************************
JOB GET OUTPUT OUTCOME
Output sandbox files for the job:
1. https://wms.phy.bg.ac.yu:9000/TOPdK-V6gh28DP8gHPpR5A
have been successfully retrieved and stored in the directory:
/tmp/neda_TOPdK-V6gh28DP8gHPpR5A
**************************************************************
In the case of LCG RB, edg-job-get-output performs the same action:
[neda@ce neda]$ edg-job-get-output -dir /home/neda/edg_output https://rb.phy.bg.ac.yu:9000/RqSaPgnFMC0sYizcQ9PdUg
Retrieving files from host: rb.phy.bg.ac.yu ( for https://rb.phy.bg.ac.yu:9000/RqSaPgnFMC0sYizcQ9PdUg )
*********************************************************************************
JOB GET OUTPUT OUTCOME
Output sandbox files for the job:
- https://rb.phy.bg.ac.yu:9000/RqSaPgnFMC0sYizcQ9PdUg
have been successfully retrieved and stored in the directory:
/home/neda/edg_output/neda_RqSaPgnFMC0sYizcQ9PdUg
*********************************************************************************
Output files can be stored either in the temporary storage location set in the UI-wide configuration file or in a user-defined directory. When glite-job-output is used as above, the output files are stored in the directory neda_TOPdK-V6gh28DP8gHPpR5A, whose name is made of your current username and the part of the JID that uniquely identifies the job; it is created in /tmp because this is the output storage path set in the configuration file. The -dir option stores the results in a user-defined directory (e.g. /home/neda/edg_output), as demonstrated in the edg-job-get-output example.
More information about the events related to job execution can be obtained with the glite-job-logging-info command, which takes a JID as input:
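The naming rule for the output directory can be written down directly. This Python sketch reproduces the two directory names seen in the examples on this page; the function name is hypothetical, and the base directory is whatever the configuration file or the -dir option specifies.

```python
import posixpath
from urllib.parse import urlsplit


def output_directory(base_dir, username, jid):
    """Return base_dir/<username>_<unique part of the JID>."""
    unique_part = urlsplit(jid).path.lstrip("/")
    return posixpath.join(base_dir, "%s_%s" % (username, unique_part))
```

A script can use this to locate message.txt and stderror after the output has been retrieved.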
[neda@ce neda]$ glite-job-logging-info https://wms.phy.bg.ac.yu:9000/TOPdK-V6gh28DP8gHPpR5A
or
[neda@ce neda]$ glite-job-logging-info -i jid
------------------------------------------------------------------
1 : https://wms.phy.bg.ac.yu:9000/TOPdK-V6gh28DP8gHPpR5A
2 : https://wms.phy.bg.ac.yu:9000/w-sbZQIvNl1tQLVVwgqW_Q
a : all
q : quit
------------------------------------------------------------------
Choose one or more jobId(s) in the list – [1-2]all:1
The same can be achieved with edg-job-get-logging-info when the LCG Resource Broker is used. This logging information (usually requested in a more verbose form, using the option -v 2 or -v 3) can serve as a starting point when there are problems with the execution of a job. It is also a necessary input when you submit a ticket to the SEE-GRID Helpdesk asking for support with problems in your job execution.
Another important job management command allows a previously submitted job to be canceled (glite-job-cancel/edg-job-cancel):
[neda@ce neda]$ glite-job-cancel https://wms.phy.bg.ac.yu:9000/TOPdK-V6gh28DP8gHPpR5A
or
[neda@ce neda]$ glite-job-cancel -i jid
------------------------------------------------------------------
1 : https://wms.phy.bg.ac.yu:9000/TOPdK-V6gh28DP8gHPpR5A
2 : https://wms.phy.bg.ac.yu:9000/w-sbZQIvNl1tQLVVwgqW_Q
a : all
q : quit
------------------------------------------------------------------
Choose one or more jobId(s) in the list – [1-2]all:1
The same can be achieved with edg-job-cancel when LCG Resource Broker is used.
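The command pairs used throughout this page can be summarized in a small lookup table. The Python sketch below only restates the names that appear above; the action keys and the function name are hypothetical labels chosen for illustration.

```python
# action: (gLite WMS command, LCG RB command), as used on this page
COMMANDS = {
    "list-match": ("glite-job-list-match", "edg-job-list-match"),
    "submit": ("glite-job-submit", "edg-job-submit"),
    "status": ("glite-job-status", "edg-job-status"),
    "output": ("glite-job-output", "edg-job-get-output"),
    "logging-info": ("glite-job-logging-info", "edg-job-get-logging-info"),
    "cancel": ("glite-job-cancel", "edg-job-cancel"),
}


def command_for(action, lcg_rb=False):
    """Return the command name for an action on the chosen service."""
    glite_name, edg_name = COMMANDS[action]
    return edg_name if lcg_rb else glite_name
```

Note that the output and logging-info commands do not follow the simple glite-/edg- prefix swap, which is why a table is used rather than string replacement.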
NOTE: More information on job management can be found in the following documents:
