SG Running Jobs WMProxy CLI
From EGEE-SEE Wiki
Submission and monitoring of jobs via WMProxy using the command line interface
Here you can find an example of the sequence of steps needed to submit a job and to monitor the submitted job. There are two sets of WMS-UI commands: one is used when jobs are submitted via the Network Server, the other when WMProxy is used (presented here). This guide was prepared by the AEGIS01-PHY-SCL site administrators.
WMProxy is a component of the gLite Workload Management System (WMS) that is responsible for accepting incoming requests from the User Interface (WMS-UI), e.g. job submission or job removal; valid requests are then passed on to the other WMS components. It provides the job control functionality through a Web Services based interface. Besides being the natural replacement for the Network Server as the WMS architecture moves to a service-oriented (SOA) approach, it provides additional features such as bulk submission and support for shared and compressed sandboxes for compound jobs. Here we describe how to submit and monitor jobs through this service.
Before using any of the WMS-UI commands, you need a valid proxy credential on the WMS-UI machine. You can create it with the voms-proxy-init command. Make sure that your certificate/key pair is in the directory $HOME/.globus:
-rw-r--r--  1 neda neda 5015 Jun  7 13:10 usercert.pem
-r--------  1 neda neda  963 Jun  7 13:10 userkey.pem
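Note that file permissions are important: the private key userkey.pem must be readable only by you (-r--------), while usercert.pem may be readable by everyone (-rw-r--r--).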
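The permissions shown in the listing above can be set with two chmod calls. The helper below, fix_grid_cert_perms, is a hypothetical convenience function, not part of the gLite UI; it assumes the default $HOME/.globus location unless another directory is passed as its first argument.

```shell
# Hypothetical helper: apply the permissions from the listing above.
# Usage: fix_grid_cert_perms [directory]   (defaults to $HOME/.globus)
fix_grid_cert_perms() {
    dir="${1:-$HOME/.globus}"
    # Certificate: readable by everyone, writable by the owner (-rw-r--r--).
    chmod 644 "$dir/usercert.pem" || return 1
    # Private key: readable by the owner only (-r--------).
    chmod 400 "$dir/userkey.pem" || return 1
}
```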
Then you can issue the VOMS client command. You will be prompted for the pass-phrase.
[neda@ce neda]$ voms-proxy-init -voms aegis
Enter GRID pass phrase:
Your identity: /C=RS/O=AEGIS/OU=Institute of Physics Belgrade/CN=Neda Svraka
Cannot find file or dir: /home/neda/.glite/vomses
Creating temporary proxy ................................ Done
Contacting se.phy.bg.ac.yu:15001 [/DC=ORG/DC=SEE-GRID/O=Hosts/O=Institute of Physics Belgrade/CN=host/se.phy.bg.ac.yu] "aegis" Done
Creating proxy ...................................................................... Done
Your proxy is valid until Thu Jul 26 01:36:52 2007
A proxy created this way contains attributes (VO membership, groups and roles) retrieved from a VOMS server.
By default, such a proxy is valid for 12 hours. A longer validity must be requested explicitly when the proxy is generated, but for security reasons it is limited to at most 24 hours. If the job may take longer to complete (including waiting time!), MyProxy must be used so that the proxy can be renewed by the WMS.
Information about an existing proxy certificate can be obtained with the voms-proxy-info command. Two useful options are -all, which prints everything, and -fqan, which prints the groups and roles in FQAN format. Simple usage of the command looks like this:
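A longer validity is requested with the -valid option of voms-proxy-init, which takes hours:minutes (check voms-proxy-init -help on your UI, since options may differ between versions). The helper below, proxy_valid_arg, is hypothetical: it turns a number of hours into that format and caps the request at the 24-hour limit mentioned above, pointing to MyProxy for anything longer.

```shell
# Hypothetical helper: print an HH:MM value for voms-proxy-init -valid,
# capped at 24 hours (the limit stated in this guide).
proxy_valid_arg() {
    hours="$1"
    if [ "$hours" -gt 24 ]; then
        echo "requested ${hours}h, capping at 24h; use MyProxy for longer jobs" >&2
        hours=24
    fi
    printf '%02d:00\n' "$hours"
}

# Example: a proxy valid for 20 hours.
#   voms-proxy-init -voms aegis -valid "$(proxy_valid_arg 20)"
```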
[neda@ce neda]$ voms-proxy-info
subject   : /C=RS/O=AEGIS/OU=Institute of Physics Belgrade/CN=Neda Svraka/CN=proxy
issuer    : /C=RS/O=AEGIS/OU=Institute of Physics Belgrade/CN=Neda Svraka
identity  : /C=RS/O=AEGIS/OU=Institute of Physics Belgrade/CN=Neda Svraka
type      : proxy
strength  : 512 bits
path      : /tmp/x509up_u35003
timeleft  : 11:56:26
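Before a long session it can help to check how much of the proxy lifetime is left. A minimal sketch, assuming the "timeleft : HH:MM:SS" line format shown above; timeleft_seconds is a hypothetical helper that reads voms-proxy-info output on standard input and prints the remaining time in seconds.

```shell
# Hypothetical helper: convert the "timeleft : HH:MM:SS" line of
# voms-proxy-info output (read from stdin) into seconds.
# Prints nothing if no timeleft line is found.
timeleft_seconds() {
    sed -n 's/^ *timeleft *: *//p' |
        awk -F: 'NR == 1 { print $1 * 3600 + $2 * 60 + $3 }'
}

# Example: create a fresh proxy when less than one hour is left
# (an empty result, e.g. no proxy at all, counts as zero).
#   left=$(voms-proxy-info | timeleft_seconds)
#   [ "${left:-0}" -ge 3600 ] || voms-proxy-init -voms aegis
```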
Now you can use WMS-UI commands.
The key element in matching your job to suitable resources is the JDL (Job Description Language) file. It contains attributes describing the request, which the Workload Management System (WMS) components take into account in order to schedule and submit a job or the jobs of a complex request.
A simple JDL file, hello.jdl, is used here. The executable is the hostname command, which returns the name of the worker node (WN) on which the job runs. The result is written to the generated output file message.txt, and any errors are written to the generated file stderror. The attributes StdOutput and StdError define the files to which the standard output and standard error of your job are redirected. The attribute OutputSandbox defines which files are returned to your UI when the job has completed and its results are retrieved. Another possible attribute is InputSandbox, which lists the files that must be transferred from the UI to the WN because they are needed to execute the job. For example, if the executable is not an operating system command (usually it is a custom script), then the script and all accompanying files needed to run it must be listed in the InputSandbox attribute.
[
Type = "Job";
Executable = "/bin/hostname";
Arguments = "";
StdOutput = "message.txt";
StdError = "stderror";
OutputSandbox = {"message.txt","stderror"};
]
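The InputSandbox case described above can be sketched as follows. The helper make_script_job and the script myscript.sh are hypothetical examples: the helper writes the script and a JDL file that ships it to the WN through InputSandbox. Because the script travels in the sandbox, Executable names just the file rather than a path on the WN. To be safe, run the submission from the directory that holds both files, so the relative name in InputSandbox resolves.

```shell
# Hypothetical helper: write a custom script and a JDL that transfers it
# to the worker node via InputSandbox. Usage: make_script_job directory
make_script_job() {
    dir="$1"
    cat > "$dir/myscript.sh" <<'EOF'
#!/bin/sh
# Runs on the worker node: print the node name and the working directory.
hostname
pwd
EOF
    chmod +x "$dir/myscript.sh"
    cat > "$dir/myscript.jdl" <<'EOF'
[
Type = "Job";
Executable = "myscript.sh";
Arguments = "";
StdOutput = "message.txt";
StdError = "stderror";
InputSandbox = {"myscript.sh"};
OutputSandbox = {"message.txt","stderror"};
]
EOF
}
```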
The sequence of steps needed to submit a job and monitor its execution via WMProxy is very similar to the one used with the Network Server. The command set is extended with new commands that support the features offered by the WMProxy service; these differences are highlighted here.
Each job submitted to WMProxy has to be associated with a credential previously delegated by the user who submitted the job. This credential is used to perform job-related operations that require interaction with other services. Delegation can be performed automatically or explicitly. In the first case, you request automatic delegation when submitting (glite-wms-job-submit with option -a) or when listing matching CEs (glite-wms-job-list-match with option -a). This is not recommended, because a proxy then has to be delegated for every job you submit, which is time-consuming. It is better to delegate a proxy explicitly (glite-wms-job-delegate-proxy) once and reuse it for multiple job submissions.
[neda@ce neda]$ glite-wms-job-delegate-proxy -d dID
Connecting to the service https://wms.phy.bg.ac.yu:7443/glite_wms_wmproxy_server
================== glite-wms-job-delegate-proxy Success ==================
Your proxy has been successfully delegated to the WMProxy:
https://wms.phy.bg.ac.yu:7443/glite_wms_wmproxy_server
with the delegation identifier: dID
==========================================================================
Afterwards, use the delegation identifier (here: dID) with the commands glite-wms-job-list-match, which lists the CEs that match the job requirements, and glite-wms-job-submit, which submits the job. In both cases the delegation identifier is passed with option -d.
Before actually submitting a job, it is often useful to see the list of resources suitable to run it (i.e. resources that fulfil the requirements specified in the JDL file). This also serves as a check of the syntactic correctness of the JDL file, and of whether the job can actually be executed on the given infrastructure. The list of resources satisfying your JDL requirements is obtained with the glite-wms-job-list-match command, with the JDL file as input argument. Resources are listed in order of rank, the one with the highest rank first.
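Explicit delegation pays off when several jobs are submitted with the same identifier. A minimal sketch, assuming the proxy has already been delegated once with glite-wms-job-delegate-proxy -d dID as shown above; submit_all and its DRY_RUN switch are hypothetical, not gLite features. Each submission appends its JID to the file given with -o (the jid file used later in this guide holds two JIDs this way).

```shell
# Hypothetical helper: submit several JDL files with one delegation ID.
# Usage: submit_all delegation_id jid_file file1.jdl [file2.jdl ...]
# With DRY_RUN=1 the commands are printed instead of executed.
submit_all() {
    deleg_id="$1"    # delegation identifier, e.g. dID
    jid_file="$2"    # file the JIDs are appended to (option -o)
    shift 2
    for jdl in "$@"; do
        ${DRY_RUN:+echo} glite-wms-job-submit -d "$deleg_id" -o "$jid_file" "$jdl" || return 1
    done
}

# Example (second.jdl is a hypothetical file name):
#   submit_all dID jid hello.jdl second.jdl
```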
[neda@ce neda]$ glite-wms-job-list-match -d dID hello.jdl
Connecting to the service https://wms.phy.bg.ac.yu:7443/glite_wms_wmproxy_server
==========================================================================
COMPUTING ELEMENT IDs LIST
The following CE(s) matching your job requirements have been found:
*CEId*
 - ce.phy.bg.ac.yu:2119/jobmanager-pbs-aegis
 - cluster1.csk.kg.ac.yu:2119/jobmanager-pbs-aegis
 - g02.phy.bg.ac.yu:2119/blah-pbs-aegis
 - grid01.elfak.ni.ac.yu:2119/blah-pbs-aegis
 - grid01.rcub.bg.ac.yu:2119/jobmanager-pbs-aegis
 - gw01.rogrid.pub.ro:2119/jobmanager-lcgpbs-aegis
 - rti29.etf.bg.ac.yu:2119/jobmanager-pbs-aegis
 - seegrid2.fie.upt.al:2119/jobmanager-pbs-aegis
 - grid01.elfak.ni.ac.yu:2119/jobmanager-pbs-aegis
===========================================================================
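Because the CEs are printed in rank order, the list can be post-processed, for example to count the matches or to see the highest-ranked CE. A minimal sketch, assuming each CE appears on its own line prefixed with "-" as in the output above; list_match_ces is a hypothetical helper.

```shell
# Hypothetical helper: read glite-wms-job-list-match output on stdin and
# print the CE identifiers (host:port/queue), one per line, in rank order.
list_match_ces() {
    sed -n 's|^ *- *\([^ ]*:[0-9][0-9]*/[^ ]*\).*|\1|p'
}

# Examples:
#   glite-wms-job-list-match -d dID hello.jdl | list_match_ces | head -n 1   # highest rank
#   glite-wms-job-list-match -d dID hello.jdl | list_match_ces | wc -l      # number of CEs
```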
Now, if you are satisfied with the list of available Computing Elements (CEs), you can submit your job with the glite-wms-job-submit command, which takes a JDL file as input:
[neda@ce neda]$ glite-wms-job-submit -d dID -o jid hello.jdl
Connecting to the service https://wms.phy.bg.ac.yu:7443/glite_wms_wmproxy_server
====================== glite-wms-job-submit Success ======================
The job has been successfully submitted to the WMProxy
Your job identifier is:
https://wms.phy.bg.ac.yu:9000/w-sbZQIvNl1tQLVVwgqW_Q
The job identifier has been saved in the following file:
/home/neda/jid
==========================================================================
If the command executes successfully, it returns the unique job identifier assigned to the job by the WMS:
https://wms.phy.bg.ac.yu:9000/w-sbZQIvNl1tQLVVwgqW_Q
The job identifier (JID) is used as input argument for the commands that monitor job execution and collect the job's results. The JID consists of two parts: the first is the endpoint URL of the LB (Logging and Bookkeeping) server that holds the logging and bookkeeping information of the job (https://wms.phy.bg.ac.yu:9000), and the second, generated by the WMS-UI (w-sbZQIvNl1tQLVVwgqW_Q), uniquely identifies the job in the grid environment. In the example above, the -o option was used to store the JID in a file of your choice, here the file jid in the directory /home/neda.
You can check the state of your job with the glite-wms-job-status command. Option -i reads the JIDs from a file and, when the file holds more than one JID, lets you choose interactively which of them to monitor. This is the case here, since both JIDs are stored in the same file (jid in /home/neda).
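The two parts of a JID described above can be separated with plain shell parameter expansion. The helpers below (jid_lb_endpoint, jid_unique_part, last_jid) are hypothetical, not part of the gLite UI. last_jid assumes that each JID in a file written with -o sits on its own line starting with https://; passing its result directly to a command such as glite-wms-job-status avoids the interactive menu that -i shows when the file holds several JIDs.

```shell
# Hypothetical helpers for working with job identifiers.
jid_lb_endpoint() { printf '%s\n' "${1%/*}"; }    # e.g. https://wms.phy.bg.ac.yu:9000
jid_unique_part() { printf '%s\n' "${1##*/}"; }   # e.g. w-sbZQIvNl1tQLVVwgqW_Q
# Most recently appended JID in a file written with -o (other lines ignored).
last_jid() { grep '^https://' "$1" | tail -n 1; }

# Example:
#   glite-wms-job-status "$(last_jid /home/neda/jid)"
```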
[neda@ce neda]$ glite-wms-job-status -i jid
------------------------------------------------------------------
1 : https://wms.phy.bg.ac.yu:9000/TOPdK-V6gh28DP8gHPpR5A
2 : https://wms.phy.bg.ac.yu:9000/w-sbZQIvNl1tQLVVwgqW_Q
a : all
q : quit
------------------------------------------------------------------
Choose one or more jobId(s) in the list - [1-2]all:2
*************************************************************
BOOKKEEPING INFORMATION:
Status info for the Job : https://wms.phy.bg.ac.yu:9000/w-sbZQIvNl1tQLVVwgqW_Q
Current Status: Done (Success)
Exit code: 0
Status Reason: Job terminated successfully
Destination: rti29.etf.bg.ac.yu:2119/jobmanager-pbs-aegis
Submitted: Wed Dec 20 17:35:47 2006 CET
*************************************************************
The states of a normal job life cycle are: Submitted, Waiting, Ready, Scheduled, Running and Done. The Aborted state indicates an irregularity in job processing. After the job has been successfully executed (state Done), you can retrieve its output to the UI machine (i.e. the files specified in the OutputSandbox attribute of the JDL file) by passing the JID to the glite-wms-job-output command. The --dir option is used to store the output files in a location of your choice (here: /home/neda/izlaz).
[neda@ce neda]$ glite-wms-job-output --dir /home/neda/izlaz -i jid
------------------------------------------------------------------
1 : https://wms.phy.bg.ac.yu:9000/TOPdK-V6gh28DP8gHPpR5A
2 : https://wms.phy.bg.ac.yu:9000/w-sbZQIvNl1tQLVVwgqW_Q
a : all
q : quit
------------------------------------------------------------------
Choose one or more jobId(s) in the list - [1-2]all (use , as separator or - for a range): 2
Connecting to the service https://147.91.84.25:7443/glite_wms_wmproxy_server
===========================================================================
JOB GET OUTPUT OUTCOME
Output sandbox files for the job:
https://wms.phy.bg.ac.yu:9000/w-sbZQIvNl1tQLVVwgqW_Q
have been successfully retrieved and stored in the directory:
/home/neda/izlaz
===========================================================================
You can see that the output files are in the chosen directory:
[neda@ce neda]$ ll izlaz
total 4
-rw-rw-r--  1 neda neda 19 Dec 20 18:04 message.txt
-rw-rw-r--  1 neda neda  0 Dec 20 18:04 stderror
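The monitor-then-retrieve cycle described above can be automated for a single job. The sketch below is a hypothetical helper pair, not a gLite command: job_current_status extracts the value of the first Current Status: line from glite-wms-job-status output (the format shown above), and wait_and_get_output polls one JID until the job reaches Done, then calls glite-wms-job-output --dir. The 60-second default interval is an arbitrary choice meant to avoid flooding the LB server with queries.

```shell
# Hypothetical helper: print the first "Current Status:" value from
# glite-wms-job-status output read on stdin, e.g. "Done (Success)".
job_current_status() {
    sed -n 's/^ *Current Status: *//p' | head -n 1 | sed 's/ *$//'
}

# Hypothetical helper: wait_and_get_output JID OUTPUT_DIR [INTERVAL_SECONDS]
wait_and_get_output() {
    jid="$1"; outdir="$2"; interval="${3:-60}"
    while :; do
        state=$(glite-wms-job-status "$jid" | job_current_status)
        case "$state" in
            Done*)
                glite-wms-job-output --dir "$outdir" "$jid"
                return ;;
            Aborted*|Cancelled*)
                echo "job $jid ended in state: $state" >&2
                return 1 ;;
        esac
        # Any other state (Submitted, Waiting, Ready, Scheduled, Running, or an
        # empty string if the status query failed): wait and query again.
        sleep "$interval"
    done
}

# Example:
#   wait_and_get_output https://wms.phy.bg.ac.yu:9000/w-sbZQIvNl1tQLVVwgqW_Q /home/neda/izlaz
```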
As with submission and monitoring via the Network Server, you can get more information about the events related to a job with glite-wms-job-logging-info, passing the JID as input argument. This logging information (usually requested in a more verbose form, with option -v 2 or -v 3) is a good starting point when there are problems with job execution. It is also a necessary input when you submit a ticket to the SEE-GRID Helpdesk asking for support with problems in your job execution. You can also cancel a job with the glite-wms-job-cancel command, passing the JID as input argument.
It is worth mentioning that a complex job is submitted in the same way as a simple job when the description of the job and its requirements is put in a single JDL file, which is almost always the case. The exception is a collection of independent jobs, for which you can instead give the path to a directory containing the JDL files of the individual jobs, using the option --collection of the glite-wms-job-submit command. Note: you cannot get a list of matching CEs for complex jobs. The example below shows the submission of a collection of five independent jobs located in the /home/neda/jobs directory:
[neda@ce neda]$ ll /home/neda/jobs
total 20
-rw-r--r--  1 neda neda 154 Jul  5 13:02 hello0.jdl
-rw-r--r--  1 neda neda 154 Jul  5 13:02 hello1.jdl
-rw-r--r--  1 neda neda 154 Jul  5 13:02 hello2.jdl
-rw-r--r--  1 neda neda 154 Jul  5 13:03 hello3.jdl
-rw-r--r--  1 neda neda 154 Jul  5 13:03 hello4.jdl
[neda@ce neda]$ glite-wms-job-submit -d dID -o jobsId --collection /home/neda/jobs
Connecting to the service https://wms.phy.bg.ac.yu:7443/glite_wms_wmproxy_server
====================== glite-wms-job-submit Success ======================
The job has been successfully submitted to the WMProxy
Your job identifier is:
https://wms.phy.bg.ac.yu:9000/80tjbwwnCBFZYFJQXJvIoA
The job identifier has been saved in the following file:
/home/neda/jobsId
==========================================================================
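A directory of similar JDL files like the hello0.jdl ... hello4.jdl listed above can be generated with a small loop. A minimal sketch: make_collection is a hypothetical helper that fills each file with the hello.jdl description from this guide (the original files may differ in detail). The node names reported later (hello0_jdl, ...) appear to be derived from these file names.

```shell
# Hypothetical helper: make_collection DIRECTORY N
# Writes hello0.jdl ... hello<N-1>.jdl, each containing the hello.jdl job.
make_collection() {
    dir="$1"; n="$2"
    mkdir -p "$dir" || return 1
    i=0
    while [ "$i" -lt "$n" ]; do
        cat > "$dir/hello$i.jdl" <<'EOF'
[
Type = "Job";
Executable = "/bin/hostname";
Arguments = "";
StdOutput = "message.txt";
StdError = "stderror";
OutputSandbox = {"message.txt","stderror"};
]
EOF
        i=$((i + 1))
    done
}

# Example:
#   make_collection /home/neda/jobs 5
#   glite-wms-job-submit -d dID -o jobsId --collection /home/neda/jobs
```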
Job status is monitored in the same way as for simple jobs, with the glite-wms-job-status command and a JID. Here the JID can be the identifier of the whole collection or of one of its independent jobs; as you can see below, every job in the collection has its own job identifier. Status of the collection:
[neda@ce neda]$ glite-wms-job-status -i jobsId
*************************************************************
BOOKKEEPING INFORMATION:
Status info for the Job : https://wms.phy.bg.ac.yu:9000/80tjbwwnCBFZYFJQXJvIoA
Current Status: Running
Status Reason: unavailable
Destination: dagman
Submitted: Thu Jul 5 13:16:14 2007 CEST
*************************************************************
- Nodes information:
Status info for the Job : https://wms.phy.bg.ac.yu:9000/wJ134KOfOXLTjUS45sQTrQ
Node Name: hello4_jdl
Current Status: Scheduled
Status Reason: Job successfully submitted to Globus
Destination: cluster1.csk.kg.ac.yu:2119/jobmanager-pbs-aegis
Submitted: Thu Jul 5 13:16:14 2007 CEST
*************************************************************
Status info for the Job : https://wms.phy.bg.ac.yu:9000/oeJpPuRjhylbnQpPt_30XQ
Node Name: hello2_jdl
Current Status: Scheduled
Status Reason: Job successfully submitted to Globus
Destination: cluster1.csk.kg.ac.yu:2119/jobmanager-pbs-aegis
Submitted: Thu Jul 5 13:16:14 2007 CEST
*************************************************************
Status info for the Job : https://wms.phy.bg.ac.yu:9000/KaiWJRW_icZxQI8tUm7vWQ
Node Name: hello0_jdl
Current Status: Scheduled
Status Reason: Job successfully submitted to Globus
Destination: cluster1.csk.kg.ac.yu:2119/jobmanager-pbs-aegis
Submitted: Thu Jul 5 13:16:14 2007 CEST
*************************************************************
Status info for the Job : https://wms.phy.bg.ac.yu:9000/f9o8iVqGMQpzJEQpLnhqEA
Node Name: hello3_jdl
Current Status: Scheduled
Status Reason: Job successfully submitted to Globus
Destination: cluster1.csk.kg.ac.yu:2119/jobmanager-pbs-aegis
Submitted: Thu Jul 5 13:16:14 2007 CEST
*************************************************************
Status info for the Job : https://wms.phy.bg.ac.yu:9000/LufprwQIcB7NAZEV9JrcCw
Node Name: hello1_jdl
Current Status: Scheduled
Status Reason: Job successfully submitted to Globus
Destination: cluster1.csk.kg.ac.yu:2119/jobmanager-pbs-aegis
Submitted: Thu Jul 5 13:16:14 2007 CEST
*************************************************************
or the status of a single node of the collection:
[neda@ce neda]$ glite-wms-job-status https://wms.phy.bg.ac.yu:9000/LufprwQIcB7NAZEV9JrcCw
*************************************************************
BOOKKEEPING INFORMATION:
Status info for the Job : https://wms.phy.bg.ac.yu:9000/LufprwQIcB7NAZEV9JrcCw
Current Status: Scheduled
Status Reason: Job successfully submitted to Globus
Destination: cluster1.csk.kg.ac.yu:2119/jobmanager-pbs-aegis
Submitted: Thu Jul 5 13:16:14 2007 CEST
Parent Job: https://wms.phy.bg.ac.yu:9000/80tjbwwnCBFZYFJQXJvIoA
*************************************************************
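For collections with many nodes, the full status output of the collection gets long. A minimal sketch that tallies how many nodes are in each state, assuming the layout of the collection status output shown earlier, where the first Current Status: line belongs to the collection itself and the following ones to its nodes; node_state_counts is a hypothetical helper.

```shell
# Hypothetical helper: count node states in collection status output (stdin).
# The first "Current Status:" line (the collection itself) is skipped.
node_state_counts() {
    sed -n 's/^ *Current Status: *//p' | sed 's/ *$//' |
        tail -n +2 | sort | uniq -c
}

# Example:
#   glite-wms-job-status -i jobsId | node_state_counts
```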
You can get more information about the events related to the job with the glite-wms-job-logging-info command:
[neda@ce neda]$ glite-wms-job-logging-info -i jobsId
where jobsId is the file in which the job identifier of the collection is stored.
You can retrieve the results of the collection with the glite-wms-job-output command and the collection's job identifier:
[neda@ce neda]$ glite-wms-job-output -i jobsId
Of course, the job has to be in state Done. The results of the individual jobs in the collection are stored in separate subdirectories.
NOTE: More information on these topics can be found in the following documents:
- F. Pacini, Job Description Language (JDL) Attributes Specification
