SG Running Jobs WMProxy CLI

From EGEE-see Wiki


This guide is part of the SEE-GRID Gridification Guide.

Submission and monitoring of jobs via WMProxy using the command line interface


This page gives an example of the sequence of steps needed to submit a job and monitor it. This guide was prepared by the AEGIS01-IPB-SCL site administrators.

WMProxy is the component of the gLite Workload Management System (WMS) responsible for accepting incoming requests from the User Interface (glite-UI), such as job submission or job removal, which, if valid, are then passed on to the other WMS components. It exposes job control functionality through a Web Services based interface, and provides features such as bulk submission, shared and compressed sandboxes for compound jobs, and real-time monitoring of job output. This page describes how to submit and monitor jobs through this service.



Acquiring grid credentials

Certification and Authorization

In order to use Grid resources, you first need a valid certificate and membership in one or more virtual organizations (VOs). If you already possess a valid certificate and are a member of a VO, proceed to the next section.

To obtain a certificate, contact your local Certification Authority (CA). A list of SEE-GRID CAs can be found at http://www.grid.auth.gr/pki/seegrid-ca/ra/, and instructions on requesting and using certificates are available at http://www.grid.auth.gr/pki/seegrid-ca/documents/.

To register with the SEE-GRID VO, visit https://voms.irb.hr:8443/voms/seegrid/.

To become a member of the Meteo VO (meteo.see-grid-sci.eu), register at https://voms.grid.auth.gr:8443/voms/meteo.see-grid-sci.eu.

Future members of the Environment VO (env.see-grid-sci.eu) should register at https://voms.ipp.acad.bg:8443/voms/env.see-grid-sci.eu, while members of the Seismological community can manage their seismo.see-grid-sci.eu VO membership at https://voms.ulakbim.gov.tr:8443/voms/seismo.see-grid-sci.eu.

If you don't know which VO is right for you, you can see the full list of registered VOs here.

You're also welcome to visit our Users Wiki Page.

Importing user certificate on UI

This guide assumes that you already have access to a configured UI. If you don't, ask your system administrator to create an account for you. If that is not possible, you can install and configure a UI on your local machine by following the instructions given here.

After obtaining your certificate, place it, together with the corresponding private key, in the $HOME/.globus directory on the User Interface you are using (create the directory if it doesn't exist). The certificate file should be named usercert.pem and the key file userkey.pem. If, for any reason, you don't want to use the default path and names, set the environment variables $X509_USER_CERT and $X509_USER_KEY to point to your certificate and key files. For security reasons, the permissions of userkey.pem must be set to 400, i.e. readable only by the owner; otherwise many grid operations will fail.
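A minimal sketch of the steps above, written as a bash function you could source on the UI. The name install_grid_credentials is hypothetical (it is not part of gLite); the function copies a certificate/key pair into a .globus directory and sets permissions 644 on the certificate and 400 on the key.

```shell
# Hypothetical helper (not part of gLite): install a certificate/key pair
# into a .globus directory with the permissions grid clients expect.
# Usage: install_grid_credentials <cert> <key> [dest-dir]
install_grid_credentials() {
    local cert="$1" key="$2" dest="${3:-$HOME/.globus}"
    mkdir -p "$dest" || return 1
    cp "$cert" "$dest/usercert.pem" || return 1
    chmod 644 "$dest/usercert.pem"
    # Remove any old key first: a mode-400 file cannot be overwritten in place.
    rm -f "$dest/userkey.pem"
    # Copy under a restrictive umask so the key is never readable by others.
    (umask 077 && cp "$key" "$dest/userkey.pem") || return 1
    # Readable only by the owner, as required.
    chmod 400 "$dest/userkey.pem"
}
```

For example, install_grid_credentials ~/usercert.pem ~/userkey.pem (the source paths are placeholders). If you keep the files elsewhere instead, export X509_USER_CERT and X509_USER_KEY with their paths.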

More information on the environment variables related to authentication and authorization can be found here.

Generating voms-proxy

Before using any of the glite-UI commands, you need a valid proxy credential on the glite-UI machine. You can create one with the voms-proxy-init command. Make sure that your certificate/key pair is in the $HOME/.globus directory:

-rw-r--r--    1 neda     neda         5015 Jun  7 13:10 usercert.pem
-r--------    1 neda     neda          963 Jun  7 13:10 userkey.pem

Note that file permissions are important.

You can now issue the VOMS client command. You will be prompted for the pass-phrase.

[neda@ce neda]$ voms-proxy-init -voms aegis
Enter GRID pass phrase:
Your identity: /C=RS/O=AEGIS/OU=Institute of Physics Belgrade/CN=Neda Svraka
Cannot find file or dir: /home/neda/.glite/vomses
Creating temporary proxy ................................ Done
Contacting  se.phy.bg.ac.yu:15001 [/DC=ORG/DC=SEE-GRID/O=Hosts/O=Institute of Physics Belgrade/CN=host/se.phy.bg.ac.yu] "aegis" Done
Creating proxy ...................................................................... Done
Your proxy is valid until Thu Jul 26 01:36:52 2007

A proxy created in this way contains attributes retrieved from a VOMS server. Information about an existing proxy certificate can be obtained with the voms-proxy-info command. Two useful options are -all, which prints everything, and -fqan, which prints the groups and roles in FQAN format. A simple invocation looks like this:

[neda@ce neda]$ voms-proxy-info
subject   : /C=RS/O=AEGIS/OU=Institute of Physics Belgrade/CN=Neda Svraka/CN=proxy
issuer    : /C=RS/O=AEGIS/OU=Institute of Physics Belgrade/CN=Neda Svraka
identity  : /C=RS/O=AEGIS/OU=Institute of Physics Belgrade/CN=Neda Svraka
type      : proxy
strength  : 512 bits
path      : /tmp/x509up_u35003
timeleft  : 11:56:26

By default, such a proxy is valid for 12 hours. A longer validity must be requested explicitly when generating the proxy, but it is limited to at most 24 hours for security reasons. If your job will take longer than that to complete (waiting time included!), you must use MyProxy so that the proxy can be renewed by the WMS.
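If you are unsure whether your current proxy will last long enough, the timeleft line of voms-proxy-info can be checked from a script. The sketch below assumes the plain-text output format shown above (timeleft  : HH:MM:SS); the helper names proxy_seconds_left and proxy_covers_job are hypothetical.

```shell
# Hypothetical helpers; they parse the plain-text voms-proxy-info output
# shown above, e.g. "timeleft  : 11:56:26".

# Print the remaining proxy lifetime in seconds (voms-proxy-info output on stdin).
proxy_seconds_left() {
    awk '/^timeleft/ {
        sub(/^timeleft[ \t]*:[ \t]*/, "")
        split($0, t, ":")
        print t[1] * 3600 + t[2] * 60 + t[3]
        exit
    }'
}

# Succeed only if the proxy outlives a job estimated to take $1 hours
# (queue waiting time included).
proxy_covers_job() {
    local job_hours="$1" left
    left=$(proxy_seconds_left)
    [ -n "$left" ] && [ "$left" -ge $(( job_hours * 3600 )) ]
}
```

For example, voms-proxy-info | proxy_covers_job 20 || echo "proxy too short, use MyProxy" checks a job estimated at 20 hours.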

Myproxy

By default, a voms-proxy is valid for 12 hours. All jobs that have not finished by the time the voms-proxy expires will be terminated. There are two ways to avoid this:

  • set a longer proxy lifetime with the option -hours <time>. This is not recommended for security reasons.
  • use the proxy renewal mechanism, by storing a long-term proxy on a MyProxy server with the command
myproxy-init -d -n

By doing this, you create a long-term proxy on the dedicated server specified in $MYPROXY_SERVER. You can use a different server by specifying it with the -s option:

myproxy-init -d -n -s <myproxy_server>

The -n option stores the long-term proxy without a passphrase, so that the WMS can perform the renewal automatically, and the -d option uses your certificate DN as the username under which the proxy is stored.

The default lifetime of the stored credential is one week, and proxies retrieved from it last 12 hours. These lifetimes can be changed with the -c and -t options, respectively (both take a value in hours).

The proxy renewal mechanism works like this: when you submit a job, you delegate a short-lived proxy to the WMS. About half an hour before the delegated proxy expires, the WMS searches for the stored credentials, either on its default MyProxy server or on the host specified with the MyProxyServer attribute in your job description file (JDL). If you have stored a long-lived proxy on that MyProxy server, the WMS retrieves a new short-lived proxy that will be used for the next 12 hours. If your job doesn't finish by then, the whole process is repeated. If, on the other hand, the WMS doesn't find your credentials on the MyProxy server by the time your initial proxy expires, your job will be ruthlessly killed. If your job lasts longer than the credentials you stored on MyProxy, or it got stuck in a really long queue, all you need to do is run myproxy-init again before your credentials expire, and things will just work.
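To get a feel for how often renewal happens, the mechanism above can be put into numbers. This is a simplified model, not the WMS's actual algorithm: it assumes the delegated proxy is fresh (12 hours) at submission, that each renewal happens exactly 30 minutes before the current proxy expires and yields a new 12-hour proxy, and that the stored credential has the default one-week (168-hour) lifetime. The helper names are hypothetical.

```shell
# Hypothetical helpers implementing a simplified model of proxy renewal:
# the delegated proxy is fresh at submission, each renewal happens 30 minutes
# before the current proxy expires and yields a new proxy of the same length.

# Renewals needed for a job running $1 hours with $2-hour proxies (default 12).
renewals_needed() {
    local job_min=$(( $1 * 60 ))
    local proxy_min=$(( ${2:-12} * 60 ))
    local step=$(( proxy_min - 30 ))   # net coverage gained per renewal
    if [ "$job_min" -le "$proxy_min" ]; then
        echo 0
    else
        # Ceiling of (job_min - proxy_min) / step in integer arithmetic.
        echo $(( (job_min - proxy_min + step - 1) / step ))
    fi
}

# Conservative check: the job must not outlast the credential stored with
# myproxy-init ($2 hours, default 168 = one week).
myproxy_covers_job() {
    [ "$1" -le "${2:-168}" ]
}
```

Under this model a job that needs 24 hours (queue time included) requires 2 renewals, because each renewal only adds 11.5 hours of coverage.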

You can view or destroy the credentials stored on the MyProxy server with the following commands:

myproxy-info -d -s <myproxy_server>
myproxy-destroy -d -s <myproxy_server>

For more details on the usage of myproxy-init, take a look at Delegation of Credentials Using MyProxy.

Note: If your job has not finished before the MyProxy credential expires, you just need to run myproxy-init again; if you are only using a long-term voms-proxy, on the other hand, there is no way to extend it.

Note: The MyProxy server you are using must have your WMS in its list of authorized hosts.

Note: In order to manage your jobs or files, you must have a valid voms-proxy on your UI. The MyProxy renewal mechanism is used only by grid services (such as the WMS); when the voms-proxy on your UI expires, you need to create a new one manually in order to access grid resources.

Job Management

Now you can use the glite-UI commands.

Job Description Language (JDL) file

The key element in matching resources to job requests is the JDL (Job Description Language) file. It contains attributes describing a request, which are taken into account by the Workload Management System (WMS) components in order to schedule and submit a job, or the jobs of a complex request. A simple JDL file, hello.jdl, is used here. The executable is the hostname command, which returns the name of the worker node (WN) on which the job is executed. The result is written to the output file message.txt, and any errors are written to the file stderror. The attributes StdOutput and StdError define the files to which the standard output and standard error of your job are redirected. The OutputSandbox attribute defines which files should be returned to your UI when the job is completed and its results are retrieved. Another possible attribute is InputSandbox, which lists the files that must be transferred from the UI to the WN because they are needed to execute the job. For example, if the executable is not an operating system command (usually it is a custom script), then the script and all accompanying files needed for its execution must be listed in the InputSandbox attribute.

[
Type = "Job";
Executable = "/bin/hostname";
Arguments = "";
StdOutput = "message.txt";
StdError = "stderror";
OutputSandbox = {"message.txt","stderror"};
]
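When the executable is a custom script rather than a system command, the script has to travel in the InputSandbox, and the Executable attribute names the file as it will appear on the WN. A hypothetical helper, write_script_jdl, that generates such a variant of hello.jdl might look like this (the script path in the example is an assumption):

```shell
# Hypothetical helper: write a hello.jdl-style JDL that runs a custom script.
# Usage: write_script_jdl <script-path-on-UI> <jdl-file-to-create>
write_script_jdl() {
    local script="$1" jdl="$2" name
    name=$(basename "$script")
    # The script is shipped via InputSandbox; on the WN it is referenced by name.
    cat > "$jdl" <<EOF
[
Type = "Job";
Executable = "$name";
Arguments = "";
StdOutput = "message.txt";
StdError = "stderror";
InputSandbox = {"$script"};
OutputSandbox = {"message.txt","stderror"};
]
EOF
}
```

For example, write_script_jdl ./scripts/run.sh run.jdl produces a JDL with Executable = "run.sh" and InputSandbox = {"./scripts/run.sh"}.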

If you are not sure whether your WMS uses the same MyProxy server you used for storing your proxy, or if you know that it uses a different one, you can specify the hostname of your MyProxy server in the JDL file:

MyProxyServer = "myproxy.ipb.ac.rs";
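If you maintain several JDL files, the MyProxyServer attribute can be added from a script instead of by hand. The sketch below (hypothetical helper add_myproxy_server) assumes, as in hello.jdl, that the closing ] of the JDL sits on a line of its own, and leaves the file unchanged if it already names a MyProxy server.

```shell
# Hypothetical helper: add MyProxyServer = "<host>"; to an existing JDL file.
# Assumes the closing "]" of the JDL is on a line of its own.
add_myproxy_server() {
    local jdl="$1" server="$2" tmp
    # Nothing to do if the JDL already names a MyProxy server
    # (JDL attribute names are case-insensitive).
    grep -qi '^[[:space:]]*MyProxyServer' "$jdl" && return 0
    tmp=$(mktemp) || return 1
    # Emit the new attribute just before the line holding the closing bracket.
    awk -v s="$server" '{
        line = $0
        gsub(/[ \t]/, "", line)
        if (line == "]") print "MyProxyServer = \"" s "\";"
        print
    }' "$jdl" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Copy back with cat so the original file keeps its permissions.
    cat "$tmp" > "$jdl"
    rm -f "$tmp"
}
```

For example, add_myproxy_server hello.jdl myproxy.ipb.ac.rs.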


The sequence of steps needed to submit a job and monitor its execution via WMProxy is quite similar to the one used with the Network Server. The command set is extended with new commands that support the new features provided by the WMProxy service; these are highlighted here.

Delegating your credentials to WMS

Each job submitted to WMProxy has to be associated with a credential previously delegated by the user who submitted the job. This credential is used to perform job-related operations when interaction with other services is required. Delegation can be performed automatically or explicitly. In the first case, you request automatic delegation during submission (glite-wms-job-submit with option -a) or when listing matching CEs (glite-wms-job-list-match with option -a). This is not recommended, because a proxy then has to be delegated for every job you submit, which is time consuming. It is better to delegate a proxy explicitly (glite-wms-job-delegate-proxy), which can then be used for multiple job submissions.

[neda@ce neda]$ glite-wms-job-delegate-proxy -d dID
 Connecting to the service https://wms.phy.bg.ac.yu:7443/glite_wms_wmproxy_server
 ================== glite-wms-job-delegate-proxy Success ==================
 Your proxy has been successfully delegated to the WMProxy:
 https://wms.phy.bg.ac.yu:7443/glite_wms_wmproxy_server
 with the delegation identifier: dID
 ==========================================================================

Afterwards, use the delegation identifier (here: dID) with the commands glite-wms-job-list-match, which lists the CEs that match the job requirements, and glite-wms-job-submit, which submits a job. In both cases, pass the delegation identifier with the -d option.
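Putting explicit delegation to use: the sketch below delegates once and then submits several JDL files with the same delegation identifier, collecting all JIDs in one file via -o. submit_all is a hypothetical helper; the commands and options it calls are the ones shown on this page.

```shell
# Hypothetical helper: delegate a proxy once, then reuse the delegation
# identifier for every submission. All JIDs are stored in one file via -o.
# Usage: submit_all <delegation-id> <jid-file> <jdl>...
submit_all() {
    local dlg_id="$1" jid_file="$2" jdl
    shift 2
    glite-wms-job-delegate-proxy -d "$dlg_id" || return 1
    for jdl in "$@"; do
        glite-wms-job-submit -d "$dlg_id" -o "$jid_file" "$jdl" || return 1
    done
}
```

For example, submit_all dID jid hello.jdl run.jdl delegates once and submits both jobs.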

Listing resources available to your job

Before actually submitting a job, it is often useful to see the list of resources that are suitable to run it (i.e. that fulfill the requirements specified in the JDL file). This also serves as a check of the syntactical correctness of the JDL file, and as a check that the job can actually be executed on the given infrastructure. The list of resources satisfying your JDL requirements can be obtained with the glite-wms-job-list-match command, with the JDL file as the input argument. Resources are listed according to their rank, with the highest-ranked one first.

[neda@ce neda]$ glite-wms-job-list-match -d dID hello.jdl
 Connecting to the service https://wms.phy.bg.ac.yu:7443/glite_wms_wmproxy_server
 ==========================================================================
 COMPUTING ELEMENT IDs LIST
 The following CE(s) matching your job requirements have been found:
 *CEId*
 - ce.phy.bg.ac.yu:2119/jobmanager-pbs-aegis
 - cluster1.csk.kg.ac.yu:2119/jobmanager-pbs-aegis
 - g02.phy.bg.ac.yu:2119/blah-pbs-aegis
 - grid01.elfak.ni.ac.yu:2119/blah-pbs-aegis
 - grid01.rcub.bg.ac.yu:2119/jobmanager-pbs-aegis
 - gw01.rogrid.pub.ro:2119/jobmanager-lcgpbs-aegis
 - rti29.etf.bg.ac.yu:2119/jobmanager-pbs-aegis
 - seegrid2.fie.upt.al:2119/jobmanager-pbs-aegis
 - grid01.elfak.ni.ac.yu:2119/jobmanager-pbs-aegis
 ===========================================================================
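To use the list-match output in a script, for instance to pick the top-ranked CE or count the matches, the CE IDs can be extracted from the lines that start with "- ". This assumes the output layout shown above; ce_ids is a hypothetical helper name.

```shell
# Hypothetical helper: print one CE ID per line from glite-wms-job-list-match
# output read on stdin (lines of the form " - <CEId>"), keeping the rank order.
ce_ids() {
    sed -n 's/^[[:space:]]*- //p'
}
```

For example, glite-wms-job-list-match -d dID hello.jdl | ce_ids | head -n 1 prints the highest-ranked CE.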

Submitting a job

Now, if you are satisfied with the list of available Computing Elements (CEs), you can submit your job with the glite-wms-job-submit command, which requires a JDL file as input:

[neda@ce neda]$ glite-wms-job-submit -d dID -o jid hello.jdl
 Connecting to the service https://wms.phy.bg.ac.yu:7443/glite_wms_wmproxy_server
 ====================== glite-wms-job-submit Success ======================
 The job has been successfully submitted to the WMProxy
 Your job identifier is:
 https://wms.phy.bg.ac.yu:9000/w-sbZQIvNl1tQLVVwgqW_Q
 The job identifier has been saved in the following file:
 /home/neda/jid
 ==========================================================================

If the command executes successfully, it returns the unique job identifier assigned to the job by the WMS:

https://wms.phy.bg.ac.yu:9000/w-sbZQIvNl1tQLVVwgqW_Q

The job identifier (JID) is used as the input argument for the commands that monitor job execution and collect job results. The JID consists of two parts. The first part is the endpoint URL of the LB (Logging and Bookkeeping) server that holds the logging and bookkeeping information of the job (https://wms.phy.bg.ac.yu:9000), and the second part (w-sbZQIvNl1tQLVVwgqW_Q), generated by the WMS-UI, uniquely identifies the job in the grid environment. In the example above, the -o option was used to store the JID in a chosen file, here the file jid in the directory /home/neda.
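Because the JID is simply the LB endpoint followed by the unique part, both halves can be split off with shell parameter expansion. The sketch also shows one way to pull the JIDs out of a file written with -o, by keeping only the lines that start with https:// (an assumption that avoids depending on any header line the file may contain). The helper names are hypothetical.

```shell
# Hypothetical helpers for working with JIDs.

# LB server endpoint: everything before the last "/".
jid_lb_endpoint() { printf '%s\n' "${1%/*}"; }

# Unique job part: everything after the last "/".
jid_unique_part() { printf '%s\n' "${1##*/}"; }

# JIDs stored in a file written with -o (keeps only the https:// lines).
jids_in_file() { grep '^https://' "$1"; }
```

For the JID above, jid_lb_endpoint prints https://wms.phy.bg.ac.yu:9000 and jid_unique_part prints w-sbZQIvNl1tQLVVwgqW_Q.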

Checking the job status

You can check the state of your job with the glite-wms-job-status command. The -i option reads JIDs from a file; when the file holds several JIDs, as the file jid in /home/neda does here, you are offered a menu from which you can choose one or more jobs to monitor.

[neda@ce neda]$ glite-wms-job-status -i jid
 ------------------------------------------------------------------
 1 : https://wms.phy.bg.ac.yu:9000/TOPdK-V6gh28DP8gHPpR5A
 2 : https://wms.phy.bg.ac.yu:9000/w-sbZQIvNl1tQLVVwgqW_Q
 a : all
 q : quit
 ------------------------------------------------------------------
 Choose one or more jobId(s) in the list - [1-2]all:2
 *************************************************************
 BOOKKEEPING INFORMATION:
 Status info for the Job : https://wms.phy.bg.ac.yu:9000/w-sbZQIvNl1tQLVVwgqW_Q
 Current Status: Done (Success)
 Exit code: 0
 Status Reason: Job terminated successfully
 Destination: rti29.etf.bg.ac.yu:2119/jobmanager-pbs-aegis
 Submitted: Wed Dec 20 17:35:47 2006 CET
 *************************************************************

The states that make up the normal job life cycle are Submitted, Waiting, Ready, Scheduled, Running and Done. The Aborted state indicates that something went wrong while the job was being processed.
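The status check can be automated with a small polling loop. The sketch below parses the Current Status: line shown above and stops once the job reaches a final state; besides Done and Aborted, it also treats Cancelled and Cleared as final, which are additional gLite LB states not listed above. The helper names and the 60-second default interval are assumptions, and the loop has no timeout.

```shell
# Hypothetical helpers for polling a job until it reaches a final state.

# Print the first "Current Status:" value from glite-wms-job-status output on stdin.
job_state() {
    awk -F': *' '/Current Status:/ { print $2; exit }'
}

# Done and Aborted are described above; Cancelled and Cleared are other LB states.
is_final_state() {
    case "$1" in
        Done*|Aborted*|Cancelled*|Cleared*) return 0 ;;
        *) return 1 ;;
    esac
}

# Poll the job with JID $1 every $2 seconds (default 60), printing its state.
wait_for_job() {
    local jid="$1" interval="${2:-60}" state
    while :; do
        state=$(glite-wms-job-status "$jid" | job_state)
        printf '%s %s\n' "$(date '+%H:%M:%S')" "$state"
        is_final_state "$state" && return 0
        sleep "$interval"
    done
}
```

For example, wait_for_job https://wms.phy.bg.ac.yu:9000/w-sbZQIvNl1tQLVVwgqW_Q prints a timestamped state once a minute until the job is finished.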

Retrieving the output of the job

After the job has executed successfully (state Done), you can retrieve its output to the UI machine (i.e. the files specified in the OutputSandbox attribute of the JDL file) by passing the JID to the glite-wms-job-output command. The --dir option lets you store the output files in a chosen location (here: /home/neda/izlaz).

[neda@ce neda]$ glite-wms-job-output --dir /home/neda/izlaz -i jid
 ------------------------------------------------------------------
 1 : https://wms.phy.bg.ac.yu:9000/TOPdK-V6gh28DP8gHPpR5A
 2 : https://wms.phy.bg.ac.yu:9000/w-sbZQIvNl1tQLVVwgqW_Q
 a : all
 q : quit
 ------------------------------------------------------------------
 Choose one or more jobId(s) in the list - [1-2]all (use , as separator or - for a range): 2
 Connecting to the service https://147.91.84.25:7443/glite_wms_wmproxy_server
 ===========================================================================
       JOB GET OUTPUT OUTCOME
 Output sandbox files for the job:
 https://wms.phy.bg.ac.yu:9000/w-sbZQIvNl1tQLVVwgqW_Q
 have been successfully retrieved and stored in the directory:
 /home/neda/izlaz
 ===========================================================================

You can see that the output files are in the chosen directory:

[neda@ce neda]$ ll izlaz
total 4
-rw-rw-r-- 1 neda neda 19 Dec 20 18:04 message.txt
-rw-rw-r-- 1 neda neda 0 Dec 20 18:04 stderror
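Before processing the results, a script can confirm that every file listed in OutputSandbox was actually retrieved. check_output_dir is a hypothetical helper; it only tests for presence, so an empty stderror (0 bytes, as above) still passes.

```shell
# Hypothetical helper: check that every OutputSandbox file was retrieved.
# Usage: check_output_dir <dir> <file>...
check_output_dir() {
    local dir="$1" f status=0
    shift
    for f in "$@"; do
        if [ ! -e "$dir/$f" ]; then
            echo "missing: $f" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, check_output_dir /home/neda/izlaz message.txt stderror && cat /home/neda/izlaz/message.txt. To detect job errors as well, [ -s /home/neda/izlaz/stderror ] succeeds only if stderror is non-empty.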

Troubleshooting failed jobs

You can get more information about the events related to a job with the glite-wms-job-logging-info command, passing the JID as the input argument. This logging information (usually requested in a more verbose form, with the option -v 2 or -v 3) is a good starting point when there are problems with the execution of a job. It is also a necessary input when you submit a ticket to the SEE-GRID Helpdesk asking for support with problems in your job execution. You can also cancel a job with the glite-wms-job-cancel command, passing the JID as the input argument.
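When opening a Helpdesk ticket, it helps to attach the status and the verbose logging information in one file. A minimal sketch, using only the commands and the -v 2 option mentioned above; collect_job_report and the report file name in the example are hypothetical.

```shell
# Hypothetical helper: save status and verbose logging info for one job
# into a single file that can be attached to a Helpdesk ticket.
# Usage: collect_job_report <JID> <report-file>
collect_job_report() {
    local jid="$1" out="$2"
    {
        echo "=== glite-wms-job-status $jid"
        glite-wms-job-status "$jid"
        echo "=== glite-wms-job-logging-info -v 2 $jid"
        glite-wms-job-logging-info -v 2 "$jid"
    } > "$out" 2>&1
}
```

For example, collect_job_report https://wms.phy.bg.ac.yu:9000/w-sbZQIvNl1tQLVVwgqW_Q report.txt.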

Submitting collection of jobs

A complex job can be submitted in the same way as a simple job when the description of the job and its requirements is put in one JDL file, which is almost always the case. The exception is a collection of independent jobs, for which you can instead give the path to a directory containing the individual jobs' JDL files. To do this, use the --collection option of the glite-wms-job-submit command. Note: you cannot list matching CEs for complex jobs. The example below shows the submission of a collection of five independent jobs located in the /home/neda/jobs directory:

[neda@ce neda]$ ll /home/neda/jobs
total 20
-rw-r--r--    1 neda     neda          154 Jul  5 13:02 hello0.jdl
-rw-r--r--    1 neda     neda          154 Jul  5 13:02 hello1.jdl
-rw-r--r--    1 neda     neda          154 Jul  5 13:02 hello2.jdl
-rw-r--r--    1 neda     neda          154 Jul  5 13:03 hello3.jdl
-rw-r--r--    1 neda     neda          154 Jul  5 13:03 hello4.jdl


[neda@ce neda]$ glite-wms-job-submit -d dID -o jobsId --collection /home/neda/jobs 
 Connecting to the service https://wms.phy.bg.ac.yu:7443/glite_wms_wmproxy_server
 ====================== glite-wms-job-submit Success ======================
 The job has been successfully submitted to the WMProxy
 Your job identifier is:
 https://wms.phy.bg.ac.yu:9000/80tjbwwnCBFZYFJQXJvIoA
 The job identifier has been saved in the following file:
 /home/neda/jobsId
 ==========================================================================
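The five JDL files in /home/neda/jobs could be generated with a loop like the one below. Their exact contents are not shown on this page, so the hello-style JDL written here (modelled on hello.jdl) is an assumption, as is the helper name make_hello_collection.

```shell
# Hypothetical helper: create <n> JDL files hello0.jdl ... hello<n-1>.jdl in <dir>.
# The JDL body is modelled on hello.jdl and is an assumption.
make_hello_collection() {
    local dir="$1" n="${2:-5}" i=0
    mkdir -p "$dir" || return 1
    while [ "$i" -lt "$n" ]; do
        cat > "$dir/hello$i.jdl" <<'EOF'
[
Type = "Job";
Executable = "/bin/hostname";
StdOutput = "message.txt";
StdError = "stderror";
OutputSandbox = {"message.txt","stderror"};
]
EOF
        i=$(( i + 1 ))
    done
}
```

For example, make_hello_collection /home/neda/jobs 5 followed by glite-wms-job-submit -d dID -o jobsId --collection /home/neda/jobs.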

Job status is monitored in the same way as for simple jobs, with the glite-wms-job-status command and a JID. Here the JID can be either the job identifier of the whole collection or that of one of its independent jobs; as you can see below, every job in the collection has its own job identifier. The status of the collection:

[neda@ce neda]$ glite-wms-job-status -i jobsId
 *************************************************************
 BOOKKEEPING INFORMATION:
 Status info for the Job : https://wms.phy.bg.ac.yu:9000/80tjbwwnCBFZYFJQXJvIoA
 Current Status:     Running
 Status Reason:      unavailable
 Destination:        dagman
 Submitted:          Thu Jul  5 13:16:14 2007 CEST
 *************************************************************
 - Nodes information:
     Status info for the Job : https://wms.phy.bg.ac.yu:9000/wJ134KOfOXLTjUS45sQTrQ
     Node Name:          hello4_jdl
     Current Status:     Scheduled
     Status Reason:      Job successfully submitted to Globus
     Destination:        cluster1.csk.kg.ac.yu:2119/jobmanager-pbs-aegis
     Submitted:          Thu Jul  5 13:16:14 2007 CEST
 *************************************************************
     Status info for the Job : https://wms.phy.bg.ac.yu:9000/oeJpPuRjhylbnQpPt_30XQ
     Node Name:          hello2_jdl
     Current Status:     Scheduled
     Status Reason:      Job successfully submitted to Globus
     Destination:        cluster1.csk.kg.ac.yu:2119/jobmanager-pbs-aegis
     Submitted:          Thu Jul  5 13:16:14 2007 CEST
 *************************************************************
     Status info for the Job : https://wms.phy.bg.ac.yu:9000/KaiWJRW_icZxQI8tUm7vWQ
     Node Name:          hello0_jdl
     Current Status:     Scheduled
     Status Reason:      Job successfully submitted to Globus
     Destination:        cluster1.csk.kg.ac.yu:2119/jobmanager-pbs-aegis
     Submitted:          Thu Jul  5 13:16:14 2007 CEST
 *************************************************************
     Status info for the Job : https://wms.phy.bg.ac.yu:9000/f9o8iVqGMQpzJEQpLnhqEA
     Node Name:          hello3_jdl
     Current Status:     Scheduled
     Status Reason:      Job successfully submitted to Globus
     Destination:        cluster1.csk.kg.ac.yu:2119/jobmanager-pbs-aegis
     Submitted:          Thu Jul  5 13:16:14 2007 CEST
 *************************************************************
     Status info for the Job : https://wms.phy.bg.ac.yu:9000/LufprwQIcB7NAZEV9JrcCw
     Node Name:          hello1_jdl
     Current Status:     Scheduled
     Status Reason:      Job successfully submitted to Globus
     Destination:        cluster1.csk.kg.ac.yu:2119/jobmanager-pbs-aegis
     Submitted:          Thu Jul  5 13:16:14 2007 CEST
 *************************************************************

or the status of a single node of the collection:

[neda@ce neda]$ glite-wms-job-status https://wms.phy.bg.ac.yu:9000/LufprwQIcB7NAZEV9JrcCw
 *************************************************************
 BOOKKEEPING INFORMATION:
 Status info for the Job : https://wms.phy.bg.ac.yu:9000/LufprwQIcB7NAZEV9JrcCw
 Current Status:     Scheduled
 Status Reason:      Job successfully submitted to Globus
 Destination:        cluster1.csk.kg.ac.yu:2119/jobmanager-pbs-aegis
 Submitted:          Thu Jul  5 13:16:14 2007 CEST
 Parent Job:         https://wms.phy.bg.ac.yu:9000/80tjbwwnCBFZYFJQXJvIoA
 *************************************************************
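For a larger collection, reading through the per-node blocks gets tedious. The sketch below prints the Current Status: of every node (skipping the collection's own status, which appears before the Nodes information: line), so that the states can be tallied with sort | uniq -c. It assumes the output layout shown above; node_states is a hypothetical helper.

```shell
# Hypothetical helper: print the state of every node in a collection, reading
# glite-wms-job-status output on stdin. Lines before "Nodes information"
# belong to the collection itself and are skipped.
node_states() {
    awk '/Nodes information/ { nodes = 1; next }
         nodes && /Current Status:/ { sub(/.*Current Status:[ \t]*/, ""); print }'
}
```

For example, glite-wms-job-status -i jobsId | node_states | sort | uniq -c gives one count per node state.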


You can get more information about the events related to the job with the glite-wms-job-logging-info command:

[neda@ce neda]$ glite-wms-job-logging-info -i jobsId

where jobsId is the file in which the job identifier of the collection is stored.

You can retrieve the results of a collection of jobs with the glite-wms-job-output command and the job identifier of the collection:

[neda@ce neda]$ glite-wms-job-output -i jobsId

Of course, the job has to be in the Done state. The results of the individual jobs in the collection are stored in separate subdirectories.

Advanced job types and operations

Sometimes you might need to do more than just submit a single job and wait for it to finish in order to get your results. You may want to submit many similar jobs that differ only in their parameters, or submit several different jobs at once, which can be independent or even depend on the output of other jobs. You may want to watch your output files while the job is being executed, or use files on remote storage elements for the input and output of your jobs. All this, and more, is available to you by using the WMS at an advanced level. Here are some quick user guides on how to do the following

References

NOTE: More information on these topics can be found in the following documents:

- F. Pacini, Job Description Language (JDL) Attributes Specification

- gLite User Guide

- WMS Guide

- EGEE User's Guide, WMProxy Service
