Software Installation Management Guide
The Role of Experiment Software Manager (ESM)
ESMs (Experiment Software Managers) are a special subset of experiment users who have permission to write into the experiment software area. They should be able to add or remove software at any time without notifying the site manager. The ESM installs software on a per-site basis; no root access is required to install experiment software.
The main requirements for the ESM are:
- The ESM should be able to verify the installation in several steps;
- The ESM can run different validation procedures at different moments;
- The ESM must be able to publish installed software versions as tags, so that jobs can be directed to the appropriate site(s);
- The ESM should also verify that there is enough space to keep several installed versions of the experiment software, and remove old ones;
- Experiment software can depend on other software, so it is up to the ESM to install any missing packages that the experiment software requires. The ESM analyzes the dependencies and can either package the software together with all necessary subpackages, or use modular packaging to ensure that only the missing subpackages are installed.
To become an ESM, request the /seegrid/sgmadmin role from the SEEGRID VOMS administrator.
Drawbacks of Software Installation Management
- Software can only be removed when the production manager is certain that the specific software version is no longer in use, which makes managing software dependencies very hard;
- The lack of roles limits the ESM's abilities. The ESM should be able to change roles dynamically and act as a normal user, submitting normal user requests to the Grid;
- ESM jobs have no special priority, so they have to compete with normal user jobs;
- There is no automatic mechanism to trigger software installation on request.
Although this section is mostly intended for site administrators, it can also help ESMs understand problems on a particular site.
Make sure that your site has valid SEEGRID VO certificates and uses SEEGRID VOMS with user certificates. Before proceeding, you should also read the following article on site configuration for remote file access:
Sites can either provide space for the VOs in a shared file system or have no shared file system available. The local variable VO_SEEGRID_SW_DIR indicates which case applies: if it is set to ".", no shared file system is available. If a shared file system is available, software should access it through the environment variable directly, since there is no guarantee that the path is the same on all nodes.
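Because of this convention, a job script can check the variable before touching the software area. The sketch below assumes a POSIX shell on the WN; the function name sw_area and its result labels are hypothetical, and only the "." convention comes from the text above.

```shell
#!/bin/sh
# Sketch: classify the software area advertised by VO_SEEGRID_SW_DIR.
# sw_area and its labels are hypothetical; "." means no shared file system.
sw_area() {
    dir=$1
    if [ -z "$dir" ]; then
        echo "unset"        # variable missing: site not configured for seegrid
    elif [ "$dir" = "." ]; then
        echo "none"         # no shared file system available
    elif [ -d "$dir" ]; then
        echo "shared"       # shared area visible on this node
    else
        echo "missing"      # path advertised but not visible on this node
    fi
}

# Usage inside a job script:
#   case "$(sw_area "$VO_SEEGRID_SW_DIR")" in
#       shared) cd "$VO_SEEGRID_SW_DIR" || exit 1 ;;
#       *) echo "no usable shared software area" >&2; exit 1 ;;
#   esac
```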
It is agreed that all SEEGRID sites should set up a file system shared among all WNs of the same cluster. Jobs that use the installed software just need to read the VO_SEEGRID_SW_DIR environment variable to locate it. Software has to be accessible on the WNs through POSIX calls.
To create a shared directory, do the following, preferably on your storage element:
mkdir /opt/exp_soft/seegrid
chown seegridsgm:seegrid /opt/exp_soft/seegrid
chmod 755 /opt/exp_soft/seegrid
Add a line to /etc/exports:
After that, execute:
$ exportfs -rv
On all WNs, add the following line to /etc/fstab:
<YourStorageElement>:/opt/exp_soft/seegrid /opt/exp_soft/seegrid nfs hard,intr,nodev,nosuid 0 0
Create the mount point for /opt/exp_soft/seegrid and mount the new filesystem:
mkdir /opt/exp_soft/seegrid
chown seegridsgm:seegrid /opt/exp_soft/seegrid
mount -av
Finally, set VO_SEEGRID_SW_DIR to /opt/exp_soft/seegrid in site-info.def and reconfigure all WNs. This will update /etc/profile.d/lcgenv.csh and /etc/profile.d/lcgenv.sh.
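After reconfiguration, a quick per-node check can confirm that the mount actually took effect. This is a sketch assuming a Linux WN where /proc/mounts lists the active mounts; the function name is_nfs_mounted is hypothetical.

```shell
#!/bin/sh
# Sketch: succeed only if the given mount point is an NFS mount.
# The mounts table defaults to /proc/mounts; is_nfs_mounted is hypothetical.
is_nfs_mounted() {
    mnt=$1
    table=${2:-/proc/mounts}
    # Fields: device mountpoint fstype options dump pass (nfs or nfs4)
    awk -v m="$mnt" '$2 == m && $3 ~ /^nfs/ { found = 1 }
                     END { exit (found ? 0 : 1) }' "$table"
}

# Usage on each WN after "mount -av":
#   is_nfs_mounted /opt/exp_soft/seegrid ||
#       echo "software area not mounted; check /etc/fstab" >&2
```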
Allowing ESM to Update Tags
The site administrator should also check and, if necessary, correct the permissions of seegrid.list on the CE. Its initial content may be left empty. Correct permissions are set by default during site configuration, but invalid values have been noticed on a few sites. The expected state is:
$ ls -ld /opt/edg/var/info/seegrid/ /opt/edg/var/info/seegrid/seegrid.list
drwxr-xr-x 2 seegridsgm seegrid 4096 Apr 12 2005 /opt/edg/var/info/seegrid/
-rw-r--r-- 1 seegridsgm seegrid 24 Jan 13 13:55 /opt/edg/var/info/seegrid/seegrid.list
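If the listing differs, the expected state can be restored with ordinary chown and chmod calls. A minimal sketch, assuming it runs as root on the CE and that the account and group are seegridsgm and seegrid as in the listing; the function name fix_tag_list_perms is hypothetical.

```shell
#!/bin/sh
# Sketch: restore ownership and modes of the tag list directory and file.
# fix_tag_list_perms is hypothetical; intended to run as root on the CE.
fix_tag_list_perms() {
    info_dir=${1:-/opt/edg/var/info/seegrid}
    list="$info_dir/seegrid.list"
    mkdir -p "$info_dir" || return 1
    [ -e "$list" ] || : > "$list"          # the initial content may be empty
    if [ "$(id -u)" -eq 0 ] && id seegridsgm >/dev/null 2>&1; then
        chown seegridsgm:seegrid "$info_dir" "$list"
    else
        echo "skipping chown (not root or no seegridsgm account)" >&2
    fi
    chmod 755 "$info_dir"                  # drwxr-xr-x
    chmod 644 "$list"                      # -rw-r--r--
}
```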
The ESM is responsible for adding and removing tags, and can do so at any time. It is up to site managers to back up the files containing the tags, but there is no guaranteed way to ensure that the correct state of the tags has been saved on the CE in case of a failure.
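One simple way for a site manager to keep recoverable copies is a timestamped backup of seegrid.list, run periodically (for example from cron). This is only a sketch: the backup directory and the function name backup_tag_list are hypothetical, and a copy still cannot guarantee that the saved state is the correct one.

```shell
#!/bin/sh
# Sketch: copy the tag list to a timestamped file and print its path.
# backup_tag_list and the default backup directory are hypothetical.
backup_tag_list() {
    list=${1:-/opt/edg/var/info/seegrid/seegrid.list}
    backup_dir=${2:-/var/backups/seegrid-tags}
    mkdir -p "$backup_dir" || return 1
    stamp=$(date +%Y%m%d%H%M%S)
    # -p keeps mode and timestamps (and ownership when run as root)
    cp -p "$list" "$backup_dir/seegrid.list.$stamp" || return 1
    echo "$backup_dir/seegrid.list.$stamp"
}
```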
Other Noticed Problems
lcg-ManageSoftware is not installed by default on gLite-3.1/SL4.5 WNs. This situation is easy to detect from the following job status and logging output:
$ edg-job-status -v 2 <JOB_ID>
...
Current Status: Aborted
Status Reason: Failure while executing job wrapper
$ edg-job-get-logging-info -v 2 ...
Event: Done
...
- reason = /opt/lcg/bin/lcg-ManageSoftware not found or unreadable
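Since the failure happens in the job wrapper, before any user script runs, the check is best done by the site administrator directly on each WN. A minimal sketch, assuming the path from the logging output above; the function name have_manage_software is hypothetical.

```shell
#!/bin/sh
# Sketch: succeed only if the lcg-ManageSoftware wrapper is readable and
# executable. have_manage_software is hypothetical.
have_manage_software() {
    wrapper=${1:-/opt/lcg/bin/lcg-ManageSoftware}
    [ -r "$wrapper" ] && [ -x "$wrapper" ]
}

# Usage on each WN:
#   have_manage_software || echo "lcg-ManageSoftware missing on $(hostname)" >&2
```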
The easiest solution is to install the lcg-ManageSoftware-2.0-6 and edg-brokerinfo_gcc3_2_2-2.1-5_sl3.i386 RPMs on all WNs. Since lcg-ManageSoftware is a wrapper that updates tags according to the outcome of the installation script it runs, it updates tags correctly even when the script fails, which makes it very convenient for software installation managers.
Without these two packages, the problem can be worked around by publishing tags manually with lcg-ManageVOTag on a UI, after inspecting the output of the installation, validation, or removal job. A better alternative is to call lcg-ManageVOTag from within the job scripts, since this command should still be available on all WNs.
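A sketch of that alternative, assuming the install job runs on the WN with a valid proxy, that the target CE was selected through Requirements (not -r) so that edg-brokerinfo getCE returns an ID of the form host:port/jobmanager-queue, and that tags follow the flavour naming described in the Software Installation Management section. The function names and the install script name are hypothetical; only lcg-ManageVOTag options shown in this guide are used.

```shell
#!/bin/sh
# Sketch: publish the outcome of an install job without lcg-ManageSoftware.
# Function names and install_vive.sh are hypothetical.

# Host part of a CE ID such as ce02.grid.acad.bg:2119/jobmanager-lcgpbs-seegrid
ce_host() {
    printf '%s\n' "${1%%:*}"
}

# Complete tag for an outcome: $1 = VO-seegrid-<name>-<version>, $2 = exit status
tag_for_outcome() {
    if [ "$2" -eq 0 ]; then
        printf '%s-to-be-validated\n' "$1"
    else
        printf '%s-aborted-install\n' "$1"
    fi
}

# Run on the WN at the end of the install job
publish_outcome() {
    ce_id=$(edg-brokerinfo getCE) || return 1
    host=$(ce_host "$ce_id")
    lcg-ManageVOTag -host "$host" -vo seegrid -add -tag "$(tag_for_outcome "$1" "$2")"
}

# Usage inside the job script:
#   ./install_vive.sh
#   publish_outcome VO-seegrid-vive-0.4.3 $?
```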
Software Installation Management
Before proceeding, you should read the following:
The ESM performs software installation in the following steps:
- Moves the packaged software to an SE on every site where the software will be installed, depending on VO_SEEGRID_SW_DIR, using the wget command;
- Sends installation and certification jobs to the sites where the experiment software has been copied. The validation produces a report on which the ESM can base further actions;
- Publishes software tags for the installed software if validation succeeds.
The ESM should prepare JDL files and shell scripts for installation, validation, and uninstallation, and should also prepare and publish the installation tarball(s). Especially if an installation tarball is large, the ESM may choose to submit only a script with the job and let it fetch the tarball with wget or lcg-cp. To do so, publish your tarball on the web, or on an SE using lcg-cr. Then create a proxy with the ESM role:
voms-proxy-init -voms seegrid:/seegrid/Role=sgmadmin
Submit your installation jobs to SEEGRID sites using edg-job-submit. Instead of using lcg-asis, we recommend creating the JDL dynamically in job submission scripts, which place a suitable
Requirements = other.GlueCEUniqueID == "<Target-CE>:2119/jobmanager-lcgpbs-seegrid"
in the install JDL. Another way to select the target CE is the -r option of edg-job-submit. Using -r is problematic if "edg-brokerinfo getCE" is used in the install script, since the chosen CE will not be set properly in the job's .brokerinfo file.
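The dynamic JDL creation recommended above can be as simple as a shell function that prints one JDL per target CE. A sketch: the install script name, the output file names and the function name make_install_jdl are hypothetical, and the Executable is assumed to be your own install script shipped in the InputSandbox.

```shell
#!/bin/sh
# Sketch: print an install JDL pinned to one CE via GlueCEUniqueID.
# make_install_jdl, install_vive.sh and the output names are hypothetical.
make_install_jdl() {
    ce_id=$1       # e.g. ce02.grid.acad.bg:2119/jobmanager-lcgpbs-seegrid
    script=$2      # install script shipped in the InputSandbox
    cat <<EOF
Executable = "$script";
StdOutput = "install.out";
StdError = "install.err";
InputSandbox = {"$script"};
OutputSandbox = {"install.out", "install.err"};
Requirements = other.GlueCEUniqueID == "$ce_id";
EOF
}

# Usage: one JDL per target CE, then submit each one.
#   for ce in ce02.grid.acad.bg:2119/jobmanager-lcgpbs-seegrid; do
#       jdl="install-${ce%%:*}.jdl"
#       make_install_jdl "$ce" install_vive.sh > "$jdl"
#       edg-job-submit "$jdl"
#   done
```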
Also note that the "--notify" option of lcg-ManageSoftware does not seem to work, and that validation requires "--validate_script validate_sw".
To list the available CEs that can run installation jobs, together with their CPU counts and queue states, use:
$ lcg-infosites --vo seegrid ce
If your application has specific RAM, OS or processor requirements, you can obtain additional details with:
$ lcg-infosites --vo seegrid ce -v 2
The software installation state of these CEs can be seen with the following command:
$ lcg-infosites --vo seegrid tag
...
Name of the CE: ce02.grid.acad.bg
VO-seegrid-vive-0.4.2
VO-seegrid-vive-0.4.3
VO-seegrid-Gate-3-to-be-validated
...
The published software tags listed indicate the application VO, the application name and version, and the installation status, in the format "VO-<name_of_VO>-<flag-provided-by-ESM>-<status-flavour>". <flag-provided-by-ESM> is the value of the lcg-ManageSoftware "--tag" parameter and should have the format <application-name>-<version-number>, where the version number can also contain dot-separated subversion and release numbers. <status-flavour> is described in Experiments Software Installation:
The flavour is a string that informs the ESM about the status of a given Experiment Software management process. It can be one of these possible values:
- processing-install: the software identified by <flag-provided-by-ESM> is getting installed.
- processing-remove: the removal process of the software is running.
- processing-validate: the validation process of the software is running.
- aborted-install: the installation process got aborted.
- aborted-remove: the removal process got aborted.
- aborted-validate: the validation process got aborted.
- to-be-validated: the installation process completed successfully and the flag is now ready to be validated.
There are two more possible states:
- Standalone TAG (without an added flavour): the validation process ran successfully on the site and the software can be used by all users.
- TAG no longer published: the removal process ran successfully and the software has been fully removed from the site.
This flag publication mechanism helps to prevent concurrent management processes from running simultaneously for the same application software.
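Scripts that react to these states need to split a complete tag into its flag and flavour. A minimal sketch, assuming the VO prefix VO-seegrid- and the flavour list above; the function names tag_status and tag_flag, and the label "validated" for a standalone tag, are hypothetical.

```shell
#!/bin/sh
# Sketch: split a complete tag into <flag-provided-by-ESM> and status flavour.
# Function names and the "validated" label are hypothetical.
FLAVOURS="processing-install processing-remove processing-validate
aborted-install aborted-remove aborted-validate to-be-validated"

# Print the flavour of a tag, or "validated" for a standalone tag
tag_status() {
    for f in $FLAVOURS; do
        case "$1" in
            *-"$f") printf '%s\n' "$f"; return 0 ;;
        esac
    done
    printf 'validated\n'
}

# Print the flag part, e.g. vive-0.4.3
tag_flag() {
    t=${1#VO-seegrid-}
    s=$(tag_status "$1")
    [ "$s" = validated ] || t=${t%-"$s"}
    printf '%s\n' "$t"
}

# Usage: show flag and status of every tag published for seegrid
#   lcg-infosites --vo seegrid tag | sed 's/^[[:space:]]*//' | grep '^VO-seegrid-' |
#       while read -r t; do echo "$(tag_flag "$t") $(tag_status "$t")"; done
```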
lcg-ManageVOTag is also useful for clearing published tags, especially while developing your installation scripts. Note that its "-tag" parameter takes a complete software tag string:
lcg-ManageVOTag -host <Target-CE> -vo seegrid --remove -tag VO-seegrid-vive-0.2-aborted-validate
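During development, all aborted tags on a CE can be cleared in one go. This sketch assumes that "lcg-ManageVOTag ... -list" prints only the comma-separated tag list described in the Software Tags Management section; adjust the parsing if your version prints additional text. The function names aborted_tags and clean_aborted are hypothetical.

```shell
#!/bin/sh
# Sketch: remove every aborted-* tag published for seegrid on one CE.
# Function names are hypothetical; assumes "-list" prints a comma-separated list.

# $1 = comma-separated tag list; prints one aborted tag per line
aborted_tags() {
    printf '%s\n' "$1" | tr ',' '\n' |
        sed 's/^[[:space:]]*//; s/[[:space:]]*$//' |
        grep -E -- '-aborted-(install|remove|validate)$'
}

# $1 = target CE host
clean_aborted() {
    ce=$1
    list=$(lcg-ManageVOTag -host "$ce" -vo seegrid -list) || return 1
    aborted_tags "$list" | while read -r tag; do
        lcg-ManageVOTag -host "$ce" -vo seegrid -remove -tag "$tag"
    done
}
```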
In the following articles you can find example SGM and run scripts for the PROPEL and VIVE applications:
Before proceeding, we recommend reading the general instructions on common problems related to software management in the grid environment.
All scripts (installation, validation and removal) rely on the lcg-ManageSoftware utility being present on the target CE (Computing Element). Using lcg-ManageSoftware is highly recommended, because it updates the published tags according to the outcome of each software management job. All scripts automatically generate the JDL files that handle the software management procedures.
The current version of the installation scripts requires a shared file system for application software; if this file system is not present or is disabled, the scripts will fail. For the SEEGRID VO, the shared file system directory is given by the $VO_SEEGRID_SW_DIR environment variable. To install software, you need to provide tarballs with precompiled software, as is done for both the PROPEL and VIVE applications. Make sure that all necessary libraries are either included or statically linked in the precompiled software. Another option is to provide the source files of your software and compile them during execution of your install script. In that case, make sure that the appropriate compilers are present on the target CE, or include them in your installation tarballs as well.
To send your tarballs to the target CE, you have three options:
- Include the tarballs in the InputSandbox of your JDL file. This is not recommended, because software installation files can be quite large and you might reach the sandbox size limit; it can also affect performance.
- Provide a Web link to your tarballs, which is how installation is implemented for both the PROPEL and VIVE applications. Your script then downloads the tarballs from the Web. For this to work, the target CE must be able to connect to the Web location (outbound connectivity is required).
- Store the tarballs on an SE (Storage Element) and provide their SURL/LFN/GUID. The installation script then downloads the tarballs from the SE using lcg-util commands (lcg-cp). Make sure that the SE and LFC are reachable from the target CE.
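Inside the install script, the Web and SE options above differ only in the download command. A sketch, assuming wget and lcg-cp are available on the WN and that a valid proxy is present for lcg-cp; the function name fetch_tarball and the example URL are hypothetical.

```shell
#!/bin/sh
# Sketch: download an installation tarball from the Web or from an SE.
# fetch_tarball and the example URL are hypothetical.
fetch_tarball() {
    src=$1      # http(s)/ftp URL, or lfn:/srm:/guid: name of a file on an SE
    dest=$2     # local file to create
    case "$dest" in
        /*) abs=$dest ;;
        *) abs=$(pwd)/$dest ;;        # lcg-cp expects an absolute file: URL
    esac
    case "$src" in
        http://*|https://*|ftp://*)
            wget -q -O "$abs" "$src" ;;                 # needs outbound connectivity
        lfn:*|srm:*|guid:*)
            lcg-cp --vo seegrid "$src" "file:$abs" ;;   # SE and LFC must be reachable
        *)
            echo "unsupported source: $src" >&2
            return 2 ;;
    esac
}

# Usage in an install script:
#   cd "$VO_SEEGRID_SW_DIR" || exit 1
#   fetch_tarball http://www.example.org/vive-0.4.3.tar.gz vive.tar.gz &&
#       tar xzf vive.tar.gz
```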
After installation, you need to validate it. If the installation was successful, the corresponding to-be-validated tag will be present in the IS for the target CE. During validation, you can perform a number of checks to confirm that the installation truly succeeded, for example: checking that the directory structure and files are present; checking that outbound connectivity to the required resources is available (for applications that need it); and running the application on test data and comparing the results with the expected ones.
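Two of the checks above, required files being present and a test run matching a reference result, are easy to script. A minimal sketch; the file names, the test command and the function names are hypothetical, and the exit status of the validation script is assumed to decide the validation outcome.

```shell
#!/bin/sh
# Sketch of two validation checks; all names here are hypothetical.

# $1 = install directory, remaining args = paths that must exist under it
check_files() {
    base=$1; shift
    for f in "$@"; do
        [ -e "$base/$f" ] || { echo "missing: $base/$f" >&2; return 1; }
    done
}

# $1 = expected stdout, remaining args = test command to run
check_test_run() {
    expected=$1; shift
    actual=$("$@") || return 1
    [ "$actual" = "$expected" ]
}

# Usage in a validation script:
#   app="$VO_SEEGRID_SW_DIR/vive-0.4.3"
#   check_files "$app" bin/vive lib/libvive.so &&
#       check_test_run "$(cat "$app/test/expected.txt")" "$app/bin/vive" "$app/test/input.dat"
```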
Running Installed Software
You should not run ordinary jobs on the grid using the ESM user class, because this class will usually have priority over jobs submitted by regular users and might cause performance issues on CEs. The ESM class should be used only for software management. To run an installed application only on CEs where it is actually installed, specify an appropriate "Requirements" attribute in your JDL file.
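Such a Requirements attribute usually matches the published tag against GlueHostApplicationSoftwareRunTimeEnvironment with the JDL Member() function. A sketch that writes a user JDL; the tag, script name, output file names and the function name make_user_jdl are hypothetical. Using the standalone (validated) tag keeps jobs away from sites where validation has not succeeded.

```shell
#!/bin/sh
# Sketch: print a user JDL that only matches CEs publishing the given tag.
# make_user_jdl, run_vive.sh and the output names are hypothetical.
make_user_jdl() {
    tag=$1         # standalone tag, e.g. VO-seegrid-vive-0.4.3
    script=$2      # run script shipped in the InputSandbox
    cat <<EOF
Executable = "$script";
StdOutput = "run.out";
StdError = "run.err";
InputSandbox = {"$script"};
OutputSandbox = {"run.out", "run.err"};
Requirements = Member("$tag", other.GlueHostApplicationSoftwareRunTimeEnvironment);
EOF
}

# Usage:
#   make_user_jdl VO-seegrid-vive-0.4.3 run_vive.sh > run_vive.jdl
#   edg-job-submit run_vive.jdl
```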
To list the available applications and CEs, use the lcg-infosites utility as described above.
Many applications have more than one executable file and a number of libraries needed at runtime, which can cause problems when executing them in the grid environment. In many cases, including the PROPEL application, the "Executable" attribute in the JDL file does not point to the actual executable of the application, but to a script that handles the execution. This script is transferred via the InputSandbox and executed on the CE.
There are two basic ways to call an executable file in your run script:
- Go to the directory where the software is installed and run the executable from there. This should be avoided for a number of reasons, including:
  - For regular users, the installation directory is read-only, so the application won't be able to create temporary and output files;
  - Unless you store the sandbox directory path in your script and pass it as an argument to the executable, you will lose access to it, which complicates things if you have more than one executable and several libraries.
- Execute the application directly from the run script. The problem with this approach is:
  - Since the working directory is the sandbox, the application will not be able to find its own libraries and other executables.
The problem with the second approach is easy to solve: in your run script, set an environment variable that points to the install directory (or to another directory where the required libraries can be found).
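A sketch of that fix for a run script, assuming the application keeps its executables in bin/ and its shared libraries in lib/ under the install directory; the directory layout, the application name and the function name setup_app_env are hypothetical.

```shell
#!/bin/sh
# Sketch: point PATH and LD_LIBRARY_PATH at the installed software while the
# job keeps running in the sandbox directory. Names and layout are hypothetical.
setup_app_env() {
    APP_HOME=$1
    PATH="$APP_HOME/bin:$PATH"
    # Append the previous value only if it was non-empty, to avoid a stray ":"
    LD_LIBRARY_PATH="$APP_HOME/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export APP_HOME PATH LD_LIBRARY_PATH
}

# Usage in the run script (input and output files stay in the sandbox):
#   setup_app_env "$VO_SEEGRID_SW_DIR/vive-0.4.3"
#   vive input.dat > output.dat
```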
Software Tags Management
A tag is a string that uniquely identifies experiment software and its version number. It is up to the ESM to run validation tests before publishing the software's availability on a site. The tag is added as a value of the GlueHostApplicationSoftwareRunTimeEnvironment attribute in the Information System (IS).
Tags of published software are managed with the lcg-ManageVOTag command. Some examples of its use:
- List of tags:
lcg-ManageVOTag -host <Target-CE-Machine> -vo seegrid -list
This command returns all tags published by seegrid as a comma-separated list.
- Adding a tag:
lcg-ManageVOTag -host <HOST> -vo seegrid -add -tag <TAG> [-tag <TAG>...]
In case of a successful operation the following message is printed out:
lcg-ManageVOTag: <TAG> [<TAG>...] submitted for addition by seegrid to GlueHostApplicationSoftwareRunTimeEnvironment
The ESM should be very careful when adding a tag, and should do so only if validation was successful. If the experiment software is not installed properly but the ESM adds a tag anyway, user jobs will be directed to sites with an invalid installation.
- Removal of a tag:
lcg-ManageVOTag -host <HOST> -vo seegrid -remove -tag <TAG> [-tag <TAG>...]
In case of a successful operation the following message is printed out:
lcg-ManageVOTag: <TAG> [<TAG>...] removed by seegrid to GlueHostApplicationSoftwareRunTimeEnvironment
The ESM should also be very careful when removing a tag, because removing it does not automatically remove the installed software. The ESM has to remove the software separately, and the installed software can only be removed when no user is running it.