Software Installation Management for PROPEL Application

This wiki page is part of the SEE-GRID Gridification Guide. It was contributed by the Belgrade University Computer Centre.

Introduction

This page describes how to install the PROPEL application in a grid environment. PROPEL (Asteroid Proper Elements) calculates proper elements for asteroids in the asteroid belt of the Solar System. It was developed by the Astronomical Observatory of Belgrade, Serbia. The gridification of the application, including the scripts in this article, was carried out jointly by the Astronomical Observatory of Belgrade and the Belgrade University Computer Centre.

This document can also serve as a tutorial for installing an arbitrary application on the grid. All scripts shown here were written to be as generic as possible, so that adapting them to other applications is simple. The installation procedures are carried out under the ESM (Experimental Software Manager) user class. Usage instructions, the benefits of the ESM user class, and general software management issues are covered in the Software Installation Management Guide.

Proxy Certificate Creation

The first step is to create a proxy certificate as an ESM user. Example for the SEEGRID VO:

$ voms-proxy-init -voms seegrid:/seegrid/Role=sgmadmin
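
Before submitting management jobs it can help to confirm that the proxy actually carries the software manager role. The following is a minimal sketch, assuming that `voms-proxy-info -fqan` prints one FQAN per line in a form such as `/seegrid/Role=sgmadmin/Capability=NULL`. The function name `has_role` is illustrative and not part of any grid tool.

```shell
#!/bin/bash
# has_role ROLE: reads FQAN lines on stdin and succeeds if one of them
# contains /Role=ROLE as a complete path component.
# (Hypothetical helper; the FQAN line format is an assumption.)
has_role() {
  grep -Eq "/Role=$1(/|\$)"
}

# Intended use (requires a VOMS client installation):
#   voms-proxy-info -fqan | has_role sgmadmin || echo "proxy lacks the sgmadmin role"
```

The helper only filters text, so it can be tried out without a grid user interface by piping sample lines into it.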

Software Management

General Notes

Each script takes one optional argument: the name of the target CE (Computing Element). If it is omitted, the WMS (Workload Management System) tries to find a CE that meets the criteria. Running the scripts without a target CE is not recommended. One reason is that the IS (Information Service) usually takes a few minutes to update the tags for installed software, so matchmaking may be based on stale tags. In addition, for removal you will almost always want to name the exact CE. To check the status of software tags in the IS, use the lcg-infosites utility. Example for the SEEGRID VO:

$ lcg-infosites --vo seegrid tag

This lists each CE together with the tags of its installed experimental software.
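
Because a published tag and its transitional variants (for example the `-to-be-validated` suffix) share a common prefix, a plain substring search over the tag listing can give false positives. The sketch below matches the exact tag only. It assumes that the lcg-infosites output contains the tags as whitespace-separated tokens, and the helper name `has_tag` is hypothetical.

```shell
#!/bin/bash
# has_tag VO APP VER: reads text on stdin and succeeds only if the exact tag
# VO-<VO>-<APP>-<VER> appears as a whitespace-separated token.
# Transitional tags such as ...-to-be-validated do not count as a match.
# (Hypothetical helper; the listing format is an assumption.)
has_tag() {
  tag="VO-$1-$2-$3"
  tr -s ' \t' '\n' | grep -Fxq -- "$tag"
}

# Intended use:
#   lcg-infosites --vo seegrid tag | has_tag seegrid propel 1.1 && echo "tag published"
```

Note that this checks the whole listing, not a particular CE; restricting the check to one CE would require parsing the per-CE sections of the output.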

To list all available CEs, use the lcg-infosites utility again. Example for the SEEGRID VO:

$ lcg-infosites --vo seegrid ce

The CE identifiers listed are the values to pass to the scripts as the target CE. Each script automatically generates a JDL file that drives the corresponding software management procedure.
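
Since the scripts accept the CE identifier without checking it, a small sanity check before submission can catch typos early. This is a sketch only: the pattern `host:port/queue` is an assumption derived from the CE identifier used later on this page (`cluster1.csk.kg.ac.yu:2119/jobmanager-pbs-seegrid`), and `is_ce_id` is a hypothetical helper name.

```shell
#!/bin/bash
# is_ce_id STRING: succeed if STRING looks like host:port/queue.
# (Hypothetical helper; the pattern is an assumption, not a grid specification.)
is_ce_id() {
  printf '%s\n' "$1" | grep -Eq '^[^:/[:space:]]+:[0-9]+/[^[:space:]]+$'
}

# Intended use at the top of a management script:
#   is_ce_id "$1" || { echo "usage: $0 <host:port/queue>" >&2; exit 1; }
```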

Installation

The script install-propel creates a temporary JDL file and submits it to the grid:

#!/bin/bash

#args: CE

CE=$1
VER='1.1'
APP='propel'
VO='seegrid'
NOTIFY_EMAIL="milan (d) potocnik (a) rcub (d) bg (d) ac (d) yu"  # obfuscated on this page; use a real e-mail address

echo $APP-$VER install Startup $*

QUERY="other.GlueCEUniqueID == \"$CE\" &&"

if [ -z "$CE" ]; then
  echo "Computing element not specified. Trying without it..."
  QUERY=""
fi

# Braces are required here: $APP_install_tmp would be read as one (unset) variable
JDL="site_${APP}_install_tmp.jdl"

cat > "$JDL" <<EOF
 Executable = "/opt/lcg/bin/lcg-ManageSoftware";
 InputSandbox = {"install_sw"};
 OutputSandbox={"out", "err"};
 StdOutput="out";
 StdError="err";
 Requirements = $QUERY !Member("VO-$VO-$APP-$VER", other.GlueHostApplicationSoftwareRunTimeEnvironment) \
   && !Member("VO-$VO-$APP-$VER-to-be-validated", other.GlueHostApplicationSoftwareRunTimeEnvironment) \
   && !Member("VO-$VO-$APP-$VER-processing-validate", other.GlueHostApplicationSoftwareRunTimeEnvironment) \
   && !Member("VO-$VO-$APP-$VER-processing-install", other.GlueHostApplicationSoftwareRunTimeEnvironment) \
   && !Member("VO-$VO-$APP-$VER-aborted-validate", other.GlueHostApplicationSoftwareRunTimeEnvironment) \
   && !Member("VO-$VO-$APP-$VER-aborted-install", other.GlueHostApplicationSoftwareRunTimeEnvironment);
 Arguments = "--install --vo $VO --tag $APP-$VER --notify $NOTIFY_EMAIL --install_script install_sw";
 VirtualOrganisation="$VO";
EOF

edg-job-list-match "$JDL"
edg-job-submit -o jids/jid "$JDL"   # job ID is written to jids/jid; assumes the jids directory exists

#echo "Debug mode, listing created jdl file..."
#cat "$JDL"
echo "Deleting created jdl file..."
rm -f "$JDL"
echo "Installation request sent."
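
File names in these scripts are assembled from shell variables, and an underscore placed directly after a variable name is easy to get wrong: the shell treats `_` as part of the variable name. The short demonstration below is not part of the original scripts; it only shows the difference between the two spellings.

```shell
#!/bin/bash
APP='propel'
unset APP_install_tmp    # make sure no variable of this name exists

# Without braces the shell expands the (unset) variable APP_install_tmp,
# so the application name silently disappears from the file name.
without_braces="site_$APP_install_tmp.jdl"

# With braces only APP is expanded and the suffix is kept literally.
with_braces="site_${APP}_install_tmp.jdl"

echo "$without_braces"   # site_.jdl
echo "$with_braces"      # site_propel_install_tmp.jdl
```

The unbraced form still "works" as long as the same spelling is used for creating, submitting and deleting the file, but every script then shares the single file name `site_.jdl`.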

The script install_sw performs the actual installation on the target CE and is invoked by lcg-ManageSoftware:

#!/bin/bash
#  required
#  voms-proxy-init -voms seegrid:/seegrid/Role=sgmadmin
#  check tags with
#  lcg-infosites --vo seegrid tag

VER='1.1'
VO='seegrid'
VODIR=$VO_SEEGRID_SW_DIR
APP='propel'
INSTALL_URL="http://galeb.etf.bg.ac.yu/~potocnik/grid/install/$APP-$VER/$APP-$VER.tar.gz"

echo "############ starting install_sw $APP-$VER ############"
echo "Running on: $HOSTNAME(`hostname -i`) as `whoami`"
echo Belonging to CE: `$EDG_LOCATION/bin/edg-brokerinfo getCE`

if [ -z "$VODIR" ]; then #failure?
   echo "VO_`echo $VO|tr [a-z] [A-Z]`_SW_DIR is not set in the environment"
   echo "############ ending install_sw $APP-$VER returning 1 ############"
   exit 1
fi

if [ "$VODIR" = '.' ]; then #failure?
   echo "The site does not provide a shared file system: VO_`echo $VO|tr [a-z] [A-Z]`_SW_DIR=$VODIR"
   echo "############ ending install_sw $APP-$VER returning 1 ############"
   exit 1
fi

export TAR_LOC=`pwd`    # temporary job directory that holds the steering
                        # script and the tarballs

# Fetch the tarball from the web. This requires outbound connectivity, which
# this software needs anyway, and it keeps the installation files out of the
# input sandbox. On sites without outbound connectivity, include the tarball
# in the input sandbox instead.
wget $INSTALL_URL

cd $VODIR 		# go to software installation root directory
pwd
ls -al			# and check its content

# remove previously created junk caused by uninstall
# this is related to undocumented lcg-ManageSoftware behavior
rm -r -f $VO-$APP-$VER*_tmp
rm -f $VO.*

rm -r -f -v $APP-$VER # remove only the current version of the software

# for initial development and testing only:
# remove previously created installation attempts
# this also destroys valid old software versions which we may want to preserve
rm -r -f $APP-_tmp
ls -al                # and check its content

mkdir $APP-$VER       # create installation directory
if [ ! $? = 0 ]; then # failure?
   echo "Directory $APP-$VER already exists in" `pwd`
   echo "############ ending install_sw $VER returning 1 ############"
   exit 1
fi
chmod 775 $APP-$VER	# allow all users in sgmadmin group full access to the dir,
			# should be the default behavior

cd $APP-$VER
echo "Installing into" `pwd`

echo "running the command: tar xzvf $TAR_LOC/$APP-$VER.tar.gz"
tar xzvf $TAR_LOC/$APP-$VER.tar.gz
if [ ! $? = 0 ]; then #failure?
   echo "Failure unpacking $TAR_LOC/$APP-$VER.tar.gz"
   echo "############ ending install_sw $APP-$VER returning 1 ############"
   exit 1
fi

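echo "Installation into" `pwd` "finished"
ls -al                # list what has been installed
# Every failure path above has already exited with 1, so report success explicitly;
# "$?" at this point would only reflect the preceding ls or echo.
echo "############ ending install_sw $APP-$VER returning 0 ############"
exit 0 # This is the relevant return code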
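
The comments in install_sw mention two ways of delivering the tarball: downloading it, or shipping it in the input sandbox on sites without outbound connectivity. A possible way to support both in one script is sketched below. The function name `fetch_tarball` is hypothetical, and the sketch assumes that a sandbox copy, if present, lies in the job's working directory under the same file name as in the URL.

```shell
#!/bin/bash
# fetch_tarball URL DIR: make sure DIR/<file name from URL> exists.
# A non-empty copy already present in DIR (e.g. from the input sandbox) is
# used as is; otherwise the file is downloaded with wget.
# Returns non-zero if the download fails. (Hypothetical helper.)
fetch_tarball() {
  url=$1
  dir=$2
  file="$dir/$(basename "$url")"
  if [ -s "$file" ]; then
    echo "Using tarball from sandbox: $file"
    return 0
  fi
  echo "Downloading $url"
  wget -q -O "$file" "$url" || { rm -f "$file"; return 1; }
}

# Intended use inside install_sw, before anything is removed from $VODIR:
#   fetch_tarball "$INSTALL_URL" "$TAR_LOC" || exit 1
```

Calling it before the cleanup steps means a failed download leaves a previously installed version untouched.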

Validation

The script validate-propel creates a temporary JDL file and submits it to the grid:

#!/bin/bash

#args: CE

CE=$1
VER='1.1'
APP='propel'
VO='seegrid'
NOTIFY_EMAIL="milan (d) potocnik (a) rcub (d) bg (d) ac (d) yu"  # obfuscated on this page; use a real e-mail address

echo $APP-$VER validate Startup $*

QUERY="other.GlueCEUniqueID == \"$CE\" &&"

if [ -z "$CE" ]; then
  echo "Computing element not specified. Trying without it..."
  QUERY=""
fi

# Braces are required here: $APP_validate_tmp would be read as one (unset) variable
JDL="site_${APP}_validate_tmp.jdl"

cat > "$JDL" <<EOF
 Executable = "/opt/lcg/bin/lcg-ManageSoftware";
 InputSandbox = {"validate_sw"};
 OutputSandbox = {"out", "err"};
 StdOutput="out";
 StdError="err";
 Requirements = $QUERY (Member("VO-$VO-$APP-$VER-to-be-validated", other.GlueHostApplicationSoftwareRunTimeEnvironment));
 Arguments = "--validate --vo $VO --tag $APP-$VER --notify $NOTIFY_EMAIL --validate_script validate_sw";
 VirtualOrganisation="$VO";
EOF

edg-job-list-match "$JDL"
edg-job-submit -o jids/jid "$JDL"

#echo "Debug mode, listing created jdl file..."
#cat "$JDL"
echo "Deleting created jdl file..."
rm -f "$JDL"
echo "Validation request sent."

The script validate_sw performs the actual validation on the target CE and is invoked by lcg-ManageSoftware:

#!/bin/bash
# see the comments at the beginning of install_sw

VER='1.1'
APP='propel'
VO='seegrid'
VODIR=$VO_SEEGRID_SW_DIR
SE=$VO_SEEGRID_DEFAULT_SE
INSTALL_HOST="galeb.etf.bg.ac.yu"

#set up LFC/GFAL environment
export LFC_HOST=grid02.rcub.bg.ac.yu
export LCG_CATALOG_TYPE=lfc
export LCG_GFAL_VO=seegrid
export CE=`$EDG_LOCATION/bin/edg-brokerinfo getCE`

echo "############ starting validate_sw $APP-$VER ############"
echo "# VO_`echo $VO|tr [a-z] [A-Z]`_SW_DIR (e.g. /opt/exp_soft/$VO) should be writable only by ${VO}sgm user"
echo "# and only ${VO}sgm should be able to manage published application tags"
echo "Running job $EDG_WL_JOBID on: $HOSTNAME(`hostname -i`) as `whoami`"
echo "VO_`echo $VO|tr [a-z] [A-Z]`_SW_DIR is $VODIR"
echo Belonging to CE: $CE
echo Close SE is $SE

if [ -z "$VODIR" ]; then #failure?
   echo "VO_`echo $VO|tr [a-z] [A-Z]`_SW_DIR is not set in the environment"
   echo "############ ending validate_sw $APP-$VER returning 1 ############"
   exit 1
fi

if [ "$VODIR" = '.' ]; then #failure?
   echo "The site does not provide a shared file system: VO_`echo $VO|tr [a-z] [A-Z]`_SW_DIR=$VODIR"
   echo "############ ending validate_sw $APP-$VER returning 1 ############"
   exit 1
fi

cd $VODIR # software installation root directory

echo "### Listing `pwd`"
ls -al
cd $APP-$VER
if [ ! $? = 0 ]; then #failure?
   echo "Directory $APP-$VER does not exist in" `pwd`
   echo "############ ending validate_sw $APP-$VER returning 1 ############"
   exit 1
fi

ls -al

# any additional testing of installation can be done here, checking for connectivity
# to necessary resources, running application with some test data, etc.
echo "### checking possible routing problem"
traceroute $INSTALL_HOST

echo "Installation validated."
echo "############ ending validate_sw $APP-$VER returning 0 ############"
exit 0
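
The validation script leaves room for additional checks of the installation. One inexpensive check is to confirm that the main executable is present and executable. The sketch below assumes the executable is named `propel.x` and sits at the top of the installation directory, as in the run script shown later on this page; the function name `check_install` is hypothetical.

```shell
#!/bin/bash
# check_install DIR: succeed if DIR contains an executable file propel.x.
# (Hypothetical helper; the file name is taken from the run-example script.)
check_install() {
  if [ ! -x "$1/propel.x" ]; then
    echo "Missing or non-executable: $1/propel.x"
    return 1
  fi
  echo "Found executable: $1/propel.x"
}

# Intended use inside validate_sw, before the final "exit 0":
#   check_install "$VODIR/$APP-$VER" || exit 1
```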

Removal

The script uninstall-propel creates a temporary JDL file and submits it to the grid:

#!/bin/bash

#args: CE

CE=$1
VER='1.1'
APP='propel'
VO='seegrid'
echo $APP-$VER uninstall Startup $*

QUERY="other.GlueCEUniqueID == \"$CE\" &&"

if [ -z "$CE" ]; then
  echo "Computing element not specified. Trying without it..."
  QUERY=""
fi

# Braces are required here: $APP_uninstall_tmp would be read as one (unset) variable
JDL="site_${APP}_uninstall_tmp.jdl"

cat > "$JDL" <<EOF
 Executable = "/opt/lcg/bin/lcg-ManageSoftware";
 InputSandbox = {"uninstall_sw"};
 OutputSandbox = {"out", "err"};
 StdOutput = "out";
 StdError = "err";
 Requirements = $QUERY (Member("VO-$VO-$APP-$VER", other.GlueHostApplicationSoftwareRunTimeEnvironment));
 Arguments = "--uninstall --vo $VO --tag $APP-$VER --uninstall_script uninstall_sw";
 VirtualOrganisation="$VO";
EOF

edg-job-list-match "$JDL"
edg-job-submit -o jids/jid "$JDL"

#echo "Debug mode, listing created jdl file..."
#cat "$JDL"
echo "Deleting created jdl file..."
rm -f "$JDL"
echo "Uninstallation request sent."

The script uninstall_sw performs the actual removal on the target CE and is invoked by lcg-ManageSoftware:

#!/bin/bash
# see the comments at the beginning of install_sw

VER='1.1'
APP='propel'
VO='seegrid'
VODIR=$VO_SEEGRID_SW_DIR

echo "############ starting uninstall_sw $APP-$VER ############"
echo "Running on: $HOSTNAME(`hostname -i`) as `whoami`"
echo Belonging to CE: `$EDG_LOCATION/bin/edg-brokerinfo getCE`

if [ -z "$VODIR" ]; then #failure?
   echo "VO_`echo $VO|tr [a-z] [A-Z]`_SW_DIR is not set in the environment"
   echo "############ ending uninstall_sw $APP-$VER returning 1 ############"
   exit 1
fi

if [ "$VODIR" = '.' ]; then #failure?
   echo "The site does not provide a shared file system: VO_`echo $VO|tr [a-z] [A-Z]`_SW_DIR=$VODIR"
   echo "############ ending uninstall_sw $APP-$VER returning 1 ############"
   exit 1
fi

cd $VODIR 	# go to software installation root directory
ls -al		# and check its content

rm -f -v -R $APP-$VER	# remove only the current version of the software
if [ ! $? = 0 ]; then	# failure?
   echo "Removal of $APP-$VER from" `pwd` "failed"
   echo "############ ending uninstall_sw $APP-$VER returning 1 ############"
   exit 1
fi

#optional further steps

echo "Removal of $APP-$VER from" `pwd` "finished"
echo "############ ending uninstall_sw $APP-$VER returning 0 ############"
exit 0 # This is the relevant return code
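
Recursive removal in a shared software area deserves some caution, because an empty or malformed variable can widen the target of `rm`. The following sketch wraps the removal in a few defensive checks. The function name `safe_remove` and the specific checks are illustrative choices, not part of lcg-ManageSoftware.

```shell
#!/bin/bash
# safe_remove ROOT NAME: recursively remove ROOT/NAME, refusing to act when
# either argument is empty, or when NAME contains a slash or is "." or "..".
# (Hypothetical helper.)
safe_remove() {
  root=$1
  name=$2
  if [ -z "$root" ] || [ -z "$name" ]; then
    echo "safe_remove: empty argument, nothing removed" >&2
    return 1
  fi
  case $name in
    */*|.|..)
      echo "safe_remove: suspicious name '$name', nothing removed" >&2
      return 1
      ;;
  esac
  rm -rf -- "$root/$name"
}

# Intended use inside uninstall_sw:
#   safe_remove "$VODIR" "$APP-$VER" || exit 1
```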

Running Installed Software

After the application has been successfully installed and validated, it can be run by a regular user with a proxy certificate like the one in the following example:

$ voms-proxy-init -voms seegrid

The following "Requirements" attribute for a PROPEL JDL file selects both the application tag and a specific CE:

Requirements = Member("VO-seegrid-propel-1.1", other.GlueHostApplicationSoftwareRunTimeEnvironment) && other.GlueCEUniqueID == "cluster1.csk.kg.ac.yu:2119/jobmanager-pbs-seegrid";

The PROPEL application requires the environment variable $PROPEL_LIB_DIR to be set at run time. An example JDL file for a PROPEL job:

Executable = "run-example";
StdOutput = "std.out";
StdError = "std.err";
InputSandbox = {"run-example", "input.tar.gz"};
OutputSandbox = {"std.out", "std.err", "vast.fil.gz","vpla.fil.gz"};
ShallowRetryCount = 10;
Requirements = Member("VO-seegrid-propel-1.1", other.GlueHostApplicationSoftwareRunTimeEnvironment) && other.GlueCEUniqueID == "cluster1.csk.kg.ac.yu:2119/jobmanager-pbs-seegrid";
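
When the same job is sent to several CEs or adapted to another application, the Requirements line can be generated instead of edited by hand. The sketch below prints a line in the same form as the one used in the JDL file for PROPEL; the function name `make_requirements` is hypothetical.

```shell
#!/bin/bash
# make_requirements VO APP VER CE: print a JDL Requirements line that asks
# for the published software tag VO-<VO>-<APP>-<VER> on the given CE.
# (Hypothetical helper.)
make_requirements() {
  printf 'Requirements = Member("VO-%s-%s-%s", other.GlueHostApplicationSoftwareRunTimeEnvironment) && other.GlueCEUniqueID == "%s";\n' \
    "$1" "$2" "$3" "$4"
}

make_requirements seegrid propel 1.1 "cluster1.csk.kg.ac.yu:2119/jobmanager-pbs-seegrid"
```

The output can be appended to a JDL file that contains the remaining attributes.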

The run script run-example:

#!/bin/bash

APP_NAME=propel
APP_VERSION=1.1

tar -xzf input.tar.gz # extract input files

# This is needed for application to see the required libraries
export PROPEL_LIB_DIR=$VO_SEEGRID_SW_DIR/$APP_NAME-$APP_VERSION/
# Execute application
$VO_SEEGRID_SW_DIR/$APP_NAME-$APP_VERSION/propel.x

# Compress output files
gzip vast.fil
gzip vpla.fil