RRS (Reverse Remote Shell) usage in job series

From EGEE-see Wiki

This page is part of the SEE-GRID Gridification Guide.
This topic is contributed by Center for Scientific Research of SASA and University of Kragujevac, Serbia.

Introduction

The technique described in Interactive jobs using RRS has turned out to be very useful in many cases involving pilot jobs and job pooling, whether something goes wrong or simply for monitoring during software development. It is particularly valuable when the developer can obtain an interactive shell connection to the remote WN where things actually happen.

This topic describes a variation of that technique for the case where one deals with a batch of jobs and needs direct shell access to any of them. A major strength of the RRS method for job pools is that the service can be bound to any free TCP port. Each job starts an RRS daemon that runs in the background and tries to connect to a predefined host every few seconds, on its own port. To open a shell connection to a specific job, it is sufficient to know the port number on which that job's RRS daemon tries to connect. The proposed RRS implementation supports all of these capabilities.

Implementation

As mentioned above, the intention is to assign each job's RRS service its own port. Any port numbering scheme can be used (its security implications may need separate consideration), but this example uses the simplest one:

$PORTNUMBER=$PORTBASE+$PORTOFFSET

where $PORTOFFSET is simply the job's serial number in the pool (0, 1, 2, ...).
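
The numbering scheme can be sketched as a small shell helper. This is only an illustration: the function name rrs_port and the range check are assumptions and are not part of the original scripts.

```bash
# Hypothetical helper: port for the job with the given zero-based serial number.
rrs_port() {
    local base=$1 offset=$2
    local port=$((base + offset))
    # Valid TCP ports are 1..65535; refuse anything outside that range.
    if [ "$port" -lt 1 ] || [ "$port" -gt 65535 ]; then
        echo "port $port out of range" >&2
        return 1
    fi
    echo "$port"
}

# Ports used by a pool of four jobs with base 2001 (one per line):
# 2001, 2002, 2003, 2004
for offset in 0 1 2 3; do
    rrs_port 2001 "$offset"
done
```

The upper bound matters for large pools: with a base of 2001, serial numbers above 63534 would no longer map to a valid port.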

The application gridified using (among others) this RRS technique is Lizza-PAKP, an environmental software package for groundwater-related simulations, developed at the Center for Scientific Research of SASA and University of Kragujevac. The software applies this approach within the framework of the TCP binder developed as part of the VIVE (Volumetric Image Visualization Environment) application. In this specific implementation, the bash script executed on the WN contains the following sequence, which achieves the above aim:

# $PORTOFFSET is zero based!
PORTBASE=2001
PORTOFFSET=$3
RRSHOST=cluster1.csk.kg.ac.yu

# RRS port will be set for each process separately regarding its serial number taken as $3
PORT=`echo "$PORTBASE+$PORTOFFSET" | bc`
echo "Using RRS port: $PORT"

# Reverse Remote Shell daemon start
chmod a+x rrs
./rrs $RRSHOST $PORT -D -R5

# Invoke TCP Binder server etc.........

In the rrs invocation line, -D starts rrs as a daemon, while -R5 makes it retry the connection to $RRSHOST every 5 seconds. Running a daemon inside a grid job appears to be safe, because the batch system disposes of it when the job terminates (either by user cancellation or credential expiration).
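
Because the port depends entirely on the serial number passed as $3, a missing or malformed argument would leave $PORT empty or wrong. A small guard could run before the daemon is started. The sketch below assumes the serial number is a plain non-negative decimal integer; the function name is_serial is hypothetical.

```bash
# Succeeds only for a non-negative decimal integer without leading zeros.
# Leading zeros are rejected to keep the value unambiguous
# (shell arithmetic would read them as octal).
is_serial() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # empty or contains a non-digit
        0?*)         return 1 ;;   # leading zero, e.g. 08
        *)           return 0 ;;
    esac
}

# Example value standing in for the script's third positional argument.
PORTOFFSET=13
if is_serial "$PORTOFFSET"; then
    echo "serial number ok: $PORTOFFSET"
else
    echo "bad serial number: '$PORTOFFSET'" >&2
fi
```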

Establishing a connection to a single pool/pilot job

On the other side, things are even simpler. To get a shell connection to, say, the 14th job in the pool (zero-based $PORTOFFSET of 13, so with $PORTBASE of 2001 the port is 2014), the following is sufficient:

[milos@cluster1 binder]$ ./rrs -l -p 2014
[i] using plain-text communication
[+] listening for incoming connection on port 2014, no timeout
[i] got connection from 147.91.208.60:52083
[aegis021@cluster11 https_3a_2f_2fwms.phy.bg.ac.yu_3a9000_2f9L9XM18aOiNmDmzc-GXevw]$ pwd
/scratch/https_3a_2f_2fwms.phy.bg.ac.yu_3a9000_2f9L9XM18aOiNmDmzc-GXevw

Since the daemon retries every 5 seconds, the connection to the remote WN is established in less than 5 seconds.
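
Since the listening port follows from the job's position in the pool, the listener side can be wrapped as well. The following is a minimal sketch that assumes the -l and -p flags used in the session above. The name listen_cmd is hypothetical, and the function only prints the command instead of running it.

```bash
PORTBASE=2001

# Print the rrs listener command for the Nth job in the pool (N counts from 1),
# i.e. for the job whose zero-based $PORTOFFSET is N-1.
listen_cmd() {
    local ordinal=$1
    case $ordinal in
        ''|*[!0-9]*|0*)
            echo "job ordinal must be a positive integer" >&2
            return 1 ;;
    esac
    echo "./rrs -l -p $((PORTBASE + ordinal - 1))"
}

listen_cmd 14   # prints: ./rrs -l -p 2014
```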
