SL4 WN glite-3.1 64bit

From EGEE-see WIki


AEGIS01-PHY-SCL notes on installing and configuring natively compiled glite-WN and TORQUE_client on SL4.5 (64 bit)

To verify the installation and configuration of glite-WN and TORQUE_client on 64-bit SL4, we at AEGIS01-PHY-SCL went through the whole procedure, and it was successful. These notes describe what we did. We used Dual-Core AMD Opteron 8218 processors, but the procedure should also work on Intel dual-core and quad-core CPUs; the only difference is the Java executable to be used.

Below are the minimal steps needed to install and configure a fully operational glite-WN and TORQUE_client using YAIM-3.1 and the glite-3.1 repository. These instructions are based on document:


All additional RPMs mentioned in this guide can be found in AEGIS01-PHY-SCL SL4 repository.


1) OS Installation

We have chosen to install ALL PACKAGES of SL4.5. You can use your own kickstart file, of course. After the installation we removed the following packages:

rpm -e seamonkey-nspr-1.0.9-2.el4.i386 seamonkey-js-debugger-1.0.9-2.el4.x86_64 seamonkey-nss-1.0.9-2.el4.x86_64 seamonkey-dom-inspector-1.0.9-2.el4.i386 \
seamonkey-chat-1.0.9-2.el4.i386 seamonkey-nss-devel-1.0.9-2.el4.x86_64 seamonkey-1.0.9-2.el4.i386 seamonkey-nspr-devel-1.0.9-2.el4.x86_64 \
seamonkey-devel-1.0.9-2.el4.x86_64 seamonkey-mail-1.0.9-2.el4.i386 seamonkey-dom-inspector-1.0.9-2.el4.x86_64 seamonkey-nspr-1.0.9-2.el4.x86_64 \
seamonkey-1.0.9-2.el4.x86_64 seamonkey-chat-1.0.9-2.el4.x86_64 seamonkey-nss-1.0.9-2.el4.i386 seamonkey-mail-1.0.9-2.el4.x86_64 seamonkey-js-debugger-1.0.9-2.el4.i386 \
openoffice.org2-core evolution gaim fence devhelp \
openoffice.org2-langpack-nl openoffice.org2-langpack-zu_ZA openoffice.org2-langpack-zh_TW \
openoffice.org2-langpack-zh_CN openoffice.org2-langpack-tr_TR openoffice.org2-langpack-th_TH \
openoffice.org2-langpack-ta_IN  openoffice.org2-langpack-sv openoffice.org2-langpack-sr_CS \
openoffice.org2-langpack-sl_SI openoffice.org2-langpack-sk_SK openoffice.org2-langpack-ru \
openoffice.org2-langpack-pt_PT openoffice.org2-langpack-pt_BR openoffice.org2-langpack-pl_PL \
openoffice.org2-langpack-pa_IN openoffice.org2-langpack-nn_NO openoffice.org2-langpack-nb_NO \
openoffice.org2-langpack-ms_MY openoffice.org2-langpack-lt_LT openoffice.org2-langpack-ko_KR \
openoffice.org2-langpack-ja_JP openoffice.org2-langpack-it openoffice.org2-langpack-hu_HU devhelp-devel \
evolution-devel evolution-connector openoffice.org2-langpack-af_ZA openoffice.org2-langpack-ar \
openoffice.org2-langpack-bg_BG openoffice.org2-langpack-bn openoffice.org2-langpack-ca_ES \
openoffice.org2-langpack-cs_CZ openoffice.org2-langpack-cy_GB openoffice.org2-langpack-da_DK \
openoffice.org2-langpack-de openoffice.org2-langpack-el_GR openoffice.org2-langpack-es openoffice.org2-langpack-et_EE \
openoffice.org2-langpack-eu_ES openoffice.org2-langpack-fi_FI openoffice.org2-langpack-fr \
openoffice.org2-langpack-ga_IE openoffice.org2-langpack-gl_ES openoffice.org2-langpack-gu_IN \
openoffice.org2-langpack-he_IL openoffice.org2-langpack-hi_IN openoffice.org2-langpack-hr_HR openoffice.org2-calc \
openoffice.org2-xsltfilter openoffice.org2-writer openoffice.org2-testtools openoffice.org2-pyuno \
openoffice.org2-math openoffice.org2-javafilter openoffice.org2-impress openoffice.org2-graphicfilter \
openoffice.org2-draw openoffice.org2-calc openoffice.org2-base openoffice.org2-emailmerge \
SL_firefox_parentlock_fix firefox.i386 firefox.x86_64 subversion-devel apr-devel apr-util-devel httpd-devel mod_perl-devel fuse-sshfs



2) Update and consolidation of SL4.5 installation

The default package management tool used by SL4.x and by YAIM-3.1 or YAIM-4.0 is yum. The following files need to be added to /etc/yum.repos.d/:

glite.repo

[glite-WN]
name=gLite 3.1 Worker Node
baseurl=http://linuxsoft.cern.ch/EGEE/gLite/R3.1/glite-WN/sl4/i386/
enabled=1

[glite-TORQUE_client]
name=Torque clients
baseurl=http://linuxsoft.cern.ch/EGEE/gLite/R3.1/glite-TORQUE_client/sl4/i386/
enabled=1

lcg-ca.repo

[CA]
name=CAs
baseurl=http://linuxsoft.cern.ch/LCG-CAs/current
enabled=1

jpackage5.0.repo

[main]
[jpackage17-generic]
name=JPackage 1.7, generic
baseurl=http://mirrors.dotsrc.org/jpackage/1.7/generic/free/
enabled=1
protect=1
    
[jpackage17-generic-nonfree]
name=JPackage 1.7, generic non-free
baseurl=http://mirrors.dotsrc.org/jpackage/1.7/generic/non-free/
enabled=1
protect=1
   
[main]
[jpackage5-generic]
name=JPackage 5, generic
baseurl=http://mirrors.dotsrc.org/jpackage/5.0/generic/free/
enabled=1
protect=1

[jpackage5-generic-nonfree]
name=JPackage 5, generic non-free
baseurl=http://mirrors.dotsrc.org/jpackage/5.0/generic/non-free/
enabled=1
protect=1

We suggest that you disable the following repos:

atrpms.repo
dag.repo
sl4x-contrib.repo
sl-rhaps.repo

We also suggest that you enable the following repos:

sl4x.repo
sl4x-errata.repo
sl4x-fastbugs.repo
sl-bugfix-46.repo
sl-fastbugs.repo
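
Disabling or enabling a repo amounts to setting enabled=0 or enabled=1 in its file. The following is a minimal sketch, assuming each repo listed above is a file of that name under /etc/yum.repos.d and already contains an enabled= line. The function name and the REPO_DIR variable are ours for illustration, not part of yum:

```shell
# Directory holding the .repo files; override REPO_DIR to try this on a copy.
REPO_DIR=${REPO_DIR:-/etc/yum.repos.d}

# Usage: set_repos_enabled <0|1> <repo-file>...
# Rewrites every existing "enabled=" line in each file, keeping a .bak copy.
# Sections without an "enabled=" line are left untouched.
set_repos_enabled() {
    flag=$1
    shift
    for name in "$@"; do
        repo_file="$REPO_DIR/$name"
        if [ -f "$repo_file" ]; then
            sed -i.bak "s/^enabled=.*/enabled=$flag/" "$repo_file"
        else
            echo "skipping $name: not found in $REPO_DIR" >&2
        fi
    done
}
```

For example, set_repos_enabled 0 atrpms.repo dag.repo sl4x-contrib.repo sl-rhaps.repo would disable the first group, and the same call with 1 and the second list would enable the others.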

Other repos can be removed (or disabled as well). After this, to update the node, try to execute:

yum update

We got the following error:

Error: Missing Dependency: j2sdk = 2000:1.4.2_13-fcs is needed by package java-1.4.2-sun-compat

To fix this, remove the java-1.4.2-sun-compat package:

rpm -e java-1.4.2-sun-compat

After that, Java 1.5 should be installed. To do so, go to Sun's Java web page and download JDK 5.0 Update 12. We used the "Linux RPM in self-extracting file" jdk-1_5_0_12-linux-amd64-rpm.bin to install the JDK, but we also had to download the "Linux self-extracting file" jdk-1_5_0_12-linux-amd64.bin in order to build the java-1.5.0-sun-1.5.0.12-1jpp.x86_64.rpm and java-1.5.0-sun-devel-1.5.0.12-1jpp.x86_64.rpm packages, as suggested in Steve Traylen's guide.

To build and install those two packages, do the following:

rpm --import http://www.jpackage.org/jpackage.asc
mkdir -p ~/redhat/BUILD ~/redhat/SOURCES ~/redhat/SPECS ~/redhat/RPMS/i586 ~/redhat/SRPMS
cat <<EOF > ~/.rpmmacros
%_topdir    $HOME/redhat
%packager       Firstname Lastname <firstname.lastname@example.org>
EOF
rpm -Uvh http://mirrors.dotsrc.org/jpackage/1.7/generic/non-free/SRPMS/java-1.5.0-sun-1.5.0.12-1jpp.nosrc.rpm
mv jdk-1_5_0_12-linux-amd64.bin ~/redhat/SOURCES/
rpmbuild -ba ~/redhat/SPECS/java-1.5.0-sun.spec
rpm -Uvh ~/redhat/RPMS/x86_64/java-1.5.0-sun-1.5.0.12-1jpp.x86_64.rpm
rpm -Uvh ~/redhat/RPMS/x86_64/java-1.5.0-sun-devel-1.5.0.12-1jpp.x86_64.rpm
   
chmod u+x jdk-1_5_0_12-linux-amd64-rpm.bin
./jdk-1_5_0_12-linux-amd64-rpm.bin

After this step, java is finally installed, and you can perform:

yum update

All dependencies should now be resolved. This step also updates the kernel.



3) Adjust default kernel

After upgrading the kernel, you need to adjust /boot/grub/grub.conf so that the version appropriate for your hardware is used (smp, largesmp, hugemem). Reboot the system.
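
Choosing the kernel means pointing the default= line of grub.conf at the right title entry, counted from zero. A minimal sketch of listing the entries and changing the default follows; the two function names are our own, and it assumes the GRUB (legacy) layout in which each entry starts with a title line in the first column:

```shell
# Print each boot entry with its zero-based index, e.g. "0: title ...".
# Usage: list_grub_entries <grub.conf>
list_grub_entries() {
    grep '^title' "$1" | awk '{ printf "%d: %s\n", NR - 1, $0 }'
}

# Point "default=" at the chosen index, keeping a .bak copy of the file.
# Usage: set_grub_default <grub.conf> <index>
set_grub_default() {
    sed -i.bak "s/^default=.*/default=$2/" "$1"
}
```

Running list_grub_entries /boot/grub/grub.conf first, then set_grub_default with the index of the smp, largesmp or hugemem entry, avoids miscounting entries by hand.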



4) Adjust services/daemons started at the boot time

The default installation sets an excessive number of services/daemons to start at boot; check them and disable all unnecessary ones. It is also recommended to change the default runlevel to 3 in /etc/inittab. We especially suggest disabling yum auto-update, since it may cause trouble when new updates that require reconfiguration of WNs appear and are installed automatically. We suggest leaving the following services to start at boot time:

# cd /etc/rc3.d/
# ll S*
lrwxrwxrwx  1 root root 23 Sep  4 14:00 S00microcode_ctl -> ../init.d/microcode_ctl
lrwxrwxrwx  1 root root 17 Sep  4 14:00 S01sysstat -> ../init.d/sysstat
lrwxrwxrwx  1 root root 17 Sep  4 14:00 S10network -> ../init.d/network
lrwxrwxrwx  1 root root 16 Sep  4 14:00 S12syslog -> ../init.d/syslog
lrwxrwxrwx  1 root root 20 Sep  4 14:00 S13irqbalance -> ../init.d/irqbalance
lrwxrwxrwx  1 root root 17 Sep  4 14:00 S13portmap -> ../init.d/portmap
lrwxrwxrwx  1 root root 17 Sep  4 14:00 S14nfslock -> ../init.d/nfslock
lrwxrwxrwx  1 root root 15 Sep  4 14:00 S25netfs -> ../init.d/netfs
lrwxrwxrwx  1 root root 14 Sep  4 14:00 S55sshd -> ../init.d/sshd
lrwxrwxrwx  1 root root 20 Sep  4 14:00 S56rawdevices -> ../init.d/rawdevices
lrwxrwxrwx  1 root root 16 Sep  4 14:00 S56xinetd -> ../init.d/xinetd
lrwxrwxrwx  1 root root 14 Sep  4 14:00 S58ntpd -> ../init.d/ntpd
lrwxrwxrwx  1 root root 18 Sep  4 14:00 S80sendmail -> ../init.d/sendmail
lrwxrwxrwx  1 root root 15 Sep  4 14:00 S90crond -> ../init.d/crond
lrwxrwxrwx  1 root root 11 Sep  4 14:00 S99local -> ../rc.local
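
To find out what still needs to be switched off, the start links of runlevel 3 can be compared with the list above. This is a minimal sketch under that assumption; the KEEP variable and the extra_services function are our own names:

```shell
# Services the notes suggest keeping in runlevel 3 ("local" is rc.local).
KEEP="microcode_ctl sysstat network syslog irqbalance portmap nfslock netfs sshd rawdevices xinetd ntpd sendmail crond local"

# Print the services started from the given rc directory that are not in KEEP.
# Usage: extra_services [rc-directory]   (defaults to /etc/rc3.d)
extra_services() {
    rc_dir=${1:-/etc/rc3.d}
    for link in "$rc_dir"/S*; do
        # Skip the unexpanded pattern when the directory has no S* entries.
        [ -e "$link" ] || [ -L "$link" ] || continue
        # Strip the directory and the "SNN" prefix: S55sshd -> sshd
        name=$(basename "$link" | sed 's/^S[0-9][0-9]//')
        case " $KEEP " in
            *" $name "*) ;;
            *) echo "$name" ;;
        esac
    done
}
```

Each name printed can then be reviewed and, if not needed, disabled with chkconfig <name> off.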

We do not suggest installing SELinux, because it slows down the execution of MPI jobs when mpiexec is used. If it is installed, it should be disabled by replacing the line SELINUX=enforcing with SELINUX=disabled in the file /etc/selinux/config.
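
The edit to /etc/selinux/config can be scripted. A minimal sketch follows (the function name is ours); it rewrites only the SELINUX= line and leaves SELINUXTYPE= alone:

```shell
# Set SELINUX=disabled in the given config file, keeping a .bak copy.
# Usage: disable_selinux [config-file]   (defaults to /etc/selinux/config)
disable_selinux() {
    cfg=${1:-/etc/selinux/config}
    # "^SELINUX=" does not match "SELINUXTYPE=", so that line is untouched.
    sed -i.bak 's/^SELINUX=.*/SELINUX=disabled/' "$cfg"
}
```

The new setting is read at boot, so it takes effect after the next reboot.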



5) Adjust file systems

If you use a shared file system, the new WN must be configured to mount it automatically and with proper permissions. Also, if you use scratch space for jobs on WNs, configure it before jobs start arriving at the new WN. We mount the /home, /var/cache/yum and /opt/exp_soft file systems from an NFS server.
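
Before the node accepts jobs, it is worth confirming that the shared file systems are really mounted. A minimal sketch, assuming the mount table is read from /proc/mounts; the function name is our own:

```shell
# Print every listed mount point that is absent from the mount table.
# Usage: missing_mounts <mount-table> <mount-point>...
missing_mounts() {
    table=$1
    shift
    for mp in "$@"; do
        # Field 2 of each /proc/mounts line is the mount point.
        awk -v mp="$mp" '$2 == mp { found = 1 } END { exit !found }' "$table" \
            || echo "$mp"
    done
}
```

On our layout the call would be missing_mounts /proc/mounts /home /var/cache/yum /opt/exp_soft; empty output means all three are mounted.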



6) NTP configuration

As usual, NTP needs to be configured and verified. This is sufficient:

# cat /etc/ntp.conf 
restrict default noquery notrust nomodify
restrict 127.0.0.1
restrict 147.91.84.0 mask 255.255.255.0
restrict 129.132.2.21
server 129.132.2.21
restrict 131.188.3.220
server 131.188.3.220
driftfile /etc/ntp.drift
logfile /var/log/ntp.log
# ll /etc/ntp.drift
-rw-r--r--  1 ntp ntp 0 Aug 29 12:02 /etc/ntp.drift

Note the ownership of the ntp.drift file. Also note that ntpd must be started and enabled to start at boot time.
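
In the configuration above every server line is paired with a restrict line for the same address. A minimal sketch of a check for that pairing follows; the function name is ours, and the check only mirrors the layout shown here rather than validating ntp.conf in general:

```shell
# Print each "server" address that has no "restrict" line of its own.
# Usage: unrestricted_servers [ntp.conf]   (defaults to /etc/ntp.conf)
unrestricted_servers() {
    conf=${1:-/etc/ntp.conf}
    awk '
        $1 == "restrict" { allowed[$2] = 1 }
        $1 == "server"   { servers[$2] = 1 }
        END { for (s in servers) if (!(s in allowed)) print s }
    ' "$conf"
}
```

Empty output means every server is covered. Whether the daemon actually synchronizes can then be checked with ntpq -p once ntpd is running.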



7) Certification Authorities

yum install lcg-CA



8) Additional RPMs installation

In order to successfully install glite-WN, you need to add perl-SOAP-Lite and some additional packages:

yum install log4j
wget http://glite.phy.bg.ac.yu/GLITE-3/SL4/perl-SOAP-Lite-0.65.6-1.noarch.rpm
rpm -Uvh perl-SOAP-Lite-0.65.6-1.noarch.rpm
wget http://glite.phy.bg.ac.yu/GLITE-3/SL4/bouncycastle-jdk14_1.19-2_noarch.rpm
rpm -Uvh bouncycastle-jdk14_1.19-2_noarch.rpm
wget http://glite.phy.bg.ac.yu/GLITE-3/SL4/edg-java-security_1.5.11-1_sl3_noarch.rpm
rpm -Uvh edg-java-security_1.5.11-1_sl3_noarch.rpm
wget http://glite.phy.bg.ac.yu/GLITE-3/SL4/edg-java-security-client_1.5.11-1_sl3_noarch.rpm
rpm -Uvh edg-java-security-client_1.5.11-1_sl3_noarch.rpm
wget http://glite.phy.bg.ac.yu/GLITE-3/SL4/edg-java-security-test_1.5.11-1_sl3_noarch.rpm
rpm -Uvh edg-java-security-test_1.5.11-1_sl3_noarch.rpm
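
The repeated wget/rpm pairs above can be written as a loop. The following is a minimal sketch with a dry-run switch; the DRY_RUN variable and the function names are our own, and the URL and file names are the ones listed above:

```shell
BASE_URL=http://glite.phy.bg.ac.yu/GLITE-3/SL4
RPMS="perl-SOAP-Lite-0.65.6-1.noarch.rpm
bouncycastle-jdk14_1.19-2_noarch.rpm
edg-java-security_1.5.11-1_sl3_noarch.rpm
edg-java-security-client_1.5.11-1_sl3_noarch.rpm
edg-java-security-test_1.5.11-1_sl3_noarch.rpm"

# Print the command when DRY_RUN is 1 (the default here), otherwise run it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Download and install each RPM in order, stopping at the first failure.
install_extra_rpms() {
    for rpm_file in $RPMS; do
        run wget "$BASE_URL/$rpm_file" || return 1
        run rpm -Uvh "$rpm_file" || return 1
    done
}
```

Calling install_extra_rpms as is only prints the ten commands; setting DRY_RUN=0 beforehand executes them.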

The default torque version used in EGEE production is now 2.1.9-4. Be careful here: the torque version must be the same on the CE and all WNs! As always, RPMs of maui and torque can be found in the ETICS repository maintained by Steve Traylen:

http://eticssoft.web.cern.ch/eticssoft/repository/torquemaui/
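
Since a version mismatch between the CE and a WN is easy to overlook, a small check can help. This is a minimal sketch that assumes you have already collected "hostname version" pairs by whatever means suits your site; the function name and the host names in the example are hypothetical:

```shell
# Read "hostname version" pairs on stdin and print the hosts whose
# version differs from the expected one.
# Usage: mismatched_hosts <expected-version> < pairs.txt
mismatched_hosts() {
    awk -v want="$1" '$2 != want { print $1 }'
}

# Example with made-up hosts:
#   printf 'ce 2.1.9-4\nwn01 2.1.9-4\nwn02 2.1.6-1\n' | mismatched_hosts 2.1.9-4
# lists wn02 as the only host that needs attention.
```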


However, you can also compile it from source, which can be found at the above link, using the following commands:

wget http://eticssoft.web.cern.ch/eticssoft/repository/torquemaui/torque/2.1.9-4/src/torque-2.1.9-4cri.slc4.src.rpm
rpm -i torque-2.1.9-4cri.slc4.src.rpm
cd /usr/src/redhat/SOURCES
tar -xvzf torque-2.1.9.tar.gz
cd torque-2.1.9
./configure CC="gcc -m64" --prefix=/usr --with-server-home=/var/spool/pbs
make
make install_mom install_clients

If you install torque this way (from source), you need to create the pbs_mom init script and put it into the /etc/init.d directory. Its content is shown below:

[root@wn01 root]# cat /etc/init.d/pbs_mom
#!/bin/sh
#
# pbs_mom       This script will start and stop the PBS Mom
#
# chkconfig: 345 95 5
# description: TORQUE/PBS is a versatile batch system for SMPs and clusters
#
ulimit -n 32768
# Source the library functions
. /etc/rc.d/init.d/functions

PBS_DAEMON=/usr/sbin/pbs_mom
PBS_HOME=/var/spool/pbs
export PBS_DAEMON PBS_HOME

if [ -f /etc/sysconfig/pbs_mom ];then
    . /etc/sysconfig/pbs_mom
fi

args=""
if [ -z "$previous" ];then
   # being run manually, don't disturb jobs
   args="-p"
fi

pidof_pbs_mom() {
   pid="-1"
   if [ -f $PBS_HOME/mom_priv/mom.lock ];then
        pid=`cat $PBS_HOME/mom_priv/mom.lock`
   fi
   echo $pid
}

kill_pbs_mom() {
   pid=`pidof_pbs_mom`

   if [ $pid -le 1 ];then
      # exit statuses are 0-255; "return -1" is rejected by some shells
      return 1
   fi
   retval=1
   while kill -0 $pid 2>/dev/null;do
      kill -TERM $pid
      retval=$?
      sleep 1
   done
   return $retval
}

# how were we called
case "$1" in
        start)
                echo -n "Starting TORQUE Mom: "
                daemon $PBS_DAEMON $args
                RET=$?
                touch /var/lock/subsys/pbs_mom
                echo
                ;;
        purge)
                [ -f /var/lock/subsys/pbs_mom ] && $0 stop
                echo -n "Starting TORQUE Mom with purge: "
                daemon $PBS_DAEMON -r
                RET=$?
                touch /var/lock/subsys/pbs_mom
                echo
                ;;
        stop)
                echo -n "Shutting down TORQUE Mom: "
                kill_pbs_mom
                RET=$?
                [ $RET -eq 0 ] && success "shutdown" || failure "shutdown"
                echo
                rm -f /var/lock/subsys/pbs_mom
                ;;
        status)
                status pbs_mom
                RET=$?
                ;;
        restart)
                $0 stop
                sleep 1
                $0 start
                ;;
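        reload)
                echo -n "Re-reading TORQUE Mom config file: "
                pid=`pidof_pbs_mom`
                # Without a lock file pid is -1, and "kill -HUP -1" would signal every process
                if [ $pid -gt 1 ];then
                        kill -HUP $pid
                        RET=$?
                else
                        RET=1
                fi
                [ $RET -eq 0 ] && success "HUP" || failure "HUP"
                echo
                ;;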
        *)
                echo "Usage: pbs_mom {start|stop|restart|reload|status|purge}"
                exit 1
esac
exit $RET

pbs_mom should then be started manually:

/etc/init.d/pbs_mom start

and of course added to the list of services started at boot time:

chkconfig pbs_mom on

You can also successfully use the 32-bit version of torque. Just ensure that the torque version on the CE and all WNs is the same.



9) Install glite-WN and torque-related rpms

Since torque was installed from source, we will not install the glite-TORQUE_client meta-rpm. Instead, we will install the glite-WN meta-rpm and, separately, all RPMs that glite-TORQUE_client depends on except the torque-* packages (currently this is just edg-pbs-utils):

yum install glite-WN
yum install edg-pbs-utils

This installs all packages needed for configuring glite-WN and TORQUE_client. Once an RPM for 64-bit torque is available, things will be simpler: just installing glite-WN and glite-TORQUE_client will do the trick.



10) Preparation of conf files for new WN

The necessary conf files are site-info.def, wn-list.conf, users.conf and groups.conf, as well as the files in the vo.d directory. If you have conf files for a recent enough version of YAIM, little needs to be changed. According to the YAIM-3.1 Guide, several new site-info.def variables have been added, so adapt your config files accordingly. You will probably just use the conf files from other WNs, since they practically do not change with the migration from SL3.0.x to SL4.5. The only important exception is JAVA_LOCATION, which should point to Java 1.5 on SL 4.x WNs, e.g.
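
Whatever value is chosen for JAVA_LOCATION, it can be checked before running YAIM. A minimal sketch, assuming site-info.def is a plain file of shell VAR=value assignments that can be sourced; the function name is our own:

```shell
# Succeed only if JAVA_LOCATION, as set in the given site-info.def,
# contains an executable bin/java.
# Usage: check_java_location <absolute path to site-info.def>
check_java_location() {
    (
        # Source the file in a subshell so its variables do not leak out.
        . "$1" || exit 1
        [ -n "${JAVA_LOCATION:-}" ] && [ -x "$JAVA_LOCATION/bin/java" ]
    )
}
```

If the check passes, running $JAVA_LOCATION/bin/java -version by hand confirms that it is a 1.5 release.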




11) Configuring node

To configure the node, run:

/opt/glite/yaim/bin/yaim -c -s <path to site-info.def> -n WN -n TORQUE_client

Final tweaking includes adjusting the /var/spool/pbs/mom_priv/config file, creating /etc/ssh/shosts.equiv if it does not already exist on the new WN and updating this file on the CE and other WNs, and, of course, updating ssh_known_hosts on all other nodes to include data about the new one (standard procedure for ssh).
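
Adding the new WN to shosts.equiv should not create duplicate entries when repeated. A minimal sketch of an idempotent append follows; the function name and the host name in the usage line are hypothetical:

```shell
# Append a host name to a file such as shosts.equiv unless it is
# already listed; the file is created if it does not exist.
# Usage: add_host_once <file> <hostname>
add_host_once() {
    file=$1
    host=$2
    touch "$file"
    # -x matches the whole line, -F treats the name as a fixed string.
    grep -qxF "$host" "$file" || echo "$host" >> "$file"
}
```

For example, add_host_once /etc/ssh/shosts.equiv wn01.example.org can be run on the CE and on each WN without harm if the entry is already there.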

After this is done, the new WN can be added to the pbs server. From then on, cron jobs will take care of the shosts.equiv and ssh_known_hosts files.

In order to pass rgmasc SAM test, it is also necessary to add /opt/edg/sbin to the shell variable PATH, e.g. in some local file for bash and csh.
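
For bash, the PATH change can live in a small profile script. A minimal sketch follows; the function name is ours, and a location such as /etc/profile.d/edg-sbin.sh is only an assumed place to put it. A csh equivalent would have to be written separately:

```shell
# Prepend a directory to PATH only if it is not already there, so that
# sourcing this file repeatedly does not keep growing PATH.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;
        *) PATH="$1:$PATH" ;;
    esac
}

path_prepend /opt/edg/sbin
export PATH
```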

If you have previously installed the plugin on your CE to dynamically publish the OS version, now is probably a good time to disable it (remove the plugin from /opt/lcg/var/gip/plugin) and to adjust the ldif files so that the correct OS version and release are published.

