SL4 WN glite-3.1 64bit
From EGEE-see WIki
AEGIS01-PHY-SCL notes on installing and configuring natively compiled glite-WN and TORQUE_client on SL4.5 (64 bit)
To verify the installation and configuration process of glite-WN and TORQUE_client on 64-bit SL4, we at AEGIS01-PHY-SCL tested it, and it was successful. These notes describe what we did. We used Dual-Core AMD Opteron 8218 processors, but the procedure should also work on Intel dual-core and quad-core CPUs; the only difference is the Java executable to be used.
The minimal steps necessary for installing and configuring a fully operational glite-WN and TORQUE_client using YAIM-3.1 and the glite-3.1 repository are given below. These instructions are based on document:
All additional RPMs mentioned in this guide can be found in AEGIS01-PHY-SCL SL4 repository.
1) OS Installation
We have chosen to install ALL PACKAGES of SL4.5. You can use your own kickstart file, of course. After the installation we removed the following packages:
rpm -e seamonkey-nspr-1.0.9-2.el4.i386 seamonkey-js-debugger-1.0.9-2.el4.x86_64 seamonkey-nss-1.0.9-2.el4.x86_64 seamonkey-dom-inspector-1.0.9-2.el4.i386 \
    seamonkey-chat-1.0.9-2.el4.i386 seamonkey-nss-devel-1.0.9-2.el4.x86_64 seamonkey-1.0.9-2.el4.i386 seamonkey-nspr-devel-1.0.9-2.el4.x86_64 \
    seamonkey-devel-1.0.9-2.el4.x86_64 seamonkey-mail-1.0.9-2.el4.i386 seamonkey-dom-inspector-1.0.9-2.el4.x86_64 seamonkey-nspr-1.0.9-2.el4.x86_64 \
    seamonkey-1.0.9-2.el4.x86_64 seamonkey-chat-1.0.9-2.el4.x86_64 seamonkey-nss-1.0.9-2.el4.i386 seamonkey-mail-1.0.9-2.el4.x86_64 seamonkey-js-debugger-1.0.9-2.el4.i386 \
    openoffice.org2-core evolution gaim fence devhelp \
    openoffice.org2-langpack-nl openoffice.org2-langpack-zu_ZA openoffice.org2-langpack-zh_TW \
    openoffice.org2-langpack-zh_CN openoffice.org2-langpack-tr_TR openoffice.org2-langpack-th_TH \
    openoffice.org2-langpack-ta_IN openoffice.org2-langpack-sv openoffice.org2-langpack-sr_CS \
    openoffice.org2-langpack-sl_SI openoffice.org2-langpack-sk_SK openoffice.org2-langpack-ru \
    openoffice.org2-langpack-pt_PT openoffice.org2-langpack-pt_BR openoffice.org2-langpack-pl_PL \
    openoffice.org2-langpack-pa_IN openoffice.org2-langpack-nn_NO openoffice.org2-langpack-nb_NO \
    openoffice.org2-langpack-ms_MY openoffice.org2-langpack-lt_LT openoffice.org2-langpack-ko_KR \
    openoffice.org2-langpack-ja_JP openoffice.org2-langpack-it openoffice.org2-langpack-hu_HU devhelp-devel \
    evolution-devel evolution-connector openoffice.org2-langpack-af_ZA openoffice.org2-langpack-ar \
    openoffice.org2-langpack-bg_BG openoffice.org2-langpack-bn openoffice.org2-langpack-ca_ES \
    openoffice.org2-langpack-cs_CZ openoffice.org2-langpack-cy_GB openoffice.org2-langpack-da_DK \
    openoffice.org2-langpack-de openoffice.org2-langpack-el_GR openoffice.org2-langpack-es openoffice.org2-langpack-et_EE \
    openoffice.org2-langpack-eu_ES openoffice.org2-langpack-fi_FI openoffice.org2-langpack-fr \
    openoffice.org2-langpack-ga_IE openoffice.org2-langpack-gl_ES openoffice.org2-langpack-gu_IN \
    openoffice.org2-langpack-he_IL openoffice.org2-langpack-hi_IN openoffice.org2-langpack-hr_HR openoffice.org2-calc \
    openoffice.org2-xsltfilter openoffice.org2-writer openoffice.org2-testtools openoffice.org2-pyuno \
    openoffice.org2-math openoffice.org2-javafilter openoffice.org2-impress openoffice.org2-graphicfilter \
    openoffice.org2-draw openoffice.org2-calc openoffice.org2-base openoffice.org2-emailmerge \
    SL_firefox_parentlock_fix firefox.i386 firefox.x86_64 subversion-devel apr-devel apr-util-devel httpd-devel mod_perl-devel fuse-sshfs
2) Update and consolidation of SL4.5 installation
The default package management tool used by SL4.x and by YAIM-3.1 or YAIM-4.0 is yum. The following files need to be added to /etc/yum.repos.d/:
glite.repo
[glite-WN]
name=gLite 3.1 Worker Node
baseurl=http://linuxsoft.cern.ch/EGEE/gLite/R3.1/glite-WN/sl4/i386/
enabled=1
[glite-TORQUE_client]
name=Torque clients
baseurl=http://linuxsoft.cern.ch/EGEE/gLite/R3.1/glite-TORQUE_client/sl4/i386/
enabled=1
lcg-ca.repo
[CA]
name=CAs
baseurl=http://linuxsoft.cern.ch/LCG-CAs/current
enabled=1
jpackage5.0.repo
[main]
[jpackage17-generic]
name=JPackage 1.7, generic
baseurl=http://mirrors.dotsrc.org/jpackage/1.7/generic/free/
enabled=1
protect=1
[jpackage17-generic-nonfree]
name=JPackage 1.7, generic non-free
baseurl=http://mirrors.dotsrc.org/jpackage/1.7/generic/non-free/
enabled=1
protect=1
[main]
[jpackage5-generic]
name=JPackage 5, generic
baseurl=http://mirrors.dotsrc.org/jpackage/5.0/generic/free/
enabled=1
protect=1
[jpackage5-generic-nonfree]
name=JPackage 5, generic non-free
baseurl=http://mirrors.dotsrc.org/jpackage/5.0/generic/non-free/
enabled=1
protect=1
We suggest that you disable the following repos:
atrpms.repo dag.repo sl4x-contrib.repo sl-rhaps.repo
We also suggest that you enable the following repos:
sl4x.repo sl4x-errata.repo sl4x-fastbugs.repo sl-bugfix-46.repo sl-fastbugs.repo
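Enabling or disabling a repository comes down to the enabled= line in its file under /etc/yum.repos.d/. A minimal sketch, assuming each of the files named above carries an enabled= line per section and that GNU sed (with -i) is available; the helper name set_repo_enabled is ours and is not part of yum:

```shell
# set_repo_enabled FILE VALUE
# Rewrite every "enabled=" line in a yum .repo file to VALUE (0 or 1).
# Hypothetical helper; sections without an "enabled=" line are left untouched.
set_repo_enabled() {
    sed -i "s/^enabled[[:space:]]*=.*/enabled=$2/" "$1"
}

# Example use as root (commented out so the sketch has no side effects):
# for r in atrpms dag sl4x-contrib sl-rhaps; do
#     set_repo_enabled "/etc/yum.repos.d/$r.repo" 0
# done
# for r in sl4x sl4x-errata sl4x-fastbugs sl-bugfix-46 sl-fastbugs; do
#     set_repo_enabled "/etc/yum.repos.d/$r.repo" 1
# done
```

Editing the files by hand works just as well; the helper only saves typing when many nodes are prepared the same way.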
Other repos can be removed (or disabled as well). After this, to update the node, try to execute:
yum update
We got the following error:
Error: Missing Dependency: j2sdk = 2000:1.4.2_13-fcs is needed by package java-1.4.2-sun-compat
To fix this, you need to remove java-1.4.2-sun-compat package:
rpm -e java-1.4.2-sun-compat
After that, java-1.5 should be installed. To install it, go to Sun's Java web page and download JDK 5.0 Update 12. We used the "Linux RPM in self-extracting file" jdk-1_5_0_12-linux-amd64-rpm.bin to install the JDK, but we also had to download the "Linux self-extracting file" jdk-1_5_0_12-linux-amd64.bin in order to build the java-1.5.0-sun-1.5.0.12-1jpp.x86_64.rpm and java-1.5.0-sun-devel-1.5.0.12-1jpp.x86_64.rpm packages, as suggested in Steve Traylen's guide.
To make and install those two packages, do the following:
rpm --import http://www.jpackage.org/jpackage.asc
mkdir -p ~/redhat/BUILD ~/redhat/SOURCES ~/redhat/SPECS ~/redhat/RPMS/i586 ~/redhat/SRPMS
cat <<EOF > ~/.rpmmacros
%_topdir $HOME/redhat
%packager Firstname Lastname <firstname.lastname@example.org>
EOF
rpm -Uvh http://mirrors.dotsrc.org/jpackage/1.7/generic/non-free/SRPMS/java-1.5.0-sun-1.5.0.12-1jpp.nosrc.rpm
mv jdk-1_5_0_12-linux-amd64.bin ~/redhat/SOURCES/
rpmbuild -ba ~/redhat/SPECS/java-1.5.0-sun.spec
rpm -Uvh ~/redhat/RPMS/x86_64/java-1.5.0-sun-1.5.0.12-1jpp.x86_64.rpm
rpm -Uvh ~/redhat/RPMS/x86_64/java-1.5.0-sun-devel-1.5.0.12-1jpp.x86_64.rpm
chmod u+x jdk-1_5_0_12-linux-amd64-rpm.bin
./jdk-1_5_0_12-linux-amd64-rpm.bin
After this step, java is finally installed, and you can perform:
yum update
Now all dependencies should be ok. This step would also update the kernel.
3) Adjust default kernel
After upgrading the kernel, you need to adjust /boot/grub/grub.conf so that the version appropriate for your hardware is used (smp, largesmp, hugemem). Reboot the system.
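GRUB picks the boot entry through the default=N line in grub.conf, where N counts the title entries from 0. A small sketch for listing the entries with their indices and setting the default, assuming GNU sed; the function names are ours:

```shell
# list_grub_entries FILE: print each "title" entry with its zero-based index,
# which is the number expected in the "default=" line.
list_grub_entries() {
    awk '/^[ \t]*title[ \t]/ {
             sub(/^[ \t]*title[ \t]+/, "")
             printf "%d: %s\n", n++, $0
         }' "$1"
}

# set_grub_default FILE INDEX: point "default=" at the chosen entry.
set_grub_default() {
    sed -i "s/^default=.*/default=$2/" "$1"
}

# Example use as root (commented out so the sketch has no side effects):
# list_grub_entries /boot/grub/grub.conf
# set_grub_default /boot/grub/grub.conf 0
```

Check the listing before rebooting, since a newly installed kernel usually changes which index belongs to which entry.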
4) Adjust services/daemons started at the boot time
A default installation sets an excessive number of services/daemons to start at boot; check them and disable all unnecessary ones. It is also recommended to change the default runlevel to 3 in /etc/inittab. We especially suggest that you disable yum auto-update, since it may cause trouble when new updates (requiring reconfiguration of WNs) appear and are installed automatically. We suggest leaving the following services to start at boot time:
# cd /etc/rc3.d/
# ll S*
lrwxrwxrwx 1 root root 23 Sep 4 14:00 S00microcode_ctl -> ../init.d/microcode_ctl
lrwxrwxrwx 1 root root 17 Sep 4 14:00 S01sysstat -> ../init.d/sysstat
lrwxrwxrwx 1 root root 17 Sep 4 14:00 S10network -> ../init.d/network
lrwxrwxrwx 1 root root 16 Sep 4 14:00 S12syslog -> ../init.d/syslog
lrwxrwxrwx 1 root root 20 Sep 4 14:00 S13irqbalance -> ../init.d/irqbalance
lrwxrwxrwx 1 root root 17 Sep 4 14:00 S13portmap -> ../init.d/portmap
lrwxrwxrwx 1 root root 17 Sep 4 14:00 S14nfslock -> ../init.d/nfslock
lrwxrwxrwx 1 root root 15 Sep 4 14:00 S25netfs -> ../init.d/netfs
lrwxrwxrwx 1 root root 14 Sep 4 14:00 S55sshd -> ../init.d/sshd
lrwxrwxrwx 1 root root 20 Sep 4 14:00 S56rawdevices -> ../init.d/rawdevices
lrwxrwxrwx 1 root root 16 Sep 4 14:00 S56xinetd -> ../init.d/xinetd
lrwxrwxrwx 1 root root 14 Sep 4 14:00 S58ntpd -> ../init.d/ntpd
lrwxrwxrwx 1 root root 18 Sep 4 14:00 S80sendmail -> ../init.d/sendmail
lrwxrwxrwx 1 root root 15 Sep 4 14:00 S90crond -> ../init.d/crond
lrwxrwxrwx 1 root root 11 Sep 4 14:00 S99local -> ../rc.local
We do not suggest installing SELinux because it slows down execution of MPI jobs when mpiexec is used. If it is installed, it should be disabled by replacing the line SELINUX=enforcing with SELINUX=disabled in the file /etc/selinux/config.
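The edit to /etc/selinux/config can be scripted. A minimal sketch, assuming GNU sed; the function name is ours, and note that it sets SELINUX=disabled whatever the previous value was (enforcing or permissive), which is slightly broader than the manual edit described above:

```shell
# disable_selinux [FILE]
# Set SELINUX=disabled in the SELinux config file (default: /etc/selinux/config).
# The pattern requires "=" right after SELINUX, so SELINUXTYPE= is not touched.
# The change takes effect at the next reboot.
disable_selinux() {
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "${1:-/etc/selinux/config}"
}

# Example use as root (commented out so the sketch has no side effects):
# disable_selinux
```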
5) Adjust file systems
If you use a shared file system, the new WN must be configured to mount it automatically and with proper permissions. Also, if you use scratch space for jobs on WNs, configure it before jobs arrive at the new WN. We mount the /home, /var/cache/yum and /opt/exp_soft file systems from an NFS server.
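For illustration only, automatic mounting is usually arranged through /etc/fstab. The sketch below prints candidate fstab lines for the three file systems named above. The server name nfs.example.org, the mount options, and the assumption that the export paths equal the mount points are all hypothetical and must be replaced with your site's values:

```shell
# fstab_nfs_line SERVER EXPORT MOUNTPOINT
# Print one /etc/fstab line for an NFS mount. The options "rw,hard,intr"
# are an assumption, not taken from the site's actual configuration.
fstab_nfs_line() {
    printf '%s:%s\t%s\tnfs\trw,hard,intr\t0 0\n' "$1" "$2" "$3"
}

# Print entries for the three file systems, using a hypothetical server
# name and assuming each export path matches its mount point.
for fs in /home /var/cache/yum /opt/exp_soft; do
    fstab_nfs_line nfs.example.org "$fs" "$fs"
done
```

Review the printed lines before appending them to /etc/fstab, then test with mount -a.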
6) NTP configuration
As usual, NTP needs to be configured and verified. This is sufficient:
# cat /etc/ntp.conf
restrict default noquery notrust nomodify
restrict 127.0.0.1
restrict 147.91.84.0 mask 255.255.255.0
restrict 129.132.2.21
server 129.132.2.21
restrict 131.188.3.220
server 131.188.3.220
driftfile /etc/ntp.drift
logfile /var/log/ntp.log
# ll /etc/ntp.drift
-rw-r--r-- 1 ntp ntp 0 Aug 29 12:02 /etc/ntp.drift
Note the ownership of ntp.drift file. Also note that ntp must be started and enabled to be started at boot time.
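The configuration above pairs every server line with a restrict line for the same address, while the default policy stays locked down. When adapting it to other time servers it is easy to forget one half of a pair, so a quick check may help. A sketch; the function name is ours:

```shell
# unrestricted_servers FILE
# Print every "server" address in an ntp.conf that has no matching
# "restrict" line. Empty output means every server is covered.
unrestricted_servers() {
    awk '$1 == "restrict" { allowed[$2] = 1 }
         $1 == "server"   { wanted[$2] = 1 }
         END { for (s in wanted) if (!(s in allowed)) print s }' "$1"
}

# Example use (commented out so the sketch has no side effects):
# unrestricted_servers /etc/ntp.conf
```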
7) Certification Authorities
yum install lcg-CA
8) Additional RPMs installation
In order to successfully install glite-WN, you need to add perl-SOAP-Lite and some additional packages:
yum install log4j
wget http://glite.phy.bg.ac.yu/GLITE-3/SL4/perl-SOAP-Lite-0.65.6-1.noarch.rpm
rpm -Uvh perl-SOAP-Lite-0.65.6-1.noarch.rpm
wget http://glite.phy.bg.ac.yu/GLITE-3/SL4/bouncycastle-jdk14_1.19-2_noarch.rpm
rpm -Uvh bouncycastle-jdk14_1.19-2_noarch.rpm
wget http://glite.phy.bg.ac.yu/GLITE-3/SL4/edg-java-security_1.5.11-1_sl3_noarch.rpm
rpm -Uvh edg-java-security_1.5.11-1_sl3_noarch.rpm
wget http://glite.phy.bg.ac.yu/GLITE-3/SL4/edg-java-security-client_1.5.11-1_sl3_noarch.rpm
rpm -Uvh edg-java-security-client_1.5.11-1_sl3_noarch.rpm
wget http://glite.phy.bg.ac.yu/GLITE-3/SL4/edg-java-security-test_1.5.11-1_sl3_noarch.rpm
rpm -Uvh edg-java-security-test_1.5.11-1_sl3_noarch.rpm
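The download-and-install pairs above follow one pattern, so they can be written as a loop (yum install log4j still has to be run first). A sketch, using the URL and file names from the commands above; DRY_RUN is a convention of this sketch, not an option of wget or rpm:

```shell
# Base URL and package names are the ones listed above.
BASE_URL=http://glite.phy.bg.ac.yu/GLITE-3/SL4
RPMS="perl-SOAP-Lite-0.65.6-1.noarch.rpm
bouncycastle-jdk14_1.19-2_noarch.rpm
edg-java-security_1.5.11-1_sl3_noarch.rpm
edg-java-security-client_1.5.11-1_sl3_noarch.rpm
edg-java-security-test_1.5.11-1_sl3_noarch.rpm"

# fetch_and_install: download and install each RPM in order, stopping at the
# first failure. With DRY_RUN set to a non-empty value the commands are only
# printed instead of executed.
fetch_and_install() {
    for rpm in $RPMS; do
        ${DRY_RUN:+echo} wget "$BASE_URL/$rpm" || return 1
        ${DRY_RUN:+echo} rpm -Uvh "$rpm" || return 1
    done
}

# Preview the commands without running them:
DRY_RUN=1 fetch_and_install
```

To perform the installation for real, call fetch_and_install as root with DRY_RUN unset.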
Default torque used in EGEE production is now 2.1.9-4. Be careful here - the torque version must be the same on CE and all WNs!!! As always, RPMs of maui and torque can be found in ETICS repository maintained by Steve Traylen:
http://eticssoft.web.cern.ch/eticssoft/repository/torquemaui/
However, you can compile it from the source RPM, which can also be found at the above link, using the following commands:
wget http://eticssoft.web.cern.ch/eticssoft/repository/torquemaui/torque/2.1.9-4/src/torque-2.1.9-4cri.slc4.src.rpm
rpm -i torque-2.1.9-4cri.slc4.src.rpm
cd /usr/src/redhat/SOURCES
tar -xvzf torque-2.1.9.tar.gz
cd torque-2.1.9
./configure CC="gcc -m64" --prefix=/usr --with-server-home=/var/spool/pbs
make
make install_mom install_clients
If you install torque this way (from source), you would need to create pbs_mom script and put it into /etc/init.d directory. You can see its content below:
[root@wn01 root]# cat /etc/init.d/pbs_mom
#!/bin/sh
#
# pbs_mom This script will start and stop the PBS Mom
#
# chkconfig: 345 95 5
# description: TORQUE/PBS is a versatile batch system for SMPs and clusters
#
ulimit -n 32768
# Source the library functions
. /etc/rc.d/init.d/functions
PBS_DAEMON=/usr/sbin/pbs_mom
PBS_HOME=/var/spool/pbs
export PBS_DAEMON PBS_HOME
if [ -f /etc/sysconfig/pbs_mom ];then
    . /etc/sysconfig/pbs_mom
fi
args=""
if [ -z "$previous" ];then
    # being run manually, don't disturb jobs
    args="-p"
fi
pidof_pbs_mom() {
    pid="-1"
    if [ -f $PBS_HOME/mom_priv/mom.lock ];then
        pid=`cat $PBS_HOME/mom_priv/mom.lock`
    fi
    echo $pid
}
kill_pbs_mom() {
    pid=`pidof_pbs_mom`
    if [ $pid -le 1 ];then
        return -1;
    fi
    retval=1
    while kill -0 $pid 2>/dev/null;do
        kill -TERM $pid
        retval=$?
        sleep 1
    done
    return $retval
}
# how were we called
case "$1" in
    start)
        echo -n "Starting TORQUE Mom: "
        daemon $PBS_DAEMON $args
        RET=$?
        touch /var/lock/subsys/pbs_mom
        echo
        ;;
    purge)
        [ -f /var/lock/subsys/pbs_mom ] && $0 stop
        echo -n "Starting TORQUE Mom with purge: "
        daemon $PBS_DAEMON -r
        RET=$?
        touch /var/lock/subsys/pbs_mom
        echo
        ;;
    stop)
        echo -n "Shutting down TORQUE Mom: "
        kill_pbs_mom
        RET=$?
        [ $RET -eq 0 ] && success "shutdown" || failure "shutdown"
        echo
        rm -f /var/lock/subsys/pbs_mom
        ;;
    status)
        status pbs_mom
        RET=$?
        ;;
    restart)
        $0 stop
        sleep 1
        $0 start
        ;;
    reload)
        echo -n "Re-reading TORQUE Mom config file: "
        kill -SIGHUP `pidof_pbs_mom`
        RET=$?
        [ $RET -eq 0 ] && success "HUP" || failure "HUP"
        echo
        ;;
    *)
        echo "Usage: pbs_mom {start|stop|restart|reload|status|purge}"
        exit 1
esac
exit $RET
pbs_mom should be started manually then:
/etc/init.d/pbs_mom start
and of course added to the list of services started at boot time:
chkconfig pbs_mom on
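chkconfig takes its settings from the "# chkconfig: 345 95 5" comment near the top of the init script: runlevels 3, 4 and 5, start priority 95, stop priority 5. If the service does not come up at boot, a first thing to check is that this header survived copying the script. A small sketch for reading it back; the function name is ours:

```shell
# chkconfig_header SCRIPT
# Print the runlevels and start/stop priorities declared in an init script's
# "# chkconfig:" comment line (first match only).
chkconfig_header() {
    awk '$1 == "#" && $2 == "chkconfig:" {
             printf "runlevels=%s start=%s stop=%s\n", $3, $4, $5
             exit
         }' "$1"
}

# Example use (commented out so the sketch has no side effects):
# chkconfig_header /etc/init.d/pbs_mom
```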
You can also successfully use the 32-bit version of torque. Just ensure that the torque version on the CE and all WNs is the same.
9) Install glite-WN and torque-related rpms
Since torque was installed from source, we will not install glite-TORQUE_client meta-rpm. Instead, we will install glite-WN meta-rpm and separately all rpms glite-TORQUE_client depends on minus torque-* packages (which is currently just edg-pbs-utils):
yum install glite-WN
yum install edg-pbs-utils
This will install all packages needed for configuring glite-WN and TORQUE_client. When RPM for 64 bit torque is available, then it will be simpler, and just installing glite-WN and glite-TORQUE_client would do the trick.
10) Preparation of conf files for new WN
The necessary conf files are site-info.def, wn-list.conf, users.conf, and groups.conf, as well as the files in the vo.d directory. If you have conf files for a recent enough version of YAIM, little needs to change. According to the YAIM-3.1 Guide, several new site-info.def variables have been added, so adapt your config files accordingly. You will probably just reuse the conf files from other WNs, since they practically do not change with the migration from SL3.0.x to SL4.5. The only important exception is JAVA_LOCATION, which should point to Java 1.5 on SL 4.x WNs, e.g.
11) Configuring node
To configure the node, type:
/opt/glite/yaim/bin/yaim -c -s <path to site-info.def> -n WN -n TORQUE_client
Final tweaking includes adjusting the /var/spool/pbs/mom_priv/config file, creating /etc/ssh/shosts.equiv if it does not already exist on the new WN and updating this file on the CE and other WNs, and of course updating ssh_known_hosts on all other nodes to include data about the new one (standard procedure for ssh).
Once this is done, the new WN can be added to the pbs server. From then on, cron jobs will take care of the shosts.equiv and ssh_known_hosts files.
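For the initial manual update of shosts.equiv, an idempotent append avoids duplicate entries when the step is repeated. A sketch; the function name and the host name in the example are hypothetical:

```shell
# add_host_once FILE HOST
# Append HOST to FILE unless an identical line is already present;
# FILE is created if it does not exist.
add_host_once() {
    touch "$1"
    grep -qxF "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Example use as root on the CE and each WN (commented out so the sketch
# has no side effects; wn02.example.org is a placeholder):
# add_host_once /etc/ssh/shosts.equiv wn02.example.org
```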
In order to pass rgmasc SAM test, it is also necessary to add /opt/edg/sbin to the shell variable PATH, e.g. in some local file for bash and csh.
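For bash, one way to do this is a small file under /etc/profile.d/. The sketch below assumes a file name such as /etc/profile.d/edg-sbin.sh, which is our choice; csh users need an equivalent file in csh syntax, which is not shown here:

```shell
# add_to_path DIR: append DIR to PATH unless it is already there, so that
# sourcing this file more than once does not grow PATH.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
}

add_to_path /opt/edg/sbin
export PATH
```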
If you have previously installed the plugin on your CE to dynamically publish the OS version, now is probably a good time to disable it (remove the plugin from /opt/lcg/var/gip/plugin) and to adjust the ldif files so that the correct OS version and release are published.
