SG LCG-2 6 0 Guide

From EGEE-see WIki


To upgrade your existing LCG-2_4_0 site to LCG-2_6_0, please follow the directions given at

http://grid-deployment.web.cern.ch/grid-deployment/documentation/LCG2-Manual-Upgrade/

Additional recommendations for SEE-GRID sites

[All references "..." are to sections of the above document. Besides personal experience, these recommendations are based on the "Manual Upgrade Procedure" and on various e-mails on LCG-ROLLOUT and EGEE-SA1-TECH, most notably by Maarten Litmaath, Maria Nassiakou, and Dimitris Zilaskos.]


All files used in this guide can be found at http://lcg.phy.bg.ac.yu/LCG-2_6_0/

Other resources: take a look at the very good guide with AUTH experiences at http://www.grid.auth.gr/guides/middleware/upgrading_to_lcg-2.6.0.php


0) SCL scripts for easier administration

A small set of scripts developed at AEGIS01-PHY-SCL is available at http://lcg.phy.bg.ac.yu/LCG-2_6_0/scl-scripts.tgz. The scripts enable:

a) public/private key based ssh authentication among all nodes

b) scp of a file from one node to all other nodes (or to all WNs, or to all nodes except WNs)

c) execution of a given command on all nodes (or on all WNs, or on all nodes except WNs)

Read the README file for more details.
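
The SCL scripts themselves are documented in their README. Purely as an illustration of item c), a loop of this kind might be sketched as follows. This is not the code from scl-scripts.tgz: the function name run_on_nodes, the node list file (one hostname per line) and the SSH_CMD override are all assumptions made for this example.

```shell
#!/bin/bash
# Hypothetical sketch, not the SCL scripts: run one command on every node
# listed in a file (one hostname per line, '#' comment lines allowed).
SSH_CMD=${SSH_CMD:-ssh}   # set SSH_CMD=echo for a dry run

run_on_nodes() {
    local list=$1; shift
    local node
    while read -r node || [ -n "$node" ]; do
        # skip blank lines and comments
        case $node in ''|'#'*) continue ;; esac
        echo "== $node"
        # stdin comes from /dev/null so ssh cannot swallow the node list
        $SSH_CMD "$node" "$@" </dev/null || echo "command failed on $node" >&2
    done < "$list"
}
```

For instance, run_on_nodes wn-list.conf uptime would run uptime on every WN, assuming wn-list.conf holds one hostname per line and key-based ssh authentication is already in place.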


1) Backup

Please back up your site-info.def, wn-list.conf, and users.conf files if you keep them in the /opt/lcg/yaim/examples/ directory, since they will be overwritten by the yaim upgrade. Feel free to back up any other configuration files you have customized. Do not copy the backups over the upgraded files afterwards; instead, diff them and change the appropriate values by hand, since even the file format may have changed!
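
A minimal sketch of this backup-then-diff approach is given below. The backup location and the function names are assumptions chosen for the example; only the three file names and the /opt/lcg/yaim/examples directory come from the text above.

```shell
#!/bin/bash
# Sketch: copy the yaim configuration files to a dated backup directory
# before the upgrade, and diff (never copy back) after it.
YAIM_DIR=${YAIM_DIR:-/opt/lcg/yaim/examples}
BACKUP_DIR=${BACKUP_DIR:-/root/yaim-backup-$(date +%Y%m%d)}   # hypothetical location

backup_yaim_conf() {
    mkdir -p "$BACKUP_DIR" || return 1
    local f
    for f in site-info.def wn-list.conf users.conf; do
        if [ -f "$YAIM_DIR/$f" ]; then
            cp -p "$YAIM_DIR/$f" "$BACKUP_DIR/"
        else
            echo "not found, skipped: $YAIM_DIR/$f" >&2
        fi
    done
}

# After the upgrade: show the differences instead of overwriting new files.
compare_yaim_conf() {
    local f
    for f in site-info.def wn-list.conf users.conf; do
        if [ -f "$BACKUP_DIR/$f" ]; then
            diff -u "$BACKUP_DIR/$f" "$YAIM_DIR/$f" || true   # diff exits 1 when files differ
        fi
    done
}
```

Run backup_yaim_conf before the upgrade and compare_yaim_conf after it, then carry the values you need over by hand.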


2) BDIIs (Section "Preliminary Instructions")

As stated in the Manual Upgrade Procedure, the top-level BDII should be upgraded first (within SEE-GRID there are only two: at TR-01-ULAKBIM and at AEGIS01-PHY-SCL), then the site BDII (located on the CE), and then all other nodes. It is recommended to stop the BDII before the upgrade (e.g. service lcg-bdii stop).


3) MON box and tomcat (Section "Preliminary Instructions")

If you have a MON box, uninstall tomcat4 prior to the upgrade:

apt-get remove tomcat4

or

rpm -e tomcat4 edg-java-security-tomcat4 lcg-MON

Allow apt-get to remove any other packages it requests. Note, however, that after upgrading your MON box some files are still owned by the tomcat4 user! So, if the tomcat4 user and group are not present on the MON box, create them, or the upgrade will fail.
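
One cautious way to handle the account is to check first and only print what would need to be run. The sketch below does that; the function name is hypothetical, and the plain groupadd/useradd invocations without explicit UID/GID values are an assumption, so adjust them if your site needs specific IDs.

```shell
#!/bin/bash
# Sketch: print the commands needed to create a missing group and user.
# Nothing is changed on the system; review the output, then run it as root.
missing_account_cmds() {
    local name=${1:-tomcat4}
    getent group "$name" >/dev/null || echo "groupadd $name"
    getent passwd "$name" >/dev/null || echo "useradd -g $name $name"
    return 0
}
```

Calling missing_account_cmds with no argument checks tomcat4; an empty output means both the group and the user already exist.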

Also install the new j2sdk-1.4.2_08-fcs, provided by Sun as a self-extracting archive that produces an rpm, which you then install. Uninstall previous versions: they might interfere with tomcat5, and can even prevent it from starting (but not from installing, just to keep you confused)!

Before running the configure_node script on the MON box, make sure that the host keys are in place. The configuration script will attempt to copy them into /etc/tomcat5 so that tomcat can access them.
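
A quick pre-flight check could look like the sketch below. The text above does not name the location of the host keys; the directory /etc/grid-security and the file names hostcert.pem and hostkey.pem are assumed here as the usual layout, and the function name is hypothetical.

```shell
#!/bin/bash
# Sketch: verify that the host certificate and key exist and are non-empty
# before running configure_node. Directory and file names are assumptions.
check_host_keys() {
    local dir=${1:-/etc/grid-security}
    local f rc=0
    for f in hostcert.pem hostkey.pem; do
        if [ -s "$dir/$f" ]; then
            echo "ok: $dir/$f"
        else
            echo "MISSING: $dir/$f"
            rc=1
        fi
    done
    return $rc
}
```

The function returns a non-zero status if either file is missing, so it can guard the configuration step, e.g. check_host_keys && echo "ready for configure_node".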

On the MON box, there is a problem with the /etc/init.d/rgma-servicetool script (possibly this bug: https://savannah.cern.ch/bugs/?func=detailitem&item_id=7332 ) that leaves some processes attached to the terminal. As a result, the configuration script on the MON box sometimes does not return to the prompt after the configuration is complete. If that happens, send the script to the background to get a prompt, and restart rgma-servicetool.


4) Scientific Linux CERN (Section "Update your middleware")

If you are using plain Scientific Linux, disregard the Scientific Linux CERN specific directions in the "Manual Upgrade Procedure".


5) Before configure_node (Section "Reconfigure")

Step 3 of the "Update your middleware" section is very important; please apply it carefully on each node. In addition, if you have an LFC node, you have to issue

rpm -e --nodeps lcg-LFC-mysql

apt-get install lcg-LFC_mysql

since the naming convention has changed.

Although it is not stated explicitly, I also suggest running

/opt/lcg/yaim/scripts/install_node <site-info.def> <node_name> [<another_node_name> ...]

on each node. This should be done after applying all steps from the section "Update your middleware". Do not hesitate to run install_node several times, until you are sure that everything is properly installed. Pay attention to the new node names. After the rpm installation is complete, make sure that there are no RPM dependency issues.

Templates for SEE-GRID:

http://lcg.phy.bg.ac.yu/LCG-2_6_0/site-info.def-SEE-GRID

http://lcg.phy.bg.ac.yu/LCG-2_6_0/users.conf-SEE-GRID

These templates contain SEE-GRID specific variables. You can also see AEGIS01-PHY-SCL specific files, used for configuration of our site (password values were removed):

http://lcg.phy.bg.ac.yu/LCG-2_6_0/site-info.def-AEGIS01-PHY-SCL-LCG-2_6_0

http://lcg.phy.bg.ac.yu/LCG-2_6_0/wn-list.conf-AEGIS01-PHY-SCL

Change these files to suit your site. site-info.def-SEE-GRID assumes that users.conf and wn-list.conf are located in the current directory (.). Please note that you have to create wn-list.conf yourself! users.conf-SEE-GRID contains pools of users for the following VOs: seegrid, alice, atlas, cms, lhcb, dteam, sixt, esr. If you have used different UIDs, change them accordingly in users.conf-SEE-GRID prior to running configure_node. Alternatively, if you want to use the values from this file, first delete all existing VO users (delete their home directories and remove their entries from /etc/passwd, /etc/shadow and /etc/group).
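
To see whether the UIDs in users.conf clash with accounts that already exist, a comparison along the following lines may help. It is only a sketch: the function name is hypothetical, and the colon-separated layout UID:LOGIN:GID:GROUP:VO:FLAG: is assumed for users.conf, so check it against your copy of the file first.

```shell
#!/bin/bash
# Sketch: list pool accounts whose UID in users.conf differs from the UID
# already present in the passwd file.
# Assumed users.conf layout (colon-separated): UID:LOGIN:GID:GROUP:VO:FLAG:
uid_conflicts() {
    local users_conf=$1 passwd=${2:-/etc/passwd}
    awk -F: '
        NR == FNR { uid[$1] = $3; next }      # first file (passwd): login -> UID
        ($2 in uid) && uid[$2] != $1 {        # second file (users.conf)
            printf "%s: users.conf=%s passwd=%s\n", $2, $1, uid[$2]
        }
    ' "$passwd" "$users_conf"
}
```

An empty output from uid_conflicts users.conf means that no existing login has a UID different from the one in users.conf; each printed line names an account to fix in the file or to delete from the system.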

The GridICE collector is assumed to be installed on the SE (the LCG-2_6_0 default is the MON box).


6) install_node on WNs

I have encountered a strange problem with install_node on WNs. For some unknown reason, /etc/apt/sources.list.d/lcg.list was badly formatted by install_node. If this happens:

a) copy http://lcg.phy.bg.ac.yu/LCG-2_6_0/lcg.list to /etc/apt/sources.list.d/ on all WNs;

b) copy http://lcg.phy.bg.ac.yu/LCG-2_6_0/config_apt to /opt/lcg/yaim/functions/ on all WNs and ensure that it can be executed (chmod u+x /opt/lcg/yaim/functions/config_apt);

c) run install_node again.

The same procedure can be applied on other nodes if needed.

 Note: 
 I have seen the same strange problem as well, but I think that I found out the reason. It was that the     
 LCG_REPOSITORY in the site-info.def was wrong (the old one). Maybe this is not exactly the same problem that 
 you are talking about.
 --Dashamir
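
Following up on the note above, a quick look at the LCG_REPOSITORY setting before running install_node might be sketched as below. The function name is hypothetical, and the idea that the release tag (e.g. LCG-2_4_0) appears in the repository line is an assumption; if your repository line does not carry the tag, simply inspect the printed value by eye.

```shell
#!/bin/bash
# Sketch: show the LCG_REPOSITORY setting from site-info.def and warn if it
# still mentions the previous release. That the release tag appears in the
# repository line is an assumption made for this example.
check_lcg_repository() {
    local site_info=$1 old_tag=${2:-LCG-2_4_0}
    local line
    line=$(grep -E '^[[:space:]]*LCG_REPOSITORY=' "$site_info") || {
        echo "LCG_REPOSITORY is not set in $site_info"
        return 1
    }
    echo "$line"
    case $line in
        *"$old_tag"*) echo "WARNING: still refers to $old_tag"; return 2 ;;
    esac
    return 0
}
```

The return status is 0 when the line is present and does not mention the old tag, 1 when the variable is missing, and 2 when the old tag is still there.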

7) configure_node ("Reconfigure")

As explained in step 5, use SEE-GRID templates, available at http://lcg.phy.bg.ac.yu/LCG-2_6_0/


8) Postconfiguration of CE

Edit /opt/lcg/var/gip/lcg-info-static.ldif on your CE and set appropriate values for GlueSiteUserSupportContact, GlueSiteSysAdminContact, and GlueSiteSecurityContact, as well as for GlueSubClusterPhysicalCPUs and GlueSubClusterLogicalCPUs. They will be visible on the GStat page for your site, accessible from http://goc.grid.sinica.edu.tw/gstat/seegrid/

Do not forget to restart globus-mds.

Please note that running the yaim configuration script on your CE will rebuild the queue configuration: if you have modified the default configuration provided by yaim, you will lose your changes and will have to apply them again. It is useful to keep such changes in a script. An example is available at http://lcg.phy.bg.ac.yu/LCG-2_6_0/AEGIS01-PHY-SCL-queues


9) Postconfiguration of MON (if you have one) and CE

Insert the line APEL_HOME=/opt/glite in the script /opt/glite/bin/apel-publisher on the MON and also in /opt/glite/bin/apel-pbs-log-parser on the CE.
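
The text above does not say where in the script the line should go. The sketch below assumes that placing it directly after the first line (normally the interpreter line) is acceptable; the function name and the .orig backup suffix are likewise choices made for this example.

```shell
#!/bin/bash
# Sketch: insert APEL_HOME=/opt/glite after the first line of a script,
# keeping a backup copy and doing nothing if the line is already there.
# The insertion point (after line 1) is an assumption.
add_apel_home() {
    local script=$1
    if grep -q '^APEL_HOME=' "$script"; then
        echo "already set in $script"
        return 0
    fi
    cp -p "$script" "$script.orig" || return 1
    {
        head -n 1 "$script.orig"
        echo 'APEL_HOME=/opt/glite'
        tail -n +2 "$script.orig"
    } > "$script"   # writing in place keeps the script's mode and owner
}
```

It would be run once per file, e.g. add_apel_home /opt/glite/bin/apel-publisher on the MON and add_apel_home /opt/glite/bin/apel-pbs-log-parser on the CE.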

On the MON as root:

cd /opt/glite/share/glite-apel-core/java

tar -xvzf mm-mysql.jar

rm mm-mysql.jar

ln -s mysql-connector-java-1.3.8-bin.jar mm-mysql.jar

/etc/init.d/rgma-gin restart

/etc/init.d/rgma-servicetool restart

For additional corrections of the APEL PBS parser and publisher on the CE and MON box, see the instructions at

http://listserv.cclrc.ac.uk/cgi-bin/webadmin?A2=ind0508&L=lcg-rollout&O=D&P=50152
