SG GridIce Guide

From EGEE-see Wiki

1. The GridICE components (sensors and server) are included in the LCG-2 release. Depending on your release, you may need to upgrade some RPMs.

The new RPMs for the monitored nodes (CE, SE and WN; connecting a WN has not been tested yet) are:

  • gridice-sensor (1.5.1)
  • edg-fabricMonitoring (2.5.4-5)

that can be found at

Upgrade the packages using

rpm -U edg-fabricMonitoring_gcc3_2_2-2.5.4-5_sl3.i386.rpm
rpm -U edt_sensor-1.5.1-sl3.i386.rpm 


2. Update the YAIM configuration: in your site-info.def, make sure that the following variables are defined:

# collector for the GridICE monitoring data 
# it must be a machine where an MDS is installed 
# and which GRIS or GIIS is registered to the BDII 
# typical choice: $SE_HOST 
GRIDICE_SERVER_HOST=$SE_HOST 
# in case of a PBS/Torque batch system, be sure that the
# accounting directory is accessible by the CE machine
# and configure the following variable:
# - for torque the path is typically /usr/spool/torque 
# - for pbs the path is typically /var/spool/pbs 
CE_LRMS_SPOOL=/var/spool/pbs

3. Replace /opt/lcg/yaim/functions/config_fmon_client with the following one:

Then reconfigure your node, for example:

/opt/lcg/yaim/scripts/configure_node site-info.def CE_torque
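
Before running configure_node, it may help to confirm that the two variables from step 2 are really set in site-info.def. The following is only a minimal sketch: the function name check_site_info is hypothetical, and it assumes the variables are assigned at the start of a line in the usual VAR=value form.

```shell
#!/bin/sh
# Hypothetical helper (not part of YAIM): report which GridICE-related
# variables are missing from a site-info.def file.
check_site_info() {
    file=$1
    missing=""
    for var in GRIDICE_SERVER_HOST CE_LRMS_SPOOL; do
        # Match only uncommented assignments such as VAR=value.
        if ! grep -Eq "^[[:space:]]*${var}=" "$file"; then
            missing="$missing $var"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "ok"
}

# Usage: check_site_info site-info.def
```

A commented-out assignment is deliberately treated as missing, since YAIM would not see it either.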

The older configuration procedure for previous versions of LCG follows; some parts still apply. The last part is very IMPORTANT (see the tests at the end).


Site collector node

You should choose a GridIce site collector node, which is usually the SE.

Then follow these instructions (some are already done by the configure_node script, but check them anyway):

1. start edg-fmon-server (collector service)

/etc/rc.d/init.d/edg-fmon-server start 

and configure the start of this daemon at boot

chkconfig --level 345 edg-fmon-server on 

2. start gridice-mds (publisher service)

/etc/rc.d/init.d/gridice-mds start 

and configure the start of this daemon at boot

chkconfig --level 345 gridice-mds on

3. (IMPORTANT!!! This also applies when upgrading, because the configure_node script misses this step.) Edit /etc/globus.conf and, after the line:

[mds/gris/provider/edg]

add a blank line and this line:

[mds/gris/provider/gridice] 

4. restart globus-mds service

service globus-mds restart 

5. launch the command "crontab -e" and add the following entry:

50 1 * * * $EDG_LOCATION/sbin/edg-fmon-cleanspool &> /dev/null

6. follow the steps for "GridICE node" configuration (the "All other nodes" section below) in order to monitor this node as well
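
Because step 3 is easy to forget after an upgrade, the edit to /etc/globus.conf can be scripted so that it is safe to repeat. This is a minimal sketch only: the function name add_gridice_provider is hypothetical, and it assumes the section header lines appear exactly as written above, with no trailing whitespace.

```shell
#!/bin/sh
# Hypothetical helper: add the gridice provider section to globus.conf
# right after the edg provider section, unless it is already present.
add_gridice_provider() {
    conf=$1
    # Already configured: leave the file untouched (safe to re-run).
    if grep -Fxq '[mds/gris/provider/gridice]' "$conf"; then
        return 0
    fi
    # Keep a backup copy before rewriting the file.
    cp "$conf" "$conf.bak"
    awk '
        { print }
        $0 == "[mds/gris/provider/edg]" {
            print ""
            print "[mds/gris/provider/gridice]"
        }
    ' "$conf.bak" > "$conf"
}

# Usage (as root), followed by the restart from step 4:
#   add_gridice_provider /etc/globus.conf
#   service globus-mds restart
```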


All other nodes

On all other nodes, do the following steps.

The configuration files have already been created by the configure_node script. If you run into problems, however, it is advisable to copy the template configuration files and adapt them.

1. copy a monitoring template for your Grid node (replace the word PROFILE with 'ce', 'se' or 'wn' according to the role of your node)

cp /opt/gridice/monitoring/etc/edg-fmon-agent.conf-PROFILE.template /opt/edg/var/etc/edg-fmon-agent.conf

NOTE: if you are using PBS or Torque take into account that for the Computing Element configuration you have to replace the word CE_LRMS_SPOOL in /opt/edg/var/etc/edg-fmon-agent.conf with your PBS/Torque spool directory (usually /var/spool/pbs)

2. edit /opt/edg/var/etc/edg-fmon-agent.conf and replace the word "LEMON-COLLECTOR" with the fully qualified domain name of your GridICE collector node configured in the previous section

3. create a symbolic link to the daemon monitoring template for your Grid node (replace the word PROFILE with 'ce-access-node', 'se-access-node' or 'worker-node' according to the role of your node)

ln -s /opt/gridice/monitoring/etc/gridice-role-PROFILE.cfg /opt/gridice/monitoring/etc/gridice-role.cfg

4. start edg-fmon-agent

/etc/rc.d/init.d/edg-fmon-agent start

and configure the start of this daemon at boot

chkconfig --level 345 edg-fmon-agent on
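
The two placeholder substitutions from steps 1 and 2 (LEMON-COLLECTOR and, on a CE with PBS/Torque, CE_LRMS_SPOOL) can also be done with sed instead of a text editor. The sketch below is only an illustration: the function name configure_fmon_agent and the example host name are assumptions, and it should be run on the copied template before starting edg-fmon-agent.

```shell
#!/bin/sh
# Hypothetical helper: fill in the placeholders of a copied
# edg-fmon-agent.conf template.
configure_fmon_agent() {
    conf=$1        # path to edg-fmon-agent.conf
    collector=$2   # FQDN of the GridICE collector node
    spool=$3       # PBS/Torque spool directory, e.g. /var/spool/pbs
    # '|' is used as the sed delimiter because the spool path contains '/'.
    sed -e "s|LEMON-COLLECTOR|$collector|g" \
        -e "s|CE_LRMS_SPOOL|$spool|g" \
        "$conf" > "$conf.new" && mv "$conf.new" "$conf"
}

# Usage (the collector name here is a placeholder):
#   configure_fmon_agent /opt/edg/var/etc/edg-fmon-agent.conf se.example.org /var/spool/pbs
```

On SE and WN nodes the CE_LRMS_SPOOL substitution should simply find nothing to replace.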

Some tests

Test whether the GridICE MDS is working:

ldapsearch -h <gridice-site-collector-node> -p 2136 -b "mds-vo-name=local, o=grid" -x


Test which hosts publish to the GridICE collector node:

ldapsearch -LLL -h <gridice-site-collector-node> -p 2136 -x -b "mds-vo-name=local,o=grid" "objectclass=GlueHost" GlueHostUniqueID


Test whether the GridICE MDS is connected to the local GRIS. If nothing is returned, try restarting globus-mds.

ldapsearch -h <gridice-site-collector-node> -p 2135 -x -b "mds-vo-name=local,o=grid" '(&(objectclass=glueservice)(GlueServiceType=gridice))'

Test whether the GridICE MDS information is published to the local cluster BDII. Replace <your-registration-name> with the regname stated in globus.conf for the GIIS. If nothing is returned, try restarting globus-mds.

ldapsearch -h <site-bdii> -p 2170 -x -b "mds-vo-name=<your-registration-name>,o=grid" '(&(objectclass=glueservice)(GlueServiceType=gridice))'
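
Each of the tests above passes when ldapsearch returns at least one entry. To make that check less dependent on reading long LDIF output by eye, the output can be piped through a small filter. This is a sketch only; the function name has_entries is hypothetical, and it assumes ldapsearch prints LDIF in which every entry begins with a "dn:" line.

```shell
#!/bin/sh
# Hypothetical helper: read LDIF on stdin, print the number of entries,
# and succeed only if at least one entry was returned.
has_entries() {
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    count=$(grep -c '^dn:' || true)
    echo "$count entries"
    [ "$count" -gt 0 ]
}

# Usage (host name is a placeholder, as in the commands above):
#   ldapsearch -LLL -h <gridice-site-collector-node> -p 2136 -x \
#       -b "mds-vo-name=local,o=grid" "objectclass=GlueHost" GlueHostUniqueID \
#       | has_entries || echo "nothing published, try restarting globus-mds"
```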

Finish

If everything is working, please send me the LDAP URL of your GIIS service in the form:

ldap://<site-bdii>:<port>/mds-vo-name=<your-registration-name>,o=grid

In our case we use the following URL:

  • ldap://grid-ce.ii.edu.mk:2170/mds-vo-name=MK02,o=grid

The GridIce Web site for Monitoring the SEE-GRID is

IMPORTANT!!! Please open your firewall for connections from the server grid-se.ii.edu.mk to your site BDII (CE) on ports 2135 and 2170, and to your SE (GridICE collector) on ports 2135 and 2136.
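
As an illustration of the firewall request above, the sketch below prints the corresponding rules without applying them. It assumes that your nodes use iptables, that the services listen on TCP, and that rules belong in the INPUT chain; the function name gridice_fw_rules is hypothetical. Review the output and adapt it to your own firewall setup before running anything.

```shell
#!/bin/sh
# Hypothetical helper: print (do not execute) iptables rules admitting
# TCP connections from one source host to the given ports.
gridice_fw_rules() {
    src=$1
    shift
    for port in "$@"; do
        echo "iptables -A INPUT -p tcp -s $src --dport $port -j ACCEPT"
    done
}

# On the CE / site BDII:
gridice_fw_rules grid-se.ii.edu.mk 2135 2170
# On the SE (GridICE collector):
gridice_fw_rules grid-se.ii.edu.mk 2135 2136
```

Note that -A appends to the end of the chain, so if an earlier rule already rejects the traffic, the new rules would have to be inserted before it instead.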

Please be patient. Our link is very slow (hopefully not for long)

Boro.
