SG GridIce Guide
From EGEE-see Wiki
1. As you already know, the GridICE installation (sensors and server) is included in the LCG-2 release. Depending on the release, you will need to upgrade some RPMs.
The new RPMs for the monitored nodes (CE, SE and WN; connecting the WN has not been tried yet) are:
- gridice-sensor (1.5.1)
- edg-fabricMonitoring (2.5.4-5)
that can be found at
- http://infnforge.cnaf.infn.it/download.php/200/edg-fabricMonitoring_gcc3_2_2-2.5.4-5_sl3.i386.rpm
- http://infnforge.cnaf.infn.it/download.php/198/edt_sensor-1.5.1-sl3.i386.rpm
Upgrade the packages using:
rpm -U edg-fabricMonitoring_gcc3_2_2-2.5.4-5_sl3.i386.rpm
rpm -U edt_sensor-1.5.1-sl3.i386.rpm
2. Update the YAIM script
In your site-info.def, make sure that the following variables are defined:
# collector for the GridICE monitoring data
# it must be a machine where an MDS is installed
# and whose GRIS or GIIS is registered to the BDII
# typical choice: $SE_HOST
GRIDICE_SERVER_HOST=$SE_HOST
# in case of a PBS/Torque batch system, be sure that the
# accounting directory is accessible by the CE machine
# and configure the following variable:
# - for torque the path is typically /usr/spool/torque
# - for pbs the path is typically /var/spool/pbs
CE_LRMS_SPOOL=/var/spool/pbs
3. Update /opt/lcg/yaim/functions/config_fmon_client with the following one:
Then reconfigure your node (example):
/opt/lcg/yaim/scripts/configure_node site-info.def CE_torque
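Before running configure_node it can help to confirm that the two variables from step 2 are really set. The following is only a minimal sketch: check_site_info is a hypothetical helper name (it is not part of YAIM), and it simply looks for uncommented assignments in the file you pass to it.

```shell
#!/bin/sh
# Hypothetical helper (not part of YAIM): report which of the variables
# named in step 2 are missing from a site-info.def file.
check_site_info() {
    file="$1"
    missing=0
    for var in GRIDICE_SERVER_HOST CE_LRMS_SPOOL; do
        # Look for an uncommented assignment such as VAR=value.
        if ! grep -Eq "^[[:space:]]*${var}=" "$file"; then
            echo "missing: ${var}"
            missing=1
        fi
    done
    # Return 0 when both variables are defined, 1 otherwise.
    return "$missing"
}
```

Possible usage: check_site_info site-info.def && /opt/lcg/yaim/scripts/configure_node site-info.def CE_torque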
Here follows the older configuration for previous versions of LCG; some parts still apply. The last part is very IMPORTANT (see the tests at the end).
Site collector node
You should choose a GridIce site collector node, which is usually the SE.
Then follow these instructions (some are already done by the configure_node script, but check them anyway):
1. start edg-fmon-server (collector service)
/etc/rc.d/init.d/edg-fmon-server start
and configure the start of this daemon at boot
chkconfig --level 345 edg-fmon-server on
2. start gridice-mds (publisher service)
/etc/rc.d/init.d/gridice-mds start
and configure the start of this daemon at boot
chkconfig --level 345 gridice-mds on
3. (IMPORTANT !!! also when upgrading - the configure_node script misses this) edit /etc/globus.conf and after the line:
[mds/gris/provider/edg]
add a blank line and this line:
[mds/gris/provider/gridice]
4. restart globus-mds service
service globus-mds restart
5. launch the command "crontab -e" and add the following entry:
50 1 * * * $EDG_LOCATION/sbin/edg-fmon-cleanspool &> /dev/null
6. follow the steps for "GridICE node" configuration in order to monitor this node
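Since step 3 is easy to forget (and has to be repeated after upgrades), the edit of /etc/globus.conf can be scripted. This is a sketch only: add_gridice_provider is a hypothetical helper name, and it assumes the edg section header sits alone on its own line exactly as shown in step 3. It keeps a .bak copy and does nothing if the gridice line is already present.

```shell
#!/bin/sh
# Hypothetical helper: insert the gridice provider section right after the
# edg provider section, leaving the file untouched if it is already there.
add_gridice_provider() {
    conf="$1"
    if grep -qxF '[mds/gris/provider/gridice]' "$conf"; then
        return 0    # already configured, nothing to do
    fi
    if ! grep -qxF '[mds/gris/provider/edg]' "$conf"; then
        echo "edg provider section not found in $conf" >&2
        return 1
    fi
    cp "$conf" "$conf.bak"    # keep a backup of the original
    # Copy every line; after the edg section header emit a blank line
    # and the new section header.
    awk '{ print }
         $0 == "[mds/gris/provider/edg]" { print ""; print "[mds/gris/provider/gridice]" }' \
        "$conf.bak" > "$conf"
}
```

Possible usage: add_gridice_provider /etc/globus.conf && service globus-mds restart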
All other nodes
On all other nodes do the following steps.
The configuration files should already be in place, created by the configure_node script. If you run into problems, it is still advisable to copy the template configuration files and change them by hand.
1. copy a monitoring template for your Grid node (replace the word PROFILE with 'ce', 'se' or 'wn' according to the role of your node)
cp /opt/gridice/monitoring/etc/edg-fmon-agent.conf-PROFILE.template /opt/edg/var/etc/edg-fmon-agent.conf
NOTE: if you are using PBS or Torque take into account that for the Computing Element configuration you have to replace the word CE_LRMS_SPOOL in /opt/edg/var/etc/edg-fmon-agent.conf with your PBS/Torque spool directory (usually /var/spool/pbs)
2. edit /opt/edg/var/etc/edg-fmon-agent.conf and replace the words "LEMON-COLLECTOR" with the fully qualified domain name of your GridICE collector node configured in the previous section
3. create a symbolic link to a daemon monitoring template for your Grid node (replace the word PROFILE with 'ce-access-node', 'se-access-node' or 'worker-node' according to the role of your node)
ln -s /opt/gridice/monitoring/etc/gridice-role-PROFILE.cfg /opt/gridice/monitoring/etc/gridice-role.cfg
4. start edg-fmon-agent
/etc/rc.d/init.d/edg-fmon-agent start
and configure the start of this daemon at boot
chkconfig --level 345 edg-fmon-agent on
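The copy in step 1 and the two replacements (CE_LRMS_SPOOL from the NOTE and LEMON-COLLECTOR from step 2) can be done in one go with sed. A minimal sketch, assuming the placeholder words appear literally in the template; render_agent_conf is a hypothetical helper name and the host name in the usage line is only an example.

```shell
#!/bin/sh
# Hypothetical helper: write a filled-in copy of an agent template.
# $1 = template file, $2 = output file, $3 = collector FQDN,
# $4 = PBS/Torque spool directory (only matters for the CE template)
render_agent_conf() {
    template="$1"; output="$2"; collector="$3"; spool="$4"
    # '|' is used as the sed delimiter because the spool path contains '/'.
    sed -e "s|LEMON-COLLECTOR|${collector}|g" \
        -e "s|CE_LRMS_SPOOL|${spool}|g" \
        "$template" > "$output"
}
```

Possible usage on a CE: render_agent_conf /opt/gridice/monitoring/etc/edg-fmon-agent.conf-ce.template /opt/edg/var/etc/edg-fmon-agent.conf grid-se.example.org /var/spool/pbs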
Some tests
Test if the gridice mds is working
ldapsearch -h <gridice-site-collector-node> -p 2136 -b "mds-vo-name=local, o=grid" -x
Test which sites publish to the GridICE collector node:
ldapsearch -LLL -h <gridice-site-collector-node> -p 2136 -x -b "mds-vo-name=local,o=grid" "objectclass=GlueHost" GlueHostUniqueID
Test if the Gridice MDS connects to local GRIS. If nothing is received try restarting the globus-mds
ldapsearch -h <gridice-site-collector-node> -p 2135 -x -b "mds-vo-name=local,o=grid" '(&(objectclass=glueservice)(GlueServiceType=gridice))'
Test if the Gridice MDS info is published to the local cluster BDII. Replace <your-registration-name> with the regname stated in the globus.conf for the GIIS. If nothing is received try restarting the globus-mds
ldapsearch -h <site-bdii> -p 2170 -x -b "mds-vo-name=<your-registration-name>,o=grid" '(&(objectclass=glueservice)(GlueServiceType=gridice))'
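For all of the tests above, "nothing is received" means that ldapsearch returns no entries. To make that visible at a glance, the output can be piped through a small counter. This is only a sketch: report_entries is a hypothetical helper name, and it relies on the fact that every entry in ldapsearch (LDIF) output starts with a "dn:" line.

```shell
#!/bin/sh
# Hypothetical helper: read ldapsearch (LDIF) output on standard input and
# report whether any entries came back. Each entry starts with "dn:".
report_entries() {
    label="$1"
    # grep -c prints 0 and exits non-zero when nothing matches.
    count=$(grep -c '^dn:' || true)
    if [ "$count" -gt 0 ]; then
        echo "OK   ${label}: ${count} entries"
    else
        echo "FAIL ${label}: no entries"
        return 1
    fi
}
```

Possible usage: ldapsearch -LLL -h <gridice-site-collector-node> -p 2136 -x -b "mds-vo-name=local,o=grid" "objectclass=GlueHost" GlueHostUniqueID | report_entries hosts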
Finish
If everything is working, please send me the LDAP URL of your GIIS service in the form:
ldap://<site-bdii>:<port>/mds-vo-name=<your-registration-name>,o=grid
In our case we use the following URL:
- ldap://grid-ce.ii.edu.mk:2170/mds-vo-name=MK02,o=grid
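To avoid typos when composing the URL, it can be assembled from its three parts. A small sketch; giis_url is a hypothetical helper name, and the values in the usage line are the ones from the example above.

```shell
#!/bin/sh
# Hypothetical helper: build the GIIS LDAP URL in the form requested above.
# $1 = BDII host, $2 = port, $3 = registration name from globus.conf
giis_url() {
    host="$1"; port="$2"; regname="$3"
    printf 'ldap://%s:%s/mds-vo-name=%s,o=grid\n' "$host" "$port" "$regname"
}
```

Possible usage: giis_url grid-ce.ii.edu.mk 2170 MK02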
The GridIce Web site for Monitoring the SEE-GRID is
IMPORTANT!!! Please enable firewall connectivity to your BDII site (CE) on ports 2135 and 2170, and on the SE (Gridice) site on ports 2135 and 2136 from the server grid-se.ii.edu.mk.
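If your nodes are protected with iptables (an assumption, your site may use a different firewall), the rules could look like the ones printed by this sketch. print_accept_rules is a hypothetical helper name; it only prints the commands so that you can review them and adapt the chain and rule order to your own setup before applying anything.

```shell
#!/bin/sh
# Hypothetical helper: print (not apply) one iptables ACCEPT rule per port
# for TCP connections coming from the monitoring server.
# $1 = source host, remaining arguments = ports to open
print_accept_rules() {
    source_host="$1"; shift
    for port in "$@"; do
        echo "iptables -A INPUT -p tcp -s ${source_host} --dport ${port} -j ACCEPT"
    done
}
```

Possible usage: print_accept_rules grid-se.ii.edu.mk 2135 2170 on the CE (BDII), and print_accept_rules grid-se.ii.edu.mk 2135 2136 on the SE (GridICE collector).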
Please be patient. Our link is very slow (hopefully not for long)
Boro.
