SG BBmSAM

From EGEE-see WIki

Overview

Availability of the infrastructure is monitored with the SAM (Service Availability Monitoring) system developed in the EGEE (Enabling Grids for E-science in Europe) project. SAM consists of server and client components that communicate over web services. The client periodically initiates tests of the infrastructure and publishes the results to the server, which stores them in a database.

Because the original SAM used in EGEE is based on Oracle Database, a commercial product, the decision was made to develop an alternative solution that does not depend on any commercial product and is better suited to the needs of the SEE-GRID projects.

The following image shows an architectural overview of the BBmSAM implementation used in the SEE-GRID-SCI project:

Image:BBmSAM_en.png

The BBmSAM platform is a web application written in PHP that uses a MySQL database for data storage (although any standard SQL database server could be used, as we do not take advantage of any MySQL-specific features). It has been tested under Apache httpd and the Microsoft IIS web server, and should work with any web server that supports PHP, at least through CGI.

The main features of the BBmSAM system are:

  • Use of unaltered client and sensor components of the EGEE SAM system
  • Synchronization with the central HGSM (Hierarchical Grid Site Management) service; this service completely replaces GOCDB and eliminates the need to import additional information from BDIIs and other sources
  • Use of free and open source technologies
  • Use of as few different technologies as possible, to ease maintenance and development
  • More efficient access from mobile and small-screen devices

The BBmSAM server is currently installed as part of the BA-01-ETFBL site at the Faculty of Electrical Engineering Banja Luka.

BBmSAM Components

The BBmSAM system consists of a number of components, which are described in this section.

Database Server

As stated earlier, BBmSAM uses MySQL as its database server instead of Oracle. In implementing database access we have stayed clear of proprietary and non-standard extensions, and even of more common features such as stored procedures and triggers, in order to keep the implementation as transparent as possible and to ease migration to a different database server.

The database schema was kept as close as possible to the original SAM DB, with changes only where they were mandatory (mainly because of differences between HGSM and GOCDB/GridView). Two new tables were introduced (one for site downtime data and the other for uptime calculations); they are easy to spot because their names end with _bbm.

Synchronization Service

This service was implemented as a means of synchronizing the SAM database with a remote HGSM server. The first version of BBmSAM needed a local copy of the HGSM DB, which was then exported to the SAM DB. This is no longer necessary, as we can synchronize directly from an HGSM export to the SAM DB (thanks to additional tables in the SAM DB). This also makes it possible to use different data sources (GOCDB or others), as long as there is a way to generate a proper export.

The synchronization process has three steps:

  1. Generating an XML export on the HGSM server
  2. Importing the XML data and transforming it to SAM DB specifications
  3. Generating an HGSM "snapshot history" on the local server (optional)
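The import step can be illustrated with a minimal sketch. It is written in Python rather than PHP, and everything specific in it is an assumption made for illustration: the XML element and attribute names, the simplified site and serviceinstance tables, and the function names do not come from the real HGSM export or SAM DB schema. It uses only plain SQL statements, in the spirit of the portability goal described above.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical export format: the real HGSM export schema is not shown here.
SAMPLE_EXPORT = """
<hgsm>
  <site name="BA-01-ETFBL" country="BA">
    <node hostname="ce.example.org">
      <service type="CE"/>
      <service type="SE"/>
    </node>
  </site>
</hgsm>
"""


def create_tables(conn):
    """Create simplified, hypothetical stand-ins for the SAM DB tables."""
    conn.execute("CREATE TABLE site (name VARCHAR(64) PRIMARY KEY, country VARCHAR(8))")
    conn.execute(
        "CREATE TABLE serviceinstance ("
        "site VARCHAR(64), hostname VARCHAR(255), service VARCHAR(32), "
        "PRIMARY KEY (site, hostname, service))"
    )


def import_export(xml_text, conn):
    """Load an XML export into the tables and return the number of service rows."""
    root = ET.fromstring(xml_text.strip())
    count = 0
    for site in root.findall("site"):
        name = site.get("name")
        # Delete-then-insert keeps repeated imports idempotent without
        # relying on vendor-specific "upsert" syntax.
        conn.execute("DELETE FROM serviceinstance WHERE site = ?", (name,))
        conn.execute("DELETE FROM site WHERE name = ?", (name,))
        conn.execute(
            "INSERT INTO site (name, country) VALUES (?, ?)",
            (name, site.get("country")),
        )
        for node in site.findall("node"):
            for service in node.findall("service"):
                conn.execute(
                    "INSERT INTO serviceinstance (site, hostname, service) "
                    "VALUES (?, ?, ?)",
                    (name, node.get("hostname"), service.get("type")),
                )
                count += 1
    conn.commit()
    return count
```

Because each site is deleted and re-inserted, running the import again with the same export leaves the tables unchanged, which suits a job that runs every few minutes.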

BBmSAM Web Services

The web services are implemented in PHP using the NuSOAP library and mimic the original SAM WS component, so that standard clients can be used. Two web services are implemented: query, for querying and filtering the data needed to run tests, and publish, for publishing the test results to the central database.

BBmSAM Portal

This component provides simple and efficient access to all data stored in the BBmSAM database, both current and historical.

It is described in detail in SG_BBmSAM_Portal.

BBmobileSAM

This is a specialized mini-portal for devices with small screens and no support for full HTML. It shows only the basic information (at three different levels of detail, with color-coded results) and uses a very small subset of HTML, which allows it to be used on almost any mobile device. It is also optimized to produce compact output, which limits the cost and time of download over slow and relatively expensive connections (e.g. GPRS).

Sample screen-shot:

Image:BBmobileSAM_sample.png
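To give a feel for how small such output can be, here is a minimal Python sketch of a color-coded status page. The color mapping, the status names, and the function name render_summary are assumptions made for illustration; the actual BBmobileSAM markup and colors are not documented on this page.

```python
from html import escape

# Hypothetical status-to-color mapping; the real portal's colors may differ.
COLORS = {"ok": "green", "warning": "orange", "error": "red", "maint": "gray"}


def render_summary(sites):
    """Render (site name, status) pairs using only a few basic HTML tags."""
    rows = "".join(
        '<font color="%s">%s: %s</font><br>'
        % (COLORS.get(status, "black"), escape(name), escape(status))
        for name, status in sites
    )
    return "<html><body>" + rows + "</body></html>"
```

The sketch avoids stylesheets, scripts and tables, so each site costs only a few dozen bytes on a slow connection.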

If the mobile device's browser supports full HTML (iPhone, S60 browser, Opera Mini v4.2, etc.), one can use the normal version of the BBmSAM Portal, with a customized front page suited to a screen width of 320 pixels.

BBmSAM Extensions

As the focus of the project shifted from infrastructure to applications and end users, there was a need for better and easier connectivity between BBmSAM and third-party tools, services and applications. The extensions include the use of BBmSAM by the Nagios portal, the FCR (Freedom of Choice) extension, a universal XML export, the uptimeWS web service, which enables filtering of sites and services by their status and uptime, and SAMDB XSQL-compatible data exports for service and serviceinstance statuses. This part of the system is under constant development and open to end-user suggestions.

BBmSAM Extensions are documented, as recommended by JRA1, in the following PPT presentation: Media:SEEGRIDSCI-JRA1-BBmSAMeX.ppt, as well as on the wiki page SG_BBmSAMeX.

BBmSAM Admin

This component is currently under development.

Functioning

The operation of the BBmSAM system consists of:

  • Periodic synchronization of the local HGSM database with the central HGSM database, performed every 10 minutes (before any submit or publish operation). This can now be replaced by direct synchronization of the SAM database with an HGSM database export, with no need for a local HGSM database (although a local HGSM copy can provide additional information about sites, nodes, services, etc.)
  • Regular SAM test submission, performed every 3 hours for interactive (job-based) tests and every hour for non-interactive tests
  • Publishing of interactive test data every 20 minutes
  • Calculating hourly uptime/availability every hour (for the SEE-GRID-2 compatible SLA)
  • Calculating service instance uptime (for continuous-time SLA calculations in SEE-GRID-SCI)
  • Generating information for end users of the portal on demand
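The intervals listed above can be summarized as a small schedule table. The following Python sketch is only an illustration: the task names are hypothetical, and the real system may well be driven by cron or a similar scheduler. Tasks without a stated interval (service instance uptime, on-demand portal pages) are left out. Synchronization is listed first because it has to run before any submit or publish operation.

```python
# (task name, interval in minutes); task names are hypothetical.
SCHEDULE = [
    ("sync_hgsm", 10),
    ("submit_noninteractive_tests", 60),
    ("submit_interactive_tests", 180),
    ("publish_interactive_results", 20),
    ("calculate_hourly_uptime", 60),
]


def due_tasks(minutes_since_midnight):
    """Return, in execution order, the tasks due at the given minute of the day."""
    return [
        name
        for name, interval in SCHEDULE
        if minutes_since_midnight % interval == 0
    ]
```

For example, due_tasks(20) yields only the synchronization and publishing tasks, while due_tasks(180) yields all five.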

The SAM client and sensors are the official ones used in the standard EGEE SAM distribution, and they operate exactly as they do there. In designing the BBmSAM portal and the dependent web services, special care was taken to make the solution as compatible as possible with EGEE tools and practices. This was achieved by implementing in PHP/MySQL the same web services that the standard Oracle/Java-based SAM uses.

Every monitored service is tested by a sensor, which consists of individual tests. Every test returns a status ID that defines its outcome (e.g. ok/warning/error). Not all tests are equal: some must pass for the service provided by the tested node to function as defined. These tests are designated as critical. Their outcomes are combined, and the status of the complete test set is the maximum status value among all critical tests. The MAINT status, which denotes a site or service in declared downtime (maintenance), has the highest priority and overrides any other.
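The combination rule can be sketched in a few lines of Python. The numeric status IDs, the test names in the usage note, and the function name combined_status are assumptions made for illustration; the only properties taken from the text are that a worse outcome has a higher value, that only critical tests count, and that MAINT overrides everything.

```python
# Hypothetical status IDs; higher means worse, and MAINT is the highest.
OK, WARNING, ERROR, MAINT = 10, 40, 50, 100


def combined_status(results, critical, in_downtime=False):
    """Combine per-test status IDs into one status for the whole test set.

    results: dict mapping test name to status ID
    critical: set of test names designated as critical
    in_downtime: True if the site or service is in declared downtime
    """
    if in_downtime:
        return MAINT  # declared maintenance overrides any test outcome
    critical_values = [
        status for test, status in results.items() if test in critical
    ]
    if not critical_values:
        return None  # the text does not define this case
    return max(critical_values)
```

With results {"job-submit": OK, "ca-certs": WARNING, "version": ERROR} and only the first two tests marked critical, the combined status is WARNING: the failing non-critical test does not affect the result.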