SG GOOD

From EGEE-see WIki


Improving the quality of the SEE-GRID infrastructure is an important and ongoing effort, necessary both for the work of SEE-GRID application developers and for the use of the infrastructure by the existing user community. For this reason, the pro-active monitoring of SEE-GRID sites is organized in rotating shifts taken by WP3 country representatives (GIMs). During its shift, a GIM is designated as Grid-Operator-On-Duty (GOOD).

The idea is that each GIM (i.e. the GIM team from one country) is on shift for one week and opens tickets in the SEE-GRID Helpdesk to sites from all countries that are failing SAM tests or have other problems identified by GStat or in some other way. Of course, all GIMs are expected to continually monitor and support the sites from their own countries; this is their day-to-day duty and is not related to GOOD shifts.

Details of the organization of GOOD shifts are given below, and will be updated according to the available information.



1) Shifts are taken according to the following rotating plan:

Country Shift starts on
Switzerland 2007-02-12
Serbia 2007-02-19
Croatia 2007-02-26
Greece 2007-03-05
Bulgaria 2007-03-12
Romania 2007-03-19
Turkey 2007-03-26
FYR of Macedonia 2007-04-16
Bosnia-Herzegovina 2007-04-30
Montenegro 2007-05-07
Moldova 2007-05-21
Hungary 2007-05-28
Albania 2007-06-04
Serbia 2007-06-11
Switzerland 2007-06-18
Croatia 2007-06-25
Greece 2007-07-02
Bulgaria 2007-07-09
Romania 2007-07-16
Turkey 2007-07-23
FYR of Macedonia 2007-07-30
Serbia 2007-08-06
Montenegro 2007-08-13
Moldova 2007-08-20
Hungary 2007-08-27
Albania 2007-09-03
Serbia 2007-09-10
Switzerland 2007-09-17
Croatia 2007-09-24
Switzerland 2007-10-01
Bulgaria 2007-10-08
Romania 2007-10-15
Turkey 2007-10-22
FYR of Macedonia 2007-10-29
Bosnia-Herzegovina 2007-11-05
Montenegro 2007-11-12
Moldova 2007-11-19
Hungary 2007-11-26
Albania 2007-12-03
Serbia 2007-12-10
Greece 2007-12-17
Croatia 2007-12-24
Greece 2007-12-31
Bulgaria 2008-01-07
* 2008-01-14
* 2008-01-21
* 2008-01-28
FYR of Macedonia 2008-02-04
Bosnia-Herzegovina 2008-02-11
Montenegro 2008-02-18
Moldova 2008-02-25
Hungary 2008-03-03
Albania 2008-03-10
Serbia 2008-03-17
Switzerland 2008-03-24
Croatia 2008-03-31
Romania 2008-04-07
Bulgaria 2008-04-14
Romania 2008-04-21
Turkey 2008-04-28
FYR of Macedonia 2008-05-05
Bosnia-Herzegovina 2008-05-12
Montenegro 2008-05-19
* 2008-05-26
Moldova 2008-06-02
Hungary 2008-06-09
Albania 2008-06-16
Serbia 2008-06-23
Switzerland 2008-06-30
Croatia 2008-07-07
Bulgaria 2008-07-14
Romania 2008-07-21
Turkey 2008-07-28
FYR of Macedonia 2008-08-04
Bosnia-Herzegovina 2008-08-11
Montenegro 2008-08-18
Moldova 2008-08-25
Hungary 2008-09-01
Albania 2008-09-08
Serbia 2008-09-15
Greece 2008-09-22
Croatia 2008-09-27
Switzerland 2008-10-06
Romania 2008-10-13
Bulgaria 2008-10-20
FYR of Macedonia 2008-10-27
Greece 2008-11-03
Bosnia-Herzegovina 2008-11-10
Montenegro 2008-11-17
Hungary 2008-11-24
Moldova 2008-12-01
Albania 2008-12-08
Georgia 2008-12-15
Turkey 2008-12-22
Armenia 2008-12-27


EGEE countries are put first, so that the procedure can be polished and refined before new SEE-GRID partners with less experience take their shifts. The GIMs will, of course, serve as the GOODs.
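As an illustration of how the rotating plan can be read, the following minimal Python sketch looks up which country is on duty on a given day. It assumes that a shift lasts from its listed start date until the next listed start date; the names `SHIFTS` and `good_on_duty` are hypothetical and only the first few rows of the plan are included.

```python
from bisect import bisect_right
from datetime import date

# First rows of the plan above, in ascending date order.
# The remaining rows would be added in the same (start date, country) form.
SHIFTS = [
    (date(2007, 2, 12), "Switzerland"),
    (date(2007, 2, 19), "Serbia"),
    (date(2007, 2, 26), "Croatia"),
    (date(2007, 3, 5), "Greece"),
]


def good_on_duty(day, shifts=SHIFTS):
    """Return the country whose shift covers `day`.

    Assumption: a shift runs from its start date until the next
    listed start date. Returns None if `day` is before the plan begins.
    """
    starts = [start for start, _ in shifts]
    i = bisect_right(starts, day)  # number of shifts started on or before `day`
    return shifts[i - 1][1] if i else None


print(good_on_duty(date(2007, 2, 20)))  # Serbia
```

The same lookup applied to the day a shift ends gives the country to which the next shift ticket should be assigned.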



2) GOOD shift tickets and GOOD site tickets

In their work, GOODs will encounter two types of tickets: shift tickets and site tickets.


In order to keep a written record of each shift and of the GOOD actions taken during it, a GOOD shift ticket will be created for each GOOD shift in the SEE-GRID Helpdesk (Task group: gLite, Category: Site availability). The shift ticket should be created by the outgoing GOOD and assigned to the next one at the end of each shift (e.g. when Switzerland finishes its shift, it will create a new shift ticket on 2007-02-19 for Serbia's GIM, Serbia's GOOD will create a new shift ticket for Croatia's GIM on 2007-02-26, and so on).


The second type of ticket is the GOOD site ticket. Site tickets are normally created by GOODs each day for all sites experiencing operational problems (Task group: gLite, Category: Site availability). When a problem with a site is identified (the site fails some of the SEE-GRID SAM tests, the GStat page for the site displays problems, jobs submitted to the site by the GOOD experience problems, etc.), the GOOD will open a new ticket and assign it to the site. A page with ticket templates and links to useful Wiki pages with troubleshooting information is available here:

http://wiki.egee-see.org/index.php/SG_Helpdesk_tickets

GOODs are expected to update this Wiki page with new templates for various identified problems, and to add links to useful troubleshooting pages.

At the request of applications that need MPI support on sites, GOODs are expected to test the MPI setup on all SEE-GRID sites that claim to support it. The MPI setup test should be performed at least once a week, and GOODs should ensure that the parallel test job runs at the same time on at least two WNs (to test the ssh setup as well). On sites with SMPSIZE=4 this requires 5 MPI processes, while on sites with single-core WNs 2 processes are enough; the GOOD is therefore responsible for sending the appropriate test job to each site. Since such test jobs require several CPUs, their execution is likely to take longer than is usual for single-CPU jobs. For this reason, such test jobs should be given enough time to complete (1-2 days). If a test job fails (or does not run to completion in a reasonable time), a GOOD site ticket should be created. More details can be found on the Wiki page on Testing MPI support.
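The two process counts given above (5 for SMPSIZE=4, 2 for single-core WNs) are consistent with a simple rule: request one process more than fits on a single WN, so that the job cannot be placed on one node only. The sketch below states that rule in Python. The generalization to other SMPSIZE values is an assumption, as is the function name `min_mpi_processes`, and it presumes that each MPI process occupies one core slot.

```python
def min_mpi_processes(cores_per_wn: int) -> int:
    """Smallest MPI process count that must span at least two WNs.

    Assumption: each process takes one core slot and no WN offers more
    than `cores_per_wn` slots (the site's SMPSIZE), so `cores_per_wn + 1`
    processes cannot all fit on a single WN.
    """
    if cores_per_wn < 1:
        raise ValueError("cores_per_wn must be at least 1")
    return cores_per_wn + 1


for smpsize in (1, 4):
    print(smpsize, min_mpi_processes(smpsize))
```

For SMPSIZE values of 1 and 4 this reproduces the 2 and 5 processes mentioned in the text.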


3) Hand-over report

When creating a shift ticket for the next GOOD, the outgoing GOOD should enter a brief hand-over report, i.e. a list of major problems that remain to be solved, observations about hard cases, the number of GOOD site tickets newly created during the last week, the overall number of GOOD site tickets still open, the number of GOOD site tickets closed during the last week, etc. (usually 2 paragraphs at most).
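The ticket counts requested in the hand-over report could be derived from a simple list of site tickets, as in the following minimal Python sketch. It does not use any Helpdesk interface: the `SiteTicket` record, the `handover_counts` function, and the site names are hypothetical, and a shift is assumed to cover the seven days starting on the shift start date.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class SiteTicket:
    """Hypothetical record of one GOOD site ticket."""
    site: str
    opened: date
    closed: Optional[date] = None  # None while the ticket is still open


def handover_counts(tickets, shift_start):
    """Count tickets created and closed during the shift, and those still open."""
    shift_end = shift_start + timedelta(days=7)  # assumed one-week shift

    def in_shift(day):
        return day is not None and shift_start <= day < shift_end

    return {
        "created": sum(in_shift(t.opened) for t in tickets),
        "closed": sum(in_shift(t.closed) for t in tickets),
        "still_open": sum(t.closed is None for t in tickets),
    }


tickets = [
    SiteTicket("site-a", date(2007, 2, 13)),
    SiteTicket("site-b", date(2007, 2, 6), date(2007, 2, 14)),
    SiteTicket("site-c", date(2007, 2, 1)),
]
print(handover_counts(tickets, date(2007, 2, 12)))
# {'created': 1, 'closed': 1, 'still_open': 2}
```

The free-text parts of the report (remaining major problems, observations about hard cases) still have to be written by the GOOD.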



4) GOOD shift ticket updates

During the shift, the GOOD will update the shift ticket with all relevant information about newly created site tickets, updates to operational documents on the Wiki, etc. The ticket should be closed when the shift is finished, and a new ticket, containing the hand-over report, should be opened for the next GOOD.
