SG Helpdesk tickets
From EGEE-see WIki
Ticket templates
When a site repeatedly fails one or more SEE-GRID SAM tests, or does not have OK status in SEE-GRID GStat, the Grid-Operator-On-Duty (GOOD) should open a ticket for the site in the SEE-GRID Helpdesk. The ticket templates are given below.
SAM Ticket Templates
Dear Site Admins,
<SAM test name> is failing on <node name>
(site: <site name>)
*DETAILS*
Failure detected on: <time stamp>
Test executed for VO: SEEGRID
View failure history and details on SAM portal:
<link to SAM test history>
Could you please have a look and resolve the ticket within a week?
You are kindly requested to put the ticket to 'In Progress' state as soon as you read this.
The following link can be useful to resolve the problem:
<link to the appropriate SEE-GRID, GOC Wiki, or some other troubleshooting page>
Thank you,
<your name>
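The placeholders in the template above can also be filled in programmatically. The following is a minimal Python sketch and is not part of the SEE-GRID Helpdesk: the function name `render_sam_ticket`, the field names, and any values passed to it are hypothetical and chosen only for illustration.

```python
from string import Template

# Text of the initial SAM ticket. The $-fields stand in for the <...>
# placeholders of the wiki template; the field names are hypothetical.
SAM_TICKET = Template(
    "Dear Site Admins,\n"
    "$test is failing on $node\n"
    "(site: $site)\n"
    "*DETAILS*\n"
    "Failure detected on: $timestamp\n"
    "Test executed for VO: SEEGRID\n"
    "View failure history and details on SAM portal:\n"
    "$history_link\n"
    "Could you please have a look and resolve the ticket within a week?\n"
    "You are kindly requested to put the ticket to 'In Progress' state "
    "as soon as you read this.\n"
    "The following link can be useful to resolve the problem:\n"
    "$help_link\n"
    "Thank you,\n"
    "$operator\n"
)


def render_sam_ticket(**fields):
    """Fill the template; raises KeyError if a placeholder is left unset."""
    return SAM_TICKET.substitute(**fields)
```

Using `substitute` rather than `safe_substitute` is a deliberate choice in this sketch: a ticket with an unfilled placeholder fails loudly instead of being sent to the site admins.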
If the site does not respond within three working days (i.e. the ticket is not put in the 'In Progress' state within three working days), or the problem is not resolved within a week, GOOD should update the ticket to remind the ticket supporter, using the template given below.
Dear Site Admins,
You have been contacted for a problem concerning site <site name>, whose node <node name> fails on SEE-GRID SAM test <SAM test name>, as explained in the text of this ticket.
This problem has not been solved within a week, or there has been no response from Site Admins.
Could you please have a look at it and resolve it as soon as possible?
The following link can be useful to resolve the problem:
<link to the appropriate SEE-GRID, GOC Wiki, or some other troubleshooting page>
Thank you,
<your name>
If the site does not respond to this reminder within the next three working days, or the problem is not resolved within the following week, GOOD should escalate the ticket and assign it to the country WP3 representative (GIM), asking him/her to provide support to the site admins and ensure the ticket is resolved in a timely manner. To do this, GOOD should update the ticket using the template given below.
Dear GIM and Site Admins,
You have been contacted for a problem concerning site <site name>, whose node <node name> fails on SEE-GRID SAM test <SAM test name>, as explained in the text of this ticket.
This problem has not been solved within a week and has been escalated to the GIM level.
Could you please have a look at it and resolve it as soon as possible? GIM can provide support to site admins, and if additional help is needed you can involve other SEE-GRID experts as well.
The following link can be useful to resolve the problem:
<link to the appropriate SEE-GRID, GOC Wiki, or some other troubleshooting page>
Thank you,
<your name>
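The timing rules above (three working days to respond, one week to resolve, first for the initial ticket and then again for the reminder) can be sketched as a small decision helper. This is only an illustration under stated assumptions: working days are taken to be Monday to Friday with public holidays ignored, a deadline counts as missed only once the day after it has started, and the names `add_working_days`, `is_overdue`, and `next_step` are hypothetical.

```python
from datetime import date, timedelta


def add_working_days(start, days):
    """Date that lies `days` working days after `start`.

    Assumption: working days are Monday-Friday; holidays are ignored.
    """
    current = start
    while days > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday-Friday
            days -= 1
    return current


def is_overdue(stage_start, today, acknowledged, resolved):
    """True when the current stage (initial ticket or reminder) timed out."""
    if resolved:
        return False
    no_response = not acknowledged and today > add_working_days(stage_start, 3)
    not_resolved = today > stage_start + timedelta(days=7)
    return no_response or not_resolved


def next_step(stage, stage_start, today, acknowledged=False, resolved=False):
    """Suggest what GOOD should do next.

    `stage` is "opened" (initial ticket) or "reminded" (reminder sent);
    `stage_start` is the date on which that stage began.
    """
    if not is_overdue(stage_start, today, acknowledged, resolved):
        return "wait"
    return "remind" if stage == "opened" else "escalate to GIM"
```

For a ticket opened on a Monday, `next_step("opened", ...)` in this sketch first suggests a reminder on the Friday of the same week if the ticket has not been put in the 'In Progress' state by then.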
GStat Ticket Templates
Dear Site Admins,
GStat displays <WARN, ERROR> status for your site due to the following problem:
(site: <site name>)
*DETAILS*
Failure detected on: <time stamp>
View failure history and details on GStat:
<link to site Gstat page>
Could you please have a look and resolve the ticket within a week?
You are kindly requested to put the ticket to 'In Progress' state as soon as you read this.
The following link can be useful to resolve the problem:
<link to the appropriate SEE-GRID, GOC Wiki, or some other troubleshooting page>
Thank you,
<your name>
If the site does not respond within three working days (i.e. the ticket is not put in the 'In Progress' state within three working days), or the problem is not resolved within a week, GOOD should update the ticket to remind the ticket supporter, using the template given below.
Dear Site Admins,
You have been contacted for a problem concerning site <site name>, which shows a failure on its GStat page, as explained in the text of this ticket.
This problem has not been solved within a week, or there has been no response from Site Admins.
Could you please have a look at it and resolve it as soon as possible?
The following link can be useful to resolve the problem:
<link to the appropriate SEE-GRID, GOC Wiki, or some other troubleshooting page>
Thank you,
<your name>
If the site does not respond to this reminder within the next three working days, or the problem is not resolved within the following week, GOOD should escalate the ticket and assign it to the country WP3 representative (GIM), asking him/her to provide support to the site admins and ensure the ticket is resolved in a timely manner. To do this, GOOD should update the ticket using the template given below.
Dear GIM and Site Admins,
You have been contacted for a problem concerning site <site name>, which shows a failure on its GStat page, as explained in the text of this ticket.
This problem has not been solved within a week and has been escalated to the GIM level.
Could you please have a look at it and resolve it as soon as possible? GIM can provide support to site admins, and if additional help is needed you can involve other SEE-GRID experts as well.
The following link can be useful to resolve the problem:
<link to the appropriate SEE-GRID, GOC Wiki, or some other troubleshooting page>
Thank you,
<your name>
MPI Ticket Templates
Dear Site Admins,
Your site <site name> has a problem with its MPI configuration.
*DETAILS*
Failure detected on: <time stamp>
Reason of failure: <copy appropriate error message from job logging info>
Could you please have a look and resolve the ticket within a week?
You are kindly requested to put the ticket to 'In Progress' state as soon as you read this.
The following links can be useful to resolve the problem:
http://wiki.egee-see.org/index.php/SG_MPI_Guide
http://wiki.egee-see.org/index.php/Testing_MPI_support
Thank you,
<your name>
If the site does not respond within three working days (i.e. the ticket is not put in the 'In Progress' state within three working days), or the problem is not resolved within a week, GOOD should update the ticket to remind the ticket supporter, using the template given below.
Dear Site Admins,
You have been contacted for a problem concerning site <site name>, which fails to execute MPI jobs, as explained in the text of this ticket.
This problem has not been solved within a week, or there has been no response from Site Admins.
Could you please have a look at it and resolve it as soon as possible?
The following links can be useful to resolve the problem:
http://wiki.egee-see.org/index.php/SG_MPI_Guide
http://wiki.egee-see.org/index.php/Testing_MPI_support
Thank you,
<your name>
If the site does not respond to this reminder within the next three working days, or the problem is not resolved within the following week, GOOD should escalate the ticket and assign it to the country WP3 representative (GIM), asking him/her to provide support to the site admins and ensure the ticket is resolved in a timely manner. To do this, GOOD should update the ticket using the template given below.
Dear GIM and Site Admins,
You have been contacted for a problem concerning site <site name>, which fails to execute MPI jobs, as explained in the text of this ticket.
This problem has not been solved within a week and has been escalated to the GIM level.
Could you please have a look at it and resolve it as soon as possible? GIM can provide support to site admins, and if additional help is needed you can involve other SEE-GRID experts as well.
The following links can be useful to resolve the problem:
http://wiki.egee-see.org/index.php/SG_MPI_Guide
http://wiki.egee-see.org/index.php/Testing_MPI_support
Thank you,
<your name>
Possible values of SAM tests
- CE (Computing Element)
CE-host-cert-valid [cert], CE-sft-apel [apel], CE-sft-brokerinfo [bi], CE-sft-caver [ca], CE-sft-crl [crl], CE-sft-csh [csh], CE-sft-job [js], CE-sft-lcg-rm [rm], CE-sft-lcg-rm-gfal [gfal], CE-sft-lcg-rm-cr [cr], CE-sft-lcg-rm-cp [cp], CE-sft-lcg-rm-rep [rep], CE-sft-lcg-rm-del [del],
CE-sft-rgma [rgma], CE-sft-rgma-sc [rgmasc], CE-sft-softver [ver], CE-sft-vo-swdir [swdir], CE-sft-vo-tag [votag], CE-sft-wn [wn]
- gCE (gLite Computing Element)
gCE-host-cert-valid [cert], gCE-sft-apel [apel], gCE-sft-brokerinfo [bi], gCE-sft-caver [ca], gCE-sft-csh [csh], gCE-sft-job [js], gCE-sft-lcg-rm [rm], gCE-sft-lcg-rm-gfal [gfal], gCE-sft-lcg-rm-cr [cr], gCE-sft-lcg-rm-cp [cp], gCE-sft-lcg-rm-del [del], gCE-sft-lcg-rm-rep [rep],
gCE-sft-rgma [rgma], gCE-sft-rgma-sc [rgmasc], gCE-sft-softver [ver], gCE-sft-vo-swdir [swdir], gCE-sft-vo-tag [votag], gCE-sft-crl [crl], gCE-sft-wn [wn]
- SE (Storage Element)
SE-host-cert-valid [cert], SE-lcg-cr [cr], SE-lcg-cp [cp], SE-lcg-del [del]
- LFC (Global LFC)
LFC-writefile [lfcwf], LFC-ls [lfcls], LFC-host-cert-valid [cert]
- SRM (SRM)
SRM-host-cert-valid [cert], SRM-get [get], SRM-advisory-delete [del], SRM-put [put]
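Each entry in the lists above pairs a full SAM test name with a short code in square brackets. When the `<SAM test name>` placeholder has to be filled from a short code, a listing can be turned into a lookup table. The following Python sketch is one hedged way to do this; the pattern and the function name `parse_tests` are assumptions made for illustration, and the sample input is the SE listing from above.

```python
import re

# A test name is a run of letters joined by hyphens, followed by its
# short code in square brackets. Separators between entries are ignored.
TEST_PATTERN = re.compile(r"([A-Za-z]+(?:-[A-Za-z]+)+)\s*\[([a-z]+)\]")


def parse_tests(listing):
    """Return {short_code: full_test_name} for one service's listing."""
    return {code: name for name, code in TEST_PATTERN.findall(listing)}


# SE (Storage Element) tests, copied from the list above.
se_tests = parse_tests(
    "SE-host-cert-valid [cert], SE-lcg-cr [cr], SE-lcg-cp [cp], SE-lcg-del [del]"
)
```

Short codes such as `cert` are reused across services, so a separate table per service (CE, gCE, SE, LFC, SRM) is needed rather than one global table.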
Usual problems and links to (possible) solutions
- BDII
- siteBDII (GIIS) or top-level BDII is Unreachable
http://faq.twgrid.org/faq/index.php?action=artikel&cat=14&id=11&artlang=en
- No info published
http://goc.grid.sinica.edu.tw/gocwiki/No_data_published_by_top_level_BDII
- CA
- CA version test failed with error message:
This CA is an old one and time allowed to upgrade is over
http://grid-deployment.web.cern.ch/grid-deployment/lcg2CAlist.html
- CE (Computing Element)
- Job submission failed, with error message:
Got a job held event, reason: Globus error 3: an I/O operation failed Job got an error while in the CondorG queue
http://goc.grid.sinica.edu.tw/gocwiki/Job_got_an_error_while_in_the_CondorG_queue
- Job submission failed with error message:
Brokerhelper: Cannot plan. No compatible resources:
http://goc.grid.sinica.edu.tw/gocwiki/Brokerhelper%3A_Cannot_plan._No_compatible_resources
- Job submission failed with error message:
Got a job held event, reason: Unspecified gridmanager error
http://goc.grid.sinica.edu.tw/gocwiki/Unspecified_gridmanager_error
- Job submission failed with error message:
Cannot read JobWrapper output, both from Condor and from Maradona
http://goc.grid.sinica.edu.tw/gocwiki/Cannot_read_JobWrapper_output%2e%2e%2e
- Job submission failed with error message:
7 authentication failed
http://goc.grid.sinica.edu.tw/gocwiki/7_authentication_failed
- Job submission failed with error message:
10 data transfer to the server failed
http://goc.grid.sinica.edu.tw/gocwiki/10_data_transfer_to_the_server_failed
- 4444 Waiting jobs in the GRIS
http://goc.grid.sinica.edu.tw/gocwiki/4444_Waiting_jobs_in_the_GRIS
- SE (Storage Element)
- File copy and registration failed with error message:
535 535-FTPD GSSAPI error: GSS Major Status: General failure
http://goc.grid.sinica.edu.tw/gocwiki/535_535-FTPD_GSSAPI_error%3A_GSS_Major_Status%3A_General_failure
- MPI
- The following guides could be useful for solving MPI-related problems:
http://wiki.egee-see.org/index.php/SG_MPI_Guide
http://wiki.egee-see.org/index.php/Testing_MPI_support
http://goc.grid.sinica.edu.tw/gocwiki/MPI_Support_with_Torque
