Ops doc

From EGEE-see Wiki


Administration


Operational Procedures [1]

This document does not describe future procedures. The purpose is to document the procedures

used currently to operate the LCG2/EGEE production service and to describe the tools that are

in use to implement the procedures.


Security Incident Handling and Response Guide [2]

This document is intended for Grid site security contacts and site administrators. It is

expected that this policy document will be supplemented by additional information concerning

Incident Response procedures published on project websites.


OAG Procedures and Policy Report [3]

This document describes mandate, composition, procedures, and tools of the Operations Advisory

Group (OAG) of EGEE, either planned or in place. The OAG is a coordination group between the

activities NA4 (Applications) and SA1 (Operations). This could be important for VO managers,

especially of new VOs, and also for site administrators as well as ROC managers.


VOMS Admin [4]

The VOMS Admin service is a web application providing tools for administering member databases for

VOMS, the Virtual Organization Membership Service. VOMS serves as a central repository for user

authorization information, providing support for sorting users into a general group hierarchy, keeping

track of their roles, etc. Its functionality may be compared to that of a Kerberos KDC server.

It provides an intuitive web user interface for daily administration tasks, and a SOAP interface

for remote clients. (The entire functionality of the VOMS Admin service is accessible via the

SOAP interface.) The Admin package includes a simple command-line SOAP client that is useful for

automating frequently occurring batch operations, or simply to serve as an alternative to the full-blown

web interface. It is also useful for bootstrapping the service.


EGEE SUPPORT - Process and Workflow Documentation [5]


Integrate a new Virtual Organization [6]

This document proposes a procedure for both acceptance and deployment of a new VO to the EGEE infrastructure.

It contains the organizational and technical aspects of these procedures. The target audience of this document

is SA1 staff, in particular the representatives of RCs, ROCs, CICs and the OMC.


Virtual Organization Registration Procedure [7]

This document lists the necessary steps a Virtual Organisation (VO) should take in order to get

registered with and integrated into the EGEE infrastructure.


Virtual Organization Security Policy [8]

This policy defines a set of responsibilities placed on the members of the VO and the VO as a whole through its

managers. It aims to ensure that all Grid participants have sufficient information to properly fulfil their roles

with respect to interactions with a Virtual Organisation (VO). This policy does not address the

process by which disputes between Grid participants are resolved. It is expected that VO and Grid management bodies

will agree appropriate mechanisms through which such disputes can be resolved.


Useful Links


CIC Operations Portal [9]


Grid Log Retention Guidelines [10]

The current minimal Log Retention policy states that log information must be kept for at least 90 days. This page intends

to detail a sample implementation of this policy. Job submission will normally progress from a User Interface (UI) machine,

through a Resource Broker (RB) to a Computing Element (CE) and hence to the compute resource (usually a batch system). In some

cases the RB is not used and the UI submits the job directly to the CE. Data access is through a Storage Element (SE) service

and may be initiated directly from the UI or from a task executed on the compute resource.
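As a rough illustration of the 90-day minimum stated above, the following Python sketch decides whether a log record is old enough to be removed. The function name, the constant, and the dates are hypothetical and are not taken from the guidelines; the strict comparison is a deliberately conservative assumption.

```python
from datetime import datetime, timedelta

# Minimum retention period quoted in the policy text above.
MIN_RETENTION = timedelta(days=90)


def may_delete(log_time: datetime, now: datetime,
               retention: timedelta = MIN_RETENTION) -> bool:
    """Return True only if the log record is strictly older than the retention period.

    The strict comparison is a conservative assumption: a record that is
    exactly 90 days old is still kept.
    """
    return now - log_time > retention


if __name__ == "__main__":
    now = datetime(2006, 6, 30)  # illustrative date only
    print(may_delete(datetime(2006, 1, 1), now))  # record about six months old
    print(may_delete(datetime(2006, 6, 1), now))  # record about one month old
```

A real implementation would have to apply such a check to every service on the submission path described above (UI, RB, CE, batch system and SE), since each keeps its own logs.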


Sources of trace information for the LCG CE [11]


Security Service Challenge level 1 (SSC_1) [12]

Security Service Challenge level 1 (SSC_1) challenges the Workload Management System (WMS) on the Grid: Resource Broker (RB) and

Compute Element (CE). The goal of the LCG/EGEE Security Service Challenge (SSC) is to investigate whether sufficient information

is available to conduct an audit trace as part of an incident response, and to ensure that appropriate communications channels

are available.



Certification


Grid Certification Guide [13]


LCG


CVS User Guide [14]

Help on using the CVS areas in LCG. This documentation must be read by all members of the LCG deployment group and by developers working on LCG development.


gLite3 User Guide [15]

This document gives an overview of the gLite 3.0 middleware. It helps users to understand the building

blocks of the Grid and the available interfaces to the Grid services in order to run jobs and manage data.

This document is neither an administration nor a developer guide. It is addressed to WLCG/EGEE users and

site administrators who would like to work with the gLite middleware.


LCG-2-UserGuide [16]

This document gives an overview of the main characteristics of the LCG-2 middleware, which is being

used for EGEE. It allows users to understand the building blocks and the available interfaces to the GRID

tools in order to run jobs and manage data. This document is neither an administration nor a developer guide.

It is addressed to users and site administrators of EGEE who would like to work with the LCG-2 Grid middleware.


LCG2-Manual-Install [17]

This document is addressed to Site Administrators in charge of middleware installation and configuration.

It is a generic guide to manual installation and configuration for any supported node type. It provides a

fast method to install and configure the gLite middleware on the various node types (WN, UI, CE, SE, ...).


LCG2-Manual-Upgrade [18]

This document is addressed to Site Administrators in charge of middleware installation and configuration.

It is a generic guide to the manual upgrade procedure for the various node types (WN, UI, CE, SE etc.) on SLC3

and binary compatible OSes. It refers to the upgrade between the latest middleware release and the previous one.


LCG2-Site-Setup [19]

This document describes the process of setting up and registering a grid site using the middleware packaged by LCG.

This middleware represents the current middleware stack used in the LCG-2 and EGEE production grid. This information

is relevant for site managers or sysadmins who want to set up an EGEE/LCG-2 production site or upgrade their site to the

latest release.


LCG2-Site-Testing [20]

This is a collection of basic commands that can be run to test the correct setup of a site. These tests are not meant to

be a replacement for the test tools provided by the LCG certification team. They are instead a collection of quick and non-invasive

functional tests suitable to be run in order to be sure that the site configuration has been correctly performed. The tests in

this chapter should enable the site administrator to verify the basic functionality of the site.


LCG-Midleware-developers-guide [21]

This document is a guide for anyone developing or modifying code for LCG. This guide is directly derived from the European Datagrid

Developer’s Guide. The LCG version differs to suit the requirements of LCG and is more concise. The main objective of this guide is to define

the procedures used by LCG for software development to ensure that the software produced meets the quality required for a production system. This

guide focuses on the basics in order for it to be easily followed, flexible and applicable to other projects. This guide should also be an example

to anyone producing software for LCG, demonstrating what is expected in matters of quality.


LFC-Administrator-Guide [22]

The LCG File Catalog (LFC) is a high performance catalog provided by LCG. This document describes the LFC architecture and implementation.

It also explains how to install the LFC client as well as the LFC server (version 1.3.2) for both MySQL and Oracle backends.


Experiment Software Installation in LCG-2 [23]

This document covers the installation of experiment software on LCG-2 sites.


Maui-Cookbook for LCG [24]

This document introduces the Maui advanced job scheduler in the context of LCG. It also shows how Maui is being used at two sites

having different approaches: the English Rutherford Appleton Laboratory (RAL) and the Dutch National Institute for Nuclear and

High Energy Physics (NIKHEF).


R-GMA Server User Guide [25]

The R-GMA server is a Java servlet-based web application which provides the Consumer, Producer, Registry and Schema services for the

R-GMA distributed information and monitoring system. The server is designed to be run within a servlet container such as Jakarta Tomcat.

Tomcat versions 4 and 5 have been tested; however, other versions or other servlet containers may also work. This document describes the

servlet-based implementation of the R-GMA server.


R-GMA Command Line Tool [26]

The R-GMA command line tool provides simple shell-like access to the R-GMA distributed information and monitoring system. R-GMA uses a

relational model to publish and query information using the SQL language. This document describes the R-GMA command line tool.


Useful Links

gLite 3.0 release updates [27]


Support


Grid tutorial [28]

This document leads you through a number of increasingly sophisticated exercises covering aspects of job submission, data management and

information systems. It is assumed that you are familiar with the basic Linux/UNIX user environment (bash, shell etc.) and that you have

obtained a security certificate providing access to the LCG-2 testbed. This document is designed to be accompanied by a series of

presentations providing a general overview of Grids and the LCG tools. Solutions to all the exercises are available online. We do not give exact

host names of machines in the testbed since they change over time.


GGUS Support Model [29]

This document is an outline description of how the GGUS ticketing system behaves.


Infrastructure Planning Guide [30]

This document is a summary of the experience and knowledge gained during the building of the EGEE grid infrastructure. The document is intended

to explain some of the decisions and choices made in planning, deploying, and operating the infrastructure, and should be helpful to others who

consider building grid infrastructures or participating in existing grids. It is not intended to be definitive, but rather to explain the issues

and the experience with the hope that others can benefit.


FAQ_for_ROCs [31]


FAQ_for_CE ROC Central [32]


FAQ_for_Italy ROC Italy [33]


FAQ_for_NE ROC North [34]


FAQ_for_SE ROC South East [35]


FAQ_for_SW ROC South West [36]


FAQ_for_UK ROC UK [37]


GGUS Presentation [38]


Operations of ENOC [39]

Description of the ENOC, the EGEE Network Operations Centre.


Tutorial on GGUS HelpDesk System [40]


Tools


Monitoring and Alarm Systems [41]

This document is an initial assessment of the current CIC activity in monitoring and covers five monitoring tools:

GPPMON, GRIDICE, GSTAT, NAGIOS and Real Time Grid Monitor.


Inventory of Operation Tools, Procedures and Gap Analysis [42]

This document contains the inventory of operations’ tools, procedures, and gap analysis.


Goc_db details [43]


Web Site


Accounting and Reporting Web Site Publicly Available [44]

This document describes the “Accounting and Reporting Web Site” for deliverable DSA1.3 for SA1 Operations Activity. This is a software

deliverable, not a document deliverable. This deliverable provides a brief description of the software processes set in place for the

collection of accounting data and for the presentation of this data by means of a publicly accessible web page.


Adding a user in the access list [45]

This document provides information on how to add a user to the author list in the documentation system.


Changing the contents of the VO table in the CIC portal [46]

This document provides information on how to change the content of the VO tables in the CIC portal.


Creating a private web space for a user [47]

This document provides information on how to create a private web space for a user in the documentation system.


Locally changing a file [48]

This document provides information on how to change the contents of a file in the documentation system.


Logging on with WebDAV [49]

This document provides information on how to log on to the documentation system with WebDAV using Internet Explorer on the Windows platform.


Remotely changing a file [50]

This document provides information on how to change the contents of a file in the documentation system.


SA1 Documentation System [51]

This document provides information on how to browse the contents of a file in the documentation system.


Per ROC

AsiaPacific

CentralEurope

CE Operations Procedures [52]

Joining CE ROC as a new RC (new site certification) [53]

Operating Grid Core Services [54]

Support for site admins [55]

CE wiki [56]

CERN

France

GermanySwitzerland

Italy

NorthernEurope

Russia

SouthEasternEurope

SEE CE Operations Procedures

Administrators Wiki

Procedures for migrating to gLite 3.0

SouthWesternEurope

UKI

US

Outdated

Configuration of Virtual Organizations [57]

This document provides a detailed description of the so-called VO management feature implemented

in the gLite configuration system. It provides the implementation description, use-cases and

extensive examples for any gLite middleware administrator doing modifications in the default VO

management configuration provided in the release. It contains VO management implementation details

useful for understanding of the VO management functionality and also hints and examples for advanced

and expert configuration. The document assumes good knowledge of the gLite configuration model

(configuration procedure, schema, etc.) and basic knowledge of XML.
