SG Data Management High Level Tools
From EGEE-SEE Wiki
This guide is part of the SEE-GRID Gridification Guide. It was contributed by the Institute of Physics Belgrade and the Belgrade University Computer Centre.
Data Management
Data management in the grid environment covers the handling of physical files located on Storage Elements (SEs): copying a replica from one SE to another (replication), deleting physical files, and copying files from the local file system of the User Interface (UI) to an SE and from an SE back to the local file system.
The gLite middleware provides data management command line utilities (CLI) as well as data management APIs for several programming languages. Some additional APIs and utilities developed by users are described at SEE-GRID File Management Java API, Managing Sets of Files and Replicas Within LFC Catalog, and Data Management Web Portal.
gLite offers a variety of data management client tools for uploading files to the Grid, downloading them from it, and replicating data. These tools can be divided into two groups: high level and low level tools.
High level tools provide a high level interface (both command line and APIs) to the basic data management functionality, hiding the complexity of the interaction between the catalog and the Storage Elements and minimizing the risk of corrupting grid files. Every user is strongly advised to perform data management through the LCG Data Management tools, also referred to as lcg-utils.
Low level tools (edg-gridftp-*, globus-url-copy, dedicated srm-* commands) can be helpful in some particular cases, but their use is discouraged for non-expert users, since they do not ensure consistency between the physical files placed on a storage element and the entries in the file catalog. Using them carelessly can leave the storage and the catalog out of sync.
High level tools, also known as LCG Data Management tools (lcg-utils), allow users to copy files between the UI, CE, WN and an SE, to register entries in the file catalog, and to replicate files between SEs. Note: a file is considered to be a Grid file only if it is both physically present on an SE and registered in the file catalog.
LCG Data Management tools (lcg-utils)
The lcg-utils package is part of the gLite distribution and consists of command line utilities and APIs for the C/C++ and Python programming languages. lcg-utils offers both data management operations and logical file operations (interaction with the grid file catalog).
Each CLI utility corresponds to one of the API functions; for example, the lcg_cp API function provides the functionality of the lcg-cp CLI utility:
CLI      API
lcg-cp   lcg_cp
lcg-rep  lcg_rep
lcg-del  lcg_del
...      ...
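The table above suggests a simple naming rule: the API function name is the CLI utility name with each '-' replaced by '_'. Below is a minimal C sketch of that rule, assuming it holds for every pair (the man pages remain the authority for exceptions). cli_to_api_name is a hypothetical helper, not part of lcg-utils.

```c
#include <stddef.h>

/* Hypothetical helper: derives the API function name from a CLI utility
 * name by replacing every '-' with '_' (e.g. "lcg-cp" -> "lcg_cp").
 * Writes at most out_size bytes, including the terminating NUL.
 * Returns 0 on success, -1 if the output buffer is too small. */
int cli_to_api_name(const char *cli, char *out, size_t out_size)
{
    size_t i;
    for (i = 0; cli[i] != '\0'; i++) {
        if (i + 1 >= out_size)
            return -1;              /* no room for this char plus NUL */
        out[i] = (cli[i] == '-') ? '_' : cli[i];
    }
    if (out_size == 0)
        return -1;                  /* cannot even store the NUL */
    out[i] = '\0';
    return 0;
}
```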
The man pages for the lcg-utils CLI and API are part of the gLite middleware distribution and are also available online.
Available commands and functions can be divided into two groups:
- Physical file/replica management
lcg-cp   Copies a Grid file to a local destination (download)
lcg-cr   Copies a file to an SE and registers the file in the catalog (upload)
lcg-del  Deletes one file (either one replica or all replicas)
lcg-rep  Copies a file from one SE to another SE and registers it in the catalog (replicate)
lcg-gt   Gets the TURL for a given SURL and transfer protocol
lcg-sd   Sets the file status to "Done" for a given SURL in an SRM request
- Logical file management, i.e. file catalog interaction
lcg-aa  Adds an alias in the catalog for a given GUID
lcg-ra  Removes an alias from the catalog for a given GUID
lcg-rf  Registers in the catalog a file residing on an SE
lcg-uf  Unregisters from the catalog a file residing on an SE
lcg-la  Lists the aliases for a given LFN, GUID or SURL
lcg-lg  Gets the GUID for a given LFN or SURL
lcg-lr  Lists the replicas for a given LFN, GUID or SURL
For a more detailed description of logical files, read SG Data Management File Names and LFC.
Most lcg-utils commands and functions accept logical file names (LFNs), physical file names (SURLs) and GUIDs as arguments. An LFN is passed with the "lfn:" prefix and a GUID with the "guid:" prefix, while a SURL is given as a full URL starting with srm:// or sfn://, as in the examples below. Functions that return an integer return 0 on success and -1 on error; functions that return a pointer return NULL on error.
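The naming convention can be illustrated with a small classifier. This is a sketch, not lcg-utils code: classify_name and check_name are hypothetical helpers, SURLs are recognized by the srm:// and sfn:// schemes that appear in the examples on this page, and file: is included for the local-file URIs used with lcg-cr and lcg-cp. check_name follows the 0/-1 integer return convention of the lcg-utils functions.

```c
#include <string.h>

/* Kinds of file names seen in lcg-utils arguments on this page. */
enum name_kind { NAME_LFN, NAME_GUID, NAME_SURL, NAME_LOCAL, NAME_UNKNOWN };

/* Returns nonzero if s starts with prefix. */
static int starts_with(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Hypothetical helper: classifies an argument by its prefix or scheme. */
enum name_kind classify_name(const char *name)
{
    if (starts_with(name, "lfn:"))
        return NAME_LFN;
    if (starts_with(name, "guid:"))
        return NAME_GUID;
    if (starts_with(name, "srm://") || starts_with(name, "sfn://"))
        return NAME_SURL;
    if (starts_with(name, "file:"))
        return NAME_LOCAL;
    return NAME_UNKNOWN;
}

/* Hypothetical helper: 0 if the name has a recognized form, -1 otherwise. */
int check_name(const char *name)
{
    return classify_name(name) == NAME_UNKNOWN ? -1 : 0;
}
```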
The following environment variables are required, or can optionally be set, for lcg-utils:
LCG_GFAL_INFOSYS: must point to a top-level BDII in the format <hostname>:<port>; the BDII read port is 2170.
  $ echo $LCG_GFAL_INFOSYS
  bdii.phy.bg.ac.yu:2170
LFC_HOST: specifies the catalog endpoint (taking precedence over the one published in the Information System); if no endpoint is specified, the ones published in the Information System are used.
  $ echo $LFC_HOST
  lfc.phy.bg.ac.yu
LCG_GFAL_VO: if set, indicates the user's VO; if not set, the --vo command option is required (the option takes precedence over LCG_GFAL_VO).
  $ echo $LCG_GFAL_VO
VO_<VO>_DEFAULT_SE: specifies the default SE for the VO <VO>.
  $ echo $VO_AEGIS_DEFAULT_SE
  dpm.phy.bg.ac.yu
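Because LCG_GFAL_INFOSYS must have the form <hostname>:<port>, a program can sanity-check it before calling lcg-utils. A minimal sketch, assuming the variable holds a single host:port pair; parse_infosys is a hypothetical helper and not part of the lcg-utils API.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: splits a "<hostname>:<port>" value such as
 * "bdii.phy.bg.ac.yu:2170" into host and port.
 * Returns 0 on success, -1 if the value is NULL, malformed, or the
 * host does not fit into host_size bytes. */
int parse_infosys(const char *value, char *host, size_t host_size, int *port)
{
    const char *colon;
    char *end;
    long p;
    size_t len;

    if (value == NULL)
        return -1;                       /* e.g. variable not set */
    colon = strrchr(value, ':');
    if (colon == NULL || colon == value)
        return -1;                       /* missing host or separator */
    len = (size_t)(colon - value);
    if (len >= host_size)
        return -1;                       /* host buffer too small */
    p = strtol(colon + 1, &end, 10);
    if (end == colon + 1 || *end != '\0' || p <= 0 || p > 65535)
        return -1;                       /* port must be a number 1..65535 */
    memcpy(host, value, len);
    host[len] = '\0';
    *port = (int)p;
    return 0;
}
```

For example, parse_infosys(getenv("LCG_GFAL_INFOSYS"), host, sizeof host, &port) returns -1 when the variable is unset, because getenv then returns NULL.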
Examples
CLI Usage of lcg-utils
The user must have a valid VOMS proxy to use lcg-utils:
[neda@ce neda]$ voms-proxy-init -voms aegis
Enter GRID pass phrase:
Your identity: /C=RS/O=AEGIS/OU=Institute of Physics Belgrade/CN=Neda Svraka
Cannot find file or dir: /home/neda/.glite/vomses
Creating temporary proxy ..................................................................... Done
Contacting voms.phy.bg.ac.yu:15001 [/C=RS/O=AEGIS/OU=Institute of Physics Belgrade/CN=host/voms.phy.bg.ac.yu] "aegis" Done
Creating proxy ...................................................... Done
Your proxy is valid until Wed Aug 15 22:49:52 2007
Information on the existing Grid resources (in this case, the available file catalogs and SEs) can be obtained with the lcg-infosites command:
$ lcg-infosites --vo aegis lfc
lfc.phy.bg.ac.yu
grid02.rcub.bg.ac.yu
$ lcg-infosites --vo aegis se
Avail Space(Kb)  Used Space(Kb)  Type  SEs
----------------------------------------------------------
1930000          201484          n.a   dpm.phy.bg.ac.yu
56003912         90016524        n.a   se.phy.bg.ac.yu
72005300         1007708         n.a   grid01.rcub.bg.ac.yu
42500000         10580000        n.a   grid15.rcub.bg.ac.yu
8494464          1142852         n.a   grid01.elfak.ni.ac.yu
54740472         27808596        n.a   cluster1.csk.kg.ac.yu
40519604         1768792         n.a   rti29.etf.bg.ac.yu
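The SE listing can also be processed by a script, for example to pick a destination for the -d option of lcg-cr. The sketch below assumes the whitespace-separated four-column layout shown above (available space, used space, type, SE host); parse_se_row and pick_largest_se are hypothetical helpers, and choosing the SE with the most available space is only one possible policy.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parses one data row of "lcg-infosites --vo <VO> se"
 * output, e.g. "1930000 201484 n.a dpm.phy.bg.ac.yu".
 * Returns 0 on success, -1 for header, separator or malformed lines,
 * or if the SE name does not fit into se_size bytes. */
int parse_se_row(const char *line, long *avail_kb, long *used_kb,
                 char *se, size_t se_size)
{
    char type[16];
    char host[256];
    if (sscanf(line, "%ld %ld %15s %255s", avail_kb, used_kb, type, host) != 4)
        return -1;
    if (strlen(host) >= se_size)
        return -1;
    strcpy(se, host);
    return 0;
}

/* Returns the index of the row with the most available space,
 * or -1 if no row could be parsed. */
int pick_largest_se(const char *const rows[], int n)
{
    int i, best = -1;
    long best_avail = -1, avail, used;
    char host[256];
    for (i = 0; i < n; i++) {
        if (parse_se_row(rows[i], &avail, &used, host, sizeof host) != 0)
            continue;                    /* skip header/separator lines */
        if (avail > best_avail) {
            best_avail = avail;
            best = i;
        }
    }
    return best;
}
```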
Setting the environment variables needed to communicate with the LFC catalog (for the current shell):
$ export LFC_HOST=lfc.phy.bg.ac.yu
$ export LCG_CATALOG_TYPE=lfc
Uploading a file to the Grid, i.e. transferring it from the local machine to an SE and registering it in the file catalog:
$ lcg-cr -v --vo aegis -d dpm.phy.bg.ac.yu -l lfn:/grid/aegis/neda/dpmtest.txt file:/home/neda/dpmtest.txt
Using grid catalog type: lfc
Using grid catalog : lfc.phy.bg.ac.yu
Using LFN : /grid/aegis/neda/dpmtest.txt
Using SURL : srm://dpm.phy.bg.ac.yu/dpm/phy.bg.ac.yu//home/aegis/generated/2007-08-15/file71f10a8e-fb0b-41c5-99c7-8e4c55ac78c6
Source URL: file:/home/neda/dpmtest.txt
File size: 106
VO name: aegis
Destination specified: dpm.phy.bg.ac.yu
Destination URL for copy: gsiftp://dpm.phy.bg.ac.yu/dpm.phy.bg.ac.yu:/storage/aegis/2007-08-15/file71f10a8e-fb0b-41c5-99c7-8e4c55ac78c6.4840.0
# streams: 1
# set timeout to 0 seconds
Alias registered in Catalog: lfn:/grid/aegis/neda/dpmtest.txt
106 bytes 0.24 KB/sec avg 0.24 KB/sec inst
Transfer took 1030 ms
Destination URL registered in Catalog: srm://dpm.phy.bg.ac.yu/dpm/phy.bg.ac.yu//home/aegis/generated/2007-08-15/file71f10a8e-fb0b-41c5-99c7-8e4c55ac78c6
guid:cb169e1e-8b3d-4c71-870c-bb02eccaefc0
With the -v option, detailed information about the operation is printed. The following options were used in this example:
--vo: specifies the VO (aegis)
-d: destination SE (dpm.phy.bg.ac.yu); if it is not specified, the SE given by VO_<VO>_DEFAULT_SE is used (this variable is set on all WNs and UIs)
-l: specifies the LFN (lfn:/grid/aegis/neda/dpmtest.txt)
The only argument is the local file to be uploaded (a fully qualified URI):
file:/home/neda/dpmtest.txt
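The fallback rule for -d (an explicitly given SE wins, otherwise VO_<VO>_DEFAULT_SE is used) can be expressed as a small function. This sketch assumes the VO part of the variable name is simply the VO name in upper case, as in VO_AEGIS_DEFAULT_SE for the aegis VO; default_se_var and choose_dest_se are hypothetical helpers, not lcg-utils functions.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: builds the default-SE variable name for a VO,
 * e.g. "aegis" -> "VO_AEGIS_DEFAULT_SE" (assumes plain upper-casing).
 * Returns 0 on success, -1 if out is too small. */
int default_se_var(const char *vo, char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "VO_%s_DEFAULT_SE", vo);
    size_t i;
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    for (i = 3; out[i] != '\0'; i++)     /* upper-case everything after "VO_" */
        out[i] = (char)toupper((unsigned char)out[i]);
    return 0;
}

/* Mirrors the -d rule: an explicit SE wins, otherwise the value of
 * VO_<VO>_DEFAULT_SE is used. Returns NULL if neither is available. */
const char *choose_dest_se(const char *explicit_se, const char *vo)
{
    char var[128];
    if (explicit_se != NULL && explicit_se[0] != '\0')
        return explicit_se;
    if (default_se_var(vo, var, sizeof var) != 0)
        return NULL;
    return getenv(var);                  /* NULL if the variable is unset */
}
```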
The command returns the file GUID:
guid:cb169e1e-8b3d-4c71-870c-bb02eccaefc0
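Scripts that capture the output of lcg-cr may want to check that the returned value really is a GUID before storing it. A minimal sketch, assuming the GUID always has the form shown here: "guid:" followed by a UUID in 8-4-4-4-12 hexadecimal groups. is_guid is a hypothetical helper.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: returns 1 if s is "guid:" followed by a UUID in
 * 8-4-4-4-12 hexadecimal form, 0 otherwise. */
int is_guid(const char *s)
{
    static const char prefix[] = "guid:";
    size_t i;
    if (strncmp(s, prefix, sizeof prefix - 1) != 0)
        return 0;
    s += sizeof prefix - 1;              /* skip "guid:" */
    if (strlen(s) != 36)
        return 0;
    for (i = 0; i < 36; i++) {
        if (i == 8 || i == 13 || i == 18 || i == 23) {
            if (s[i] != '-')             /* group separators */
                return 0;
        } else if (!isxdigit((unsigned char)s[i])) {
            return 0;
        }
    }
    return 1;
}
```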
Checking that the file is registered in the catalog:
$ lfc-ls -l /grid/aegis/neda
-rw-rw-r-- 1 105 101 106 Aug 15 13:57 dpmtest.txt
-rw-rw-r-- 1 105 101 105 Aug 15 11:40 setest.txt
Replicating a file to a different SE (se.phy.bg.ac.yu):
$ lcg-rep -v --vo aegis -d se.phy.bg.ac.yu guid:cb169e1e-8b3d-4c71-870c-bb02eccaefc0
Using grid catalog type: lfc
Using grid catalog : lfc.phy.bg.ac.yu
Source URL: guid:cb169e1e-8b3d-4c71-870c-bb02eccaefc0
File size: 106
VO name: aegis
Destination specified: se.phy.bg.ac.yu
Source URL for copy: gsiftp://dpm.phy.bg.ac.yu/dpm.phy.bg.ac.yu:/storage/aegis/2007-08-15/file71f10a8e-fb0b-41c5-99c7-8e4c55ac78c6.4840.0
Destination URL for copy: gsiftp://se.phy.bg.ac.yu/storage/aegis/generated/2007-08-15/filecaf3c8e1-dc51-4638-8ab0-092d21448337
# streams: 1
# set timeout to 0
0 bytes 0.00 KB/sec avg 0.00 KB/sec inst
Transfer took 2020 ms
Destination URL registered in LRC: sfn://se.phy.bg.ac.yu/storage/aegis/generated/2007-08-15/filecaf3c8e1-dc51-4638-8ab0-092d21448337
The file to be replicated can be specified using an LFN, a GUID or a SURL; in this case the GUID is used. NOTE: for a given GUID, there can be only one replica per SE.
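Since there can be only one replica per SE for a given GUID, a script could check the existing replica list (for instance the SURLs printed by lcg-lr) before calling lcg-rep. The sketch below extracts the host from each SURL and compares it with the target SE. surl_host and has_replica_on_se are hypothetical helpers, and treating the URL host as the SE name is an assumption that matches the SURLs shown on this page.

```c
#include <string.h>

/* Hypothetical helper: copies the host part of a SURL such as
 * "srm://dpm.phy.bg.ac.yu/dpm/..." into host.
 * Returns 0 on success, -1 if there is no "://" or the host does not fit. */
int surl_host(const char *surl, char *host, size_t host_size)
{
    const char *start = strstr(surl, "://");
    size_t len;
    if (start == NULL)
        return -1;
    start += 3;                          /* skip "://" */
    len = strcspn(start, ":/");          /* host ends at a port or path */
    if (len == 0 || len >= host_size)
        return -1;
    memcpy(host, start, len);
    host[len] = '\0';
    return 0;
}

/* Returns 1 if any SURL in replicas[] is on se_host, 0 otherwise. */
int has_replica_on_se(const char *const replicas[], int n, const char *se_host)
{
    char host[256];
    int i;
    for (i = 0; i < n; i++)
        if (surl_host(replicas[i], host, sizeof host) == 0 &&
            strcmp(host, se_host) == 0)
            return 1;
    return 0;
}
```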
Various information about a file and its replicas can be obtained. To list the existing replicas, the file can be specified by an LFN, its GUID or a SURL (here the LFN is used):
$ lcg-lr -v --vo aegis lfn:/grid/aegis/neda/dpmtest.txt
sfn://se.phy.bg.ac.yu/storage/aegis/generated/2007-08-15/filecaf3c8e1-dc51-4638-8ab0-092d21448337
srm://dpm.phy.bg.ac.yu/dpm/phy.bg.ac.yu//home/aegis/generated/2007-08-15/file71f10a8e-fb0b-41c5-99c7-8e4c55ac78c6
Getting the GUID associated with a given LFN or SURL:
$ lcg-lg --vo aegis srm://dpm.phy.bg.ac.yu/dpm/phy.bg.ac.yu//home/aegis/generated/2007-08-15/file71f10a8e-fb0b-41c5-99c7-8e4c55ac78c6
guid:cb169e1e-8b3d-4c71-870c-bb02eccaefc0
Listing the LFNs associated with a particular file, which can be identified by its GUID, any of its LFNs, or the SURL of one of its replicas:
$ lcg-la --vo aegis guid:cb169e1e-8b3d-4c71-870c-bb02eccaefc0
lfn:/grid/aegis/neda/dpmtest.txt
Obtaining a TURL for a replica (a SURL and a supported protocol must be given):
$ lcg-gt sfn://se.phy.bg.ac.yu/storage/aegis/generated/2007-08-15/filecaf3c8e1-dc51-4638-8ab0-092d21448337 gsiftp
gsiftp://se.phy.bg.ac.yu/storage/aegis/generated/2007-08-15/filecaf3c8e1-dc51-4638-8ab0-092d21448337
0
0
$ lcg-gt srm://dpm.phy.bg.ac.yu/dpm/phy.bg.ac.yu//home/aegis/generated/2007-08-15/file71f10a8e-fb0b-41c5-99c7-8e4c55ac78c6 gsiftp
gsiftp://dpm.phy.bg.ac.yu/dpm.phy.bg.ac.yu:/storage/aegis/2007-08-15/file71f10a8e-fb0b-41c5-99c7-8e4c55ac78c6.4840.0
4843
0
The command behaves differently depending on whether or not the Storage Element exposes an SRM interface. It always prints three lines of output: the first is the TURL of the file, while the last two are meaningful only for an SRM interface. The second and third lines are the requestID and fileID of the SRM request (hidden from the user), which remains open unless it is explicitly closed (at least with SRM 1). This matters because some SRM SEs limit the maximum number of open requests. Once the TURL is no longer needed, the request can be closed with lcg-sd, which takes as arguments the TURL of the file, the requestID and the fileID.
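The three-line output of lcg-gt can be parsed as follows. This is a sketch, not part of lcg-utils: struct gt_result, parse_lcg_gt and request_needs_close are hypothetical, and treating a non-zero requestID as a request that still has to be closed with lcg-sd is an assumption based on the two outputs above (0 0 for the sfn:// replica, 4843 0 for the SRM replica).

```c
#include <stdio.h>

/* TURL plus the SRM request/file identifiers printed by lcg-gt. */
struct gt_result {
    char turl[1024];
    long request_id;
    long file_id;
};

/* Hypothetical helper: parses the three whitespace-separated values
 * printed by lcg-gt (TURL, requestID, fileID).
 * Returns 0 on success, -1 if any of the three is missing. */
int parse_lcg_gt(const char *output, struct gt_result *r)
{
    if (sscanf(output, "%1023s %ld %ld",
               r->turl, &r->request_id, &r->file_id) != 3)
        return -1;
    return 0;
}

/* Assumption: a non-zero requestID marks an open SRM request. */
int request_needs_close(const struct gt_result *r)
{
    return r->request_id != 0;
}
```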
A Grid file can be copied from an SE to the local machine using an LFN, GUID or SURL of a valid Grid file:
$ lcg-cp -v --vo aegis srm://dpm.phy.bg.ac.yu/dpm/phy.bg.ac.yu//home/aegis/generated/2007-08-15/file71f10a8e-fb0b-41c5-99c7-8e4c55ac78c6 file:/home/neda/dpmtest2.txt
Source URL: srm://dpm.phy.bg.ac.yu/dpm/phy.bg.ac.yu//home/aegis/generated/2007-08-15/file71f10a8e-fb0b-41c5-99c7-8e4c55ac78c6
File size: 106
Source URL for copy: gsiftp://dpm.phy.bg.ac.yu/dpm.phy.bg.ac.yu:/storage/aegis/2007-08-15/file71f10a8e-fb0b-41c5-99c7-8e4c55ac78c6.4840.0
Destination URL: file:/home/neda/dpmtest2.txt
# streams: 1
# set timeout to 0 (seconds)
0 bytes 0.00 KB/sec avg 0.00 KB/sec inst
Transfer took 1020 ms
Finally, the user can delete one replica or all replicas of a file:
$ lcg-del -v --vo aegis -s dpm.phy.bg.ac.yu lfn:/grid/aegis/neda/dpmtest.txt
VO name: aegis
Using GUID : cb169e1e-8b3d-4c71-870c-bb02eccaefc0
set timeout to 0 seconds
srm://dpm.phy.bg.ac.yu/dpm/phy.bg.ac.yu//home/aegis/generated/2007-08-15/file71f10a8e-fb0b-41c5-99c7-8e4c55ac78c6 is deleted
srm://dpm.phy.bg.ac.yu/dpm/phy.bg.ac.yu//home/aegis/generated/2007-08-15/file71f10a8e-fb0b-41c5-99c7-8e4c55ac78c6 is unregistered
This example deletes one particular replica from the specified SE (dpm.phy.bg.ac.yu). When the -a option is used, all replicas of the file are deleted and unregistered. If all the replicas of a file are removed, the corresponding GUID-LFN mappings are removed as well.
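The rule that removing the last replica also removes the GUID-LFN mappings can be modelled with a toy catalog entry. This is purely illustrative: struct catalog_entry, del_one_replica and del_all_replicas are hypothetical, only count replicas and aliases, and do not talk to an LFC.

```c
/* Hypothetical in-memory model of one catalog entry (one GUID). */
struct catalog_entry {
    int replicas;   /* number of registered replicas (SURLs) */
    int aliases;    /* number of LFNs mapped to the GUID */
};

/* Models "lcg-del -s <SE>": removes a single replica. */
void del_one_replica(struct catalog_entry *e)
{
    if (e->replicas > 0)
        e->replicas--;
    if (e->replicas == 0)
        e->aliases = 0;   /* last replica gone: GUID-LFN mappings removed too */
}

/* Models "lcg-del -a": removes all replicas and, with them, the mappings. */
void del_all_replicas(struct catalog_entry *e)
{
    e->replicas = 0;
    e->aliases = 0;
}
```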
The most common uses of lcg-utils are shown above. More details about the commands and their options can be found in the man pages and the gLite User Guide.
lcg-utils API Usage
Copying and registering a replica
lcg_cr("file:///home/user/srcFile","sfn://se.phy.bg.ac.yu",NULL,"lfn:/grid/seegrid/destLFN","seegrid",NULL,1,NULL,0,0,NULL);
Copying a replica to another SE (replication)
lcg_rep("lfn:/grid/seegrid/fileToReplicate", "se.phy.bg.ac.yu", "seegrid",NULL,1,NULL,0,0);
Copying a physical or logical file to the local file system of the UI (download)
lcg_cp("lfn:/grid/seegrid/fileToDownload", "file:///home/user/localDestFile", "seegrid",1,NULL,0,0);
// The first argument can be either a logical file name (LFN) or a physical file name (SURL). If a SURL is given, the file is downloaded from that storage element.
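The local-file arguments in these API examples use the file:/// form of an absolute path. A minimal sketch of building such a URI, assuming the path is already absolute; make_file_uri is a hypothetical helper, not an lcg-utils function.

```c
#include <stdio.h>

/* Hypothetical helper: turns an absolute local path into the
 * "file:///..." form used in the API examples.
 * Returns 0 on success, -1 for relative paths or a too-small buffer. */
int make_file_uri(const char *abs_path, char *out, size_t out_size)
{
    int n;
    if (abs_path == NULL || abs_path[0] != '/')
        return -1;                       /* only absolute paths are accepted */
    n = snprintf(out, out_size, "file://%s", abs_path);
    if (n < 0 || (size_t)n >= out_size)
        return -1;                       /* truncated */
    return 0;
}
```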
Deleting a logical file and all its replicas
lcg_del("lfn:/grid/seegrid/fileToDelete",1,NULL, "seegrid",NULL,0,0);
Deleting a single replica (physical file)
lcg_del("lfn:/grid/seegrid/fileWhosReplicaIsDeleted",0, "se.phy.bg.ac.yu", "seegrid",NULL,0,0);
// Deletes the replica of the logical file from the storage element "se.phy.bg.ac.yu".
