P-GRADE Portal
Introduction
P-GRADE Portal (Parallel Grid Run-time and Application Development Environment) is a service-rich graphical environment for the development, execution and monitoring of data-driven grid applications. It supports workflow and parameter study applications built up from various types of sequential and parallel components (jobs or services). The tool hides low-level, grid-specific access mechanisms behind abstract, technology-neutral graphical interfaces, so that even non-expert users can define and execute distributed applications on Globus 2, Globus 4, LCG and gLite based computing infrastructures. With P-GRADE Portal, workflows and workflow-based parameter studies are portable between grid platforms, without users having to learn new systems or re-engineer program code. The tool is developed by an open alliance, and its installations operate as central services for several grid-based VOs around the world. This document provides information on the use of P-GRADE Portal in grid application development and gridification. Further information about the tool can be found at www.portal.p-grade.hu.
Structure of P-GRADE Portal
P-GRADE Grid Portal is implemented as a two layer architecture:
- The lower layer is a set of services built on top of the most widely used grid middleware components. These services enable more complex and abstract user-grid interactions than standard grid middleware services do. This layer mostly consists of Linux scripts and C programs.
- The higher layer contains graphical user interfaces implemented as JSR168 compliant web portlets. These portlets are written in Java. This layer hides the grid middleware services and the P-GRADE Portal service layer behind intuitive graphical interfaces.
Typical P-GRADE Portal use cases
The P-GRADE Grid Portal supports both developers of new grid applications and users of already gridified applications. The high-level service layer enables the quick composition of grid applications from existing legacy components, while the graphical interface layer significantly improves the quality and user-friendliness of grid application execution and monitoring.
Application developer use case
The use case consists of several substeps. In the first step, the developer defines the structure of the grid application as a directed acyclic workflow graph. Then he/she tests the logical correctness of the application by executing it on some sample input data. As the last step, he/she scales up the workflow to a parametric study, providing a larger input data set for the execution.
Workflow development phase
- The user logs in to the portal and opens the workflow editor component.
- The user defines the workflow level of the grid application by combining sequential or parallel batch jobs into a data-driven directed acyclic graph. Components of the graph can be executable binary codes that are available on the client machine, on the portal server, or on the computing elements of the grid.
Workflow testing phase
- The user provides sample input files for the workflow that can be used for test execution. The purpose of the test execution is to check the correctness of the workflow logic. Input files for the test execution can come from the client machine or from grid storage elements.
- The application is saved. As a result, all the client-side binary and input files are uploaded into the storage space allocated for the user on the portal server.
- The “Certificates” portlet is opened and short-term proxy credentials are imported from MyProxy servers. These proxy certificates are required for user authentication at the grid infrastructure level.
- The user submits the workflow using the “Workflow” portlet. The workflow manager subsystem of P-GRADE breaks down the workflow into elementary operations and performs these operations on behalf of the user. The workflow manager submits jobs, invokes application services and transfers files according to the structure of the workflow graph. This structure defines data dependencies between components and controls the ordering of elementary file transfer and job execution operations.
- The user monitors the progress of the workflow using the visualization facilities of the “Workflow” portlet and the workflow editor. The standard output and standard error streams of the workflow components are accessible on the “Workflow” portlet in real time during the execution. After a workflow component has finished, its output files can also be downloaded through this portlet.
- If any of the jobs or file transfer operations fails, the workflow manager provides an error report for the user on the “Workflow” portlet. For certain error types, the user is allowed to modify the workflow, download new proxy certificates for it, and then resume the execution from the point of failure. This mechanism preserves the intermediate results that were successfully produced by preceding components of the workflow.
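The ordering and resume behaviour described in this phase can be illustrated with a minimal Python sketch. It is not the P-GRADE workflow manager; it only assumes that a workflow is a mapping from each job to the jobs it depends on, and the function and parameter names (run_workflow, run_job, completed) are hypothetical.

```python
from graphlib import TopologicalSorter


def run_workflow(dependencies, run_job, completed=None):
    """Run jobs in dependency order, stopping at the first failure.

    dependencies maps each job name to the set of jobs it depends on.
    run_job(name) returns True on success. Jobs listed in `completed`
    are skipped, which models resuming from the point of failure.
    """
    done = set(completed or ())
    for job in TopologicalSorter(dependencies).static_order():
        if job in done:
            continue  # result kept from an earlier run
        if not run_job(job):
            return done, job  # report which job failed
        done.add(job)
    return done, None
```

A first call returns the set of finished jobs together with the name of the failed job. Passing that set back as completed on the next call re-runs only the failed job and the jobs that come after it.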
Parametric workflow phase
- The developer opens the workflow in the Editor component.
- He/she defines the input parameter space for the workflow. Each dimension of the space is a set of files with similar structure but different content. Each point of the parameter space is defined by a file tuple, which is given to the workflow as one input set and generates one execution of it. An N-point parameter space therefore generates N executions of the original workflow.
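As a rough illustration of how file sets give rise to executions, the following Python sketch enumerates the points of a parameter space. It assumes that the dimensions are combined as a cross product, which the text above does not state explicitly, and all file names are hypothetical.

```python
from itertools import product


def parameter_points(dimensions):
    """Return every file tuple of the parameter space.

    dimensions is a list of file-name lists, one list per dimension.
    """
    return list(product(*dimensions))


points = parameter_points([
    ["mesh_a.dat", "mesh_b.dat"],            # dimension 1 (hypothetical files)
    ["low.cfg", "medium.cfg", "high.cfg"],   # dimension 2 (hypothetical files)
])
print(len(points))  # one workflow execution per point
```

With two files in the first dimension and three in the second, the sketch yields six file tuples, that is, six executions of the workflow.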
Workflow and component sharing phase
- The user logs in to the portal and opens the "Workflow" portlet.
- Using the "Storage" operation, an existing workflow can be saved from the portal server to the user's desktop machine.
- With the "Upload" operation, a workflow can be uploaded to the portal server from the user's desktop environment.
- The "Demo Workflows" section of "Upload" form lists the available prefabricated demo applications. These applications generally test the P-GRADE Portal and the current environment (certificates, settings and the Grid). A demo workflow can be downloaded using this section.
- Besides the repositories of the portal server, a user can download a workflow from [1]. A workflow can be shared on this page by mailing the tgz file and the description of the workflow to pgportal@lpds.sztaki.hu
Services provided by the P-GRADE Grid Portal for grid application developers:
- Graphical environment to define data driven grid applications built up from various types of components (sequential or parallel batch programs and services). The "Workflow editor" component of the P-GRADE Portal can be used to define new applications or to modify existing applications. The Workflow editor is implemented as a Java Webstart application that can be downloaded from the portal server on-the-fly, without installing any grid specific program on the application developer's machine.
- executing such applications in Globus (GT2, GT4), EGEE (LCG, gLite) and ARC middleware based Grids
- collecting trace-data generated by the application
- visualizing the progress of grid applications in real time
- monitoring the grid infrastructure used for the execution
New Portlets for P-GRADE Portal
The File Management Portlet and the extended Certificates portlet have been implemented by Middle East Technical University, a member of the P-GRADE Portal Developer Alliance. Currently, the portlets are available on the Turkish Portal at http://portal.grid.org.tr:8080/gridsphere/gridsphere.
Credential Management Operations
Besides upload and download, MyProxy credential management operations can be performed from the "Certificates" portlet, as displayed in Figure 1.
Supported MyProxy credential management operations are:
- Displaying information about MyProxy credentials (myproxy-info)
- Changing password of a MyProxy credential (myproxy-change-pass-phrase)
- Removing a credential from the MyProxy repository (myproxy-destroy)
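For readers who want to try the same operations outside the portal, the sketch below builds the corresponding command lines in Python. How the portlet itself invokes MyProxy is not described here. The sketch assumes that the MyProxy command-line clients are installed and uses their -s (server host) and -l (username) options; the operation keys and function names are hypothetical.

```python
import subprocess

# Operation names as listed above, mapped to the MyProxy client programs.
MYPROXY_TOOLS = {
    "info": "myproxy-info",
    "change_password": "myproxy-change-pass-phrase",
    "remove": "myproxy-destroy",
}


def myproxy_command(operation, server, username):
    """Build the argument list for one credential management operation."""
    return [MYPROXY_TOOLS[operation], "-s", server, "-l", username]


def run_myproxy(operation, server, username):
    """Run the tool; it prompts for any pass phrase on the terminal itself."""
    return subprocess.run(myproxy_command(operation, server, username), check=False)
```

For example, myproxy_command("info", "myproxy.example.org", "alice") produces the argument list for myproxy-info against a hypothetical server and account.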
The "Credential Management" form, accessed using the "Credential Management" button of the "Certificates" portlet, is shown in Figure 2.
File Management Portlet
The File Management Portlet provides a graphical interface for LCG File Catalog (LFC) operations and file management operations. The portlet supports the following operations:
LFC operations
- Listing contents of an LFC Name server
- Browsing directories of an LFC Name server
- Creating a new LFC directory
- Removing a directory/file
- Displaying directory/file information (owner-group information, last modification date, access rights)
- Changing access rights of a directory/file
- Renaming a directory/file
File Operations
- Uploading a local file to a storage element
- Downloading a file from a storage element
- Replica Management
  - Listing replicas of a file
  - Replicating a file to a storage element
  - Deleting a replica of a file
To carry out these operations, the user should download a short-term proxy credential and map it to the VO involved using the "Certificates" portlet.
The File Management Portlet is shown in Figure 3, and the directory/file entries of the LFC host of the trgridb VO are listed in Figure 4.
Example Gridification of a Bioinformatics Application
G-PPI is a bioinformatics application which searches for protein-to-protein interactions in a protein database using templates of such interactions. The application was written in Python. Using P-GRADE Portal, it was possible to gridify the sequential application with only minor changes to the code. The development followed the typical application developer use case outlined above. First, the application was changed so as to run on the grid. Then, parametrization was used to scale up the application.
The G-PPI application presents a simple case of a workflow, like many other scientific computing applications that work in multiple phases on chunks of data. The application consists of two phases. In the first phase, surfaces of proteins that may contribute to interactions are extracted. In the second phase, the interaction templates, each of which consists of a pair of surfaces, are matched to each extracted surface, and matching interactions are output in XML format. A requirement of the application was that the two phases be kept separate so that either of them could be improved individually. P-GRADE Portal makes this separation an easy task. A workflow consisting of two sequential batch jobs was sufficient to define this initial gridification.
G-PPI uses three bioinformatics tools: FASTA, NACCESS and MULTIPROT. We simply bundled these Linux programs into a TAR archive, uploaded it to the grid, and changed the program so that it accepts a command line parameter for the TAR file and extracts it at run time. The input and intermediate data files are handled in the same way. The first phase of the program requires as input only the protein database, which spans many files, so it is transferred as an archive as well. The second phase requires the extracted surfaces and the template database, which can also be transferred as archives. Thus, changes to command line processing, archive creation/extraction, and setting the parameters in the workflow are sufficient to make the application run on the grid.
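The command line and archive handling described above could look roughly like the following Python sketch. It is not the G-PPI code: the option names (--tools-archive, --work-dir) and function names are assumptions made for illustration, since the source does not give the actual ones.

```python
import argparse
import tarfile
from pathlib import Path


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Unpack bundled inputs before the run")
    # Option names are hypothetical; the actual G-PPI parameters are not documented here.
    parser.add_argument("--tools-archive", required=True, help="TAR file with the tool binaries")
    parser.add_argument("--work-dir", default="work", help="directory to extract into")
    return parser.parse_args(argv)


def unpack(archive, destination):
    """Extract a TAR archive at run time and return the extracted member names."""
    Path(destination).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive) as tar:
        names = tar.getnames()
        tar.extractall(destination)
    return names
```

A job would call parse_args() on start-up and then unpack(args.tools_archive, args.work_dir) before invoking the bundled tools. The same unpack function can serve for the data archives.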
The next step is naturally to parametrize the workflow, since a purely sequential application gains little from grid resources. In our case, the second phase of computation (matching templates) is especially expensive: it would take several hundred days on a single workstation for the entire database. Accordingly, we have to run it in parallel. In the first version of the parametrization, the automatic generator was used to assign each job a job number, starting from 0. The data was not partitioned: each job received all the data, but the program was modified so that it worked on only part of it, using a simple static partitioning strategy that gives each job an equal number of proteins.
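A static partitioning of this kind can be sketched in a few lines of Python. This is an illustration under assumptions rather than the G-PPI implementation: it assumes each job knows its own job number and the total number of jobs, and the handling of a remainder (the first jobs take one extra protein) is a choice made here.

```python
def proteins_for_job(proteins, job_number, job_count):
    """Return the contiguous slice of proteins handled by one job.

    Job numbers start at 0. When the list does not divide evenly, the
    first `remainder` jobs take one extra protein.
    """
    base, remainder = divmod(len(proteins), job_count)
    start = job_number * base + min(job_number, remainder)
    end = start + base + (1 if job_number < remainder else 0)
    return proteins[start:end]
```

Every job receives the full list but processes only its own slice. Together the slices cover each protein exactly once.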
This version of the gridification is currently in testing. Once the test is complete, we will implement and test a more intelligent parametrization of the workflow by writing a custom generator and a collector. The custom generator will divide the protein data so that each job, in both phases of the application, downloads only the portion of the protein database it needs and processes only the proteins it receives. The collector will accumulate all matchings and output a single XML file.
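Since the collector is still planned, the following Python sketch only illustrates one way such a merge step could work. The XML schema of the G-PPI output is not given in this document, so the sketch assumes that each job writes one XML file whose root element contains the matchings as child elements; the root tag name and function name are hypothetical.

```python
import xml.etree.ElementTree as ET


def collect(result_files, output_file, root_tag="matchings"):
    """Merge the children of each per-job XML root into a single XML file."""
    merged = ET.Element(root_tag)
    for path in result_files:
        # Copy every child of the per-job root under the merged root.
        merged.extend(list(ET.parse(path).getroot()))
    ET.ElementTree(merged).write(output_file, encoding="utf-8", xml_declaration=True)
    return len(merged)
```

The return value is the total number of merged child elements, which gives a quick check that no job output was skipped.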




