Gridification of an application using parameter study workflow
From EGEE-see WIki
- Gridification of an Application Using Parameter Study Workflow
- From EGEE-see WIki
-
Jump to: navigation, search
This Wiki page is a part of SEE-GRID Gridification Guide. It is contributed by Polytechnic Universiy of Tirana.
- Contents
[hide]
- 4.1 Introduction to Application
- 4.2 Gridfication of Application
[edit]
- Introduction
In this guide we present a method that can be used to gridify applications using parameter study method. For applying parameter study method P-GRADE portal can be used. P-GRADE portal is a workflow-oriented Grid portal with the main goal to enable users to manage the whole lifecycle of workflow-oriented complex grid applications. Workflow is an application composed of several jobs connected with arches that represent transfer of data through file transfer. Below is given a picture of a workflow. Jobs can be sequential or mpi parallel.
Figure 1.1 - Workflow of an Application
The P-GRADE portal supports the graphical development of workflows created from various types of existing components (sequential or parallel), executing these job-workflows in the Grid relying on user credentials, analyzing the monitored trace-data by the built-in visualization facilities, and so on.
What Parameter Study Is ?
Before continuing we can give a definition of what parameter study mean: The same algorithm is applied in parallel to every member of the input data set(or parameter set, parameters can be input files or command line parameters). If we have the input parameter set (p1,p2,p3,p4) where pi is one instance of input data and if we apply this set to the algorithm using parameter-study method we will have 4 copies of the algorithm run in parallel each with different pi data as input.The input can be more than one input set in this case: Algorithm applies to every member of the product of these sets.
If we have InputSet1={c11, c12 , c13 } and InputSet2={c21 ,c22 }.
ProductSet={ {c11,c21}, {c11,c22}, {c12,c21}, {c12,c22},{c13,c21}, {c13,c22}}
We will have 6 copies of algorithm run in parallel with each one of 6 members of ProductSet as input.
- Application of Parameter Study to a Simple Example
The parameter study method can be used for gridification of applications composed of several procedures executed serially one after the other as explained below.
Suppose that we have an application, witch its main program is composed 4 procedures executed serially where the output data of one serves as input data into the other ,and 2 global variables(program is given in pseudoC language to eliminate grammar details, my own name for a pseudo code).
Suppose we have this configuration taken to make the application as general as possible.
We call the input data to the main as M
We call the input data to the proced1( ) as IN1(it can be data from files ,data from global variables,data from the user ect)
We call the output data to the proced1( ) as OUT1
We call the input data to the proced2( ) as IN2
We call the output data to the proced2( ) as OUT2
Suppose that proced3( ) takes as input the 2 outputs of the procedures above
so we have IN3=OUT1+OUT2,and produces its output we call OUT3.
Procedure proced4( ) takes as input the output of proced3( ) so IN4=OUT3
and produces its output OUT4 that in fact is the output of the entire program.
global v1
global v2
main( )
proced1( )
proced2( )
proced3( )
proced4( )
First of all we have to see if the input data to the main program can be divided into several independent parts and main algorithm can be applied using parameter study to these independent parts. This is the most simple,easier and less time consuming method of gridification but not for all programs can its main input data be divided in parts.
The next step if the first one is not successful is to identify the most time consuming procedure of all and see if the input data to this procedure can be divided in different independent parts. In order this method to be efficient and easier to implement we need to find only one procedure that is the most time consuming compared to all others. If there is more than one procedure that consumes time or its input data is not dividable hopelessly we can not parallelize the application and our discussion finish here bye..?!(having more than one procedure that takes the most computing time even if the input data are dividable raises the complexity of gridification and we will not discuss it in this guide).
Suppose that procedure proced3( ) satisfies the condition above( the procedure that takes the most computing time and its input data is dividable) . In this case we can apply parameter study to this procedure dividing its input data and running copies of it in parallel each with different part of data.
More concretely we divide OUT1 and OUT2 in different parts to create two input sets with different parameters
OUT1=(OUT11,OUT12,OUT13) and OUT2=(OUT21,OUT22)
The product set is PRODUCT=((OUT11, OUT21),(OUT11, OUT22),(OUT12,OUT21),(OUT12, OUT22),
(OUT13,OUT21),(OUT13,OUT22))
So applying parameter study to procedure proced3( ) we will have 6 copies of proced3( ) running in parallel each taking as input one of the parameters of the PRODUCT set. For example one copy will take as input (OUT11, OUT21) and so on.Number of parallel copies and hence the performance gain achieved will be determined by the number of parameters we divide each of the input data to proced3( ) OUT1 and OUT2. Each of this copies will produce an output in total 6 outputs OUT31,OUT32,OUT33,OUT33,OUT34,OUT35,OUT36.
All these outputs will be collected and will be input to the procedure proced4( ).
In order to create the hole application workflow we divide our program and create a workflow composed of 3 jobs.
The first job we call it MAIN is composed of procedures proced1( ) and proced2( ). It takes the main program input data M and generates the two input sets OUT1 and OUT2 witch are given to the second job through file transfer to be used in the parameter study application of second job
The second job we call it PROCED3 is the parameter study job composed of procedure proced3( ) that takes as input the two parameter sets OUT1 and OUT2 and will be run as 6 copies in parallel
The third job we call it COLLECT is composed of procedure proced4( ),collects all output data OUT31,OUT32,OUT33,OUT33,OUT34,OUT35,OUT36 from parameter study job PROCED3 through file transfer of course and produces the final results of the application.
- CPFCGM Application
The main() algorithm of application is composed of several procedures as below that are executed in sequence one after other:
MAIN( )
read_database( );
wf_trees_entry();
wf_matrix_node_branch( );
wf_routes();
wf_trees_counter();
wf_trees_det0();
wf_trees_routes();
main_loop();
1) read_database( ) is a procedure to read the data of the graph from a file and put it in some array variables of the program
2) wf_trees_entry() wf_matrix_node_branch( ) are routines for constructing the topology of the graph (that is it defines witch node is connected with witch node and with witch line ect..)
3)- wf_routes() is used to find and save in some variables all the routes that pass through each power source (a route is all branches connected one after the other from one point to an other point)
4) wf_trees_counter(); wf_trees_det0(); find all trees of the graph (tree is a subgraph that doesn`t form a loop) wf_trees_counter() finds all possible combinations that can be trees wf_trees_det0(); finds the actual trees from data input from wf_trees_counter();
5) wf_trees_routes() find all trees that are part of every route that were found with so find all the combinations tree-route
6) main_loop() is a routine that does some calculations that produces partial fluxes from them calculates the fluxes in every branch of the network
[edit]
- Gridification of Application
We found out from experiment that the most time consuming procedure is wf_trees_routes( ); so running several copies of this procedure in parallel with subset of data will reduce the total execution time radically.
So we can create a workflow consisting of three jobs:
1)- The Generator job called Job0 in fig4.1 is composed of routines below: read_database( ); wf_trees_entry(); wf_matrix_node_branch( ); wf_routes(); wf_trees_counter(); wf_trees_det0(); This job takes as input a data file containing the data about the network, constructs the graph topology and it serves as a generator job that generates as output two sets of parameter study. The first set is composed of several files "fileroutes.n" that contain all routes that pas through each power source (e.g fileroutes.1 contains all routes tha pas through power source 1). The number of files depends on the number of power sources on the network The second set is composed of several files "filetrees.n" that contain a subset of all trees of the power network graph. The number of these files is set by us taking into account the file size and file transfer time. So the smaller the number of files the bigger the file size and file transfer time.
2)-A job called Job1 in fig4.1, is the job composed of wf_trees_routes( ) procedure that will run as a parameter study. It takes as input two parameter sets one composed of "fileroutes.n" files and the other composed of "filetrees.n". The connection of these sets as inputs to the Job1 is made through the so called PS(Parameter Study) input ports one for each parameter set. Different copies of Job1 will run in parallel each with a different combination of "fileroutes.n" - "filetrees.n" from the two sets as input. For example if we have 5 "fileroutes.n" files and 5 "filetrees.n" files we have 5x5=25 combinations meaning 25 copies of job Job1 will running in parallel thus in theory reducing the execution time by a factor of 25. Each copy of Job1 generates as output a file called "fileroutestrees.i-j" that contain all trees stored in "filetrees.i" that pas through each route stored in file "fileroutes.j".
Fig 4.1- Parameter Study Workflow of the Application
3)The third job not shown in fig contains the Main_loop( ) procedure mention before. This job is not really part of the workflow because it can be run on a single computer with the data produced by workflow as input. It can also be added to the workflow as a collector job that collects all the data from the parameter study jobs Job1 but this is more for convenience than for any performance gain from the grid. This job takes as input all files "fileroutestrees.i-j" and calculates partial fluxes and other parameters of the power network .
Some of the results of our experiment are given below: For testing we created a parameter study workflow for a power network consisting of 13 nodes and 19 branches. Running one copy of Job1 containing the wf_trees_routes procedure for this network takes approximately 33 min so running 25 copies of it on one computer serially takes 25x33min=825min=14 hour. So running 25 copies of Job1 in parallel on the grid, they will finish in 33min thus in theory we will have a speedup of 25 (taking into consideration only the speed-up achieved for the wf_trees_routes procedure) However in practice there are some limitations for several reasons some of witch are: the overhead of communication, the overhead of grid services for example the wait and schedule time by the broker and scheduler , some practical limitations from the portal server that permits only 5 parameter study to run in parallel at a time, serial parts that can not be parallelized as it is Job0 ect. We started execution of the workflow for the power network above on the grid at 8:00 and it finished at around 13:00 so taking 5 hours. Comparing of what it would take if executed on a single computer about 14 hour we achieved a speedup=14/5=2.8
So gridifing this application first of all makes it possible to make this application run able as it was not possible to run on a single computer for large power networks because it blocked the RAM and caused a crash. We achieved this by dividing the large data set of the application into several files. We could run this transformed application on a single computer but it would take a lot of time so grid help here through parallelism to achieve improved performance with certain speedup that in our case is at around 3.
Conclusions
Some points to have in mind during the gridification
1)-First of all the application have to take a lot of computing power and time to gain any performance improvement by running it on the grid. This computing time is relative but it has to be in terms of days: one day, two days one week and more etc.
2)-If the application is composed of several procedures it‘s needed to identify the procedure or procedures the take the most computing time compared to other procedures that is worth parallelizing.
3)-After identifying the most time consuming procedure it has to be seen if the data input to this procedure can be divided into sub parts and can several copies of the procedure algorithm run on these sub parts in parallel exploiting the parameter study workflow method. If the data processed by this procedure can not be divided or in general the data input to the program can not be divided the application can not be run in parallel on the grid can not gain any performance improvement.
4)If the data input can be divided it is need to divide it in several files each containing a sub part of the data because transfer of data from master job to worker jobs has to be done through file transfer. So this means that some manual intervention to procedure code is needed to divide the data into files.
5)Also any global variables have to be transferred from master job to the others through file transfer because now the application is divided into separate jobs each consisting of a separate program with its main( ) procedure and its own global variables.


