MSACM

From EGEE-see WIki


Application description

Multi-Scale Atmospheric Composition Modelling

The aim of this application is to use the Grid environment to build an integrated, multi-scale modelling system for the Balkan region that links the scales of the problem, from emissions at the urban scale to their transport and transformation at the local and regional scales. The system should be able to study atmospheric pollution transport and transformation processes (also accounting for heterogeneous chemistry and the importance of aerosols for air quality and climate) from urban to local to regional (Balkan) scales; to track and characterize the main pathways and processes that form the atmospheric composition at different scales; to account for biosphere-atmosphere exchange as a source and receptor of atmospheric chemical species; and to provide high-quality, scientifically robust assessments of air quality and its origin. The application is based on the US EPA Models-3 system, a widely used modelling tool that is still being developed intensively by a large community of scientists in both the US and Europe.

Scientific and Social Impact

  • proficiency and skills in operating multi-scale models, and model validation
  • improved scientific understanding of the processes and situations behind specific pollution episodes in the country
  • EU-compatible tools and air quality (AQ) monitoring strategy, following the ozone daughter directive 2002/3/EC
  • AQ information and forecasts as a basis for sound decision making (short-term and strategic pollution abatement strategies)

Collaboration

  • Geophysical Institute and National Institute for Meteorology and Hydrology, BAS, BG – main developers
  • MEW (Ministry of Environment and Waters) and EEA (Executive Environmental Agency), National Statistical Institute – data for emission inventories, demographic, land-use and administrative data
  • Research groups in Albania (MetOffice-TU of Tirana), Greece (AUTh) and Romania (NMA) – interested in using the application; AUTh also supplies part of the data
  • Institute for Parallel Processing – gridification and deployment support

Application on the Grid

Job Management

Execution has two main stages, corresponding to the two models used: “mm5.deck.mpp” and “CCTM.exe”. Several time steps are wrapped in a single job in order to save on data transfer time. The application requires some domain-specific libraries for data transformation and visualization; they are installed in the shared software area of the environmental VO. The application is divided into series of MPI jobs. After each series has been executed, the results are analysed by hand at the user’s workstation. We have found that 8 CPUs is the optimal number for production use, but more CPUs can be used if results are needed sooner. The application has been tested to scale well up to at least 32 CPUs.
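The idea of wrapping several time steps in one job can be sketched as follows. This is only an illustration: the function name and the numbers of steps are hypothetical, and the actual csh and Perl scripts used by the team may group the steps differently.

```python
from typing import List, Tuple


def batch_time_steps(total_steps: int, steps_per_job: int) -> List[Tuple[int, int]]:
    """Return inclusive (first, last) time-step ranges, one range per grid job."""
    if total_steps < 1 or steps_per_job < 1:
        raise ValueError("both arguments must be positive")
    batches = []
    first = 1
    while first <= total_steps:
        # The last batch may be shorter than steps_per_job.
        last = min(first + steps_per_job - 1, total_steps)
        batches.append((first, last))
        first = last + 1
    return batches


if __name__ == "__main__":
    # Hypothetical run: 10 time steps, at most 4 steps per submitted job.
    for first, last in batch_time_steps(total_steps=10, steps_per_job=4):
        print(f"job covers time steps {first}..{last}")
```

Larger batches mean fewer transfers of input and output data per simulated step, at the cost of longer individual jobs.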

  • A set of our own csh and Perl scripts has been developed for launching jobs and analysing results. The JTS operational tool is used to obtain higher priority for these jobs relative to other jobs submitted to the cluster.
  • NetCDF and other domain-specific libraries, written mostly in Fortran, are used. Some tools developed by the team for various data transformations are also deployed.
  • MPICH2 version 1.0.6 is used for parallel execution. This is not the version installed by the standard gLite middleware installation procedure, but we have found that MPI jobs run successfully when MPICH2 1.0.6 is deployed in the environmental software area. It is important for the supporting sites to configure the batch queue so that these jobs obtain entire worker nodes for exclusive use (see Problem management). We use this particular version because previous experience has shown that the model runs successfully with it, and the model has been extensively validated with it on the Grid cluster BG06-GPhI, first in local and then in grid mode.
  • The UPM service is used to monitor job success and to detect bottlenecks.
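As a rough illustration of what a job-launching script has to produce, the sketch below generates a job description for an MPI run. It is not the team's script. The wrapper name run_model.csh is hypothetical, and the JDL attribute names (JobType, CpuNumber and so on) follow gLite JDL conventions as an assumption; some middleware versions use different attributes for MPI jobs (for example NodeNumber), so check the documentation of the installed release.

```python
def build_mpi_jdl(model_binary: str, cpu_number: int = 8,
                  wrapper: str = "run_model.csh") -> str:
    """Build the text of a job description for one MPI job.

    wrapper is a hypothetical script that sets up MPICH2 from the VO
    software area and starts model_binary on cpu_number processes.
    """
    if cpu_number < 1:
        raise ValueError("cpu_number must be positive")
    lines = [
        'JobType = "Normal";',                       # assumed attribute value
        f"CpuNumber = {cpu_number};",                # assumed attribute name
        f'Executable = "{wrapper}";',
        f'Arguments = "{model_binary} {cpu_number}";',
        'StdOutput = "std.out";',
        'StdError = "std.err";',
        f'InputSandbox = {{"{wrapper}"}};',
        'OutputSandbox = {"std.out", "std.err"};',
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # 8 CPUs is the value reported above as optimal for production runs.
    print(build_mpi_jdl("mm5.deck.mpp", cpu_number=8))
```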

Data Management

The input data are:

1. The input of MM5 is obtained from the NCAR database.

2. The output of MM5 is the input for the emission model SMOKE and the Chemical Transport Model (CTM) CMAQ.

3. The emission input for CMAQ is generated from the TNO emission inventory.

The emission model SMOKE and our own emission processing tools are used to process the data. Data is stored on the storage elements and indexed in the local LFC, lfc.ipp.acad.bg. The DM-Web tool is under initial testing.
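Storing a result file on a storage element and indexing it in the LFC could look roughly like the sketch below, which only builds the command line and environment for the lcg-cr client and does not run anything. The VO name, storage element host and logical file name are hypothetical placeholders; only the LFC host comes from this page.

```python
from typing import Dict, List

LFC_HOST = "lfc.ipp.acad.bg"  # the local LFC named on this page


def build_register_command(local_path: str, lfn: str,
                           storage_element: str, vo: str) -> List[str]:
    """Build an lcg-cr command that copies a file to a storage element
    and registers it in the catalogue under the logical file name lfn."""
    if not local_path.startswith("/"):
        raise ValueError("local_path must be an absolute path")
    return [
        "lcg-cr",
        "--vo", vo,                # virtual organisation (hypothetical name)
        "-d", storage_element,     # destination storage element
        "-l", f"lfn:{lfn}",        # logical file name in the LFC
        f"file://{local_path}",    # source file on the local disk
    ]


def build_environment(base: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of base with LFC_HOST pointing at the local LFC."""
    env = dict(base)
    env["LFC_HOST"] = LFC_HOST
    return env


if __name__ == "__main__":
    cmd = build_register_command(
        "/tmp/cctm_out.nc",              # hypothetical output file
        "/grid/env/msacm/cctm_out.nc",   # hypothetical logical file name
        "se.example.org",                # hypothetical storage element
        "env",                           # hypothetical VO name
    )
    print(" ".join(cmd))
```

The command list and environment can then be passed to subprocess.run on a machine where the grid client tools are installed.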


Image:Msacm-fig.jpg

Inter-job Communication

We have developed a simple script that uses AMQP messaging to notify the developers about job progress, via the AMQP-to-XMPP gateway available in the RabbitMQ message broker. The script uses the same message broker server as the JTS. Notifications can be sent to Jabber, Gmail or ICQ contacts. This has been useful in solving some performance issues related to NFS.
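A progress notification of this kind might be sketched as below. This is not the team's script: the message fields, exchange name and routing key are hypothetical, and the sketch assumes the third-party pika client for AMQP, which is imported only when a message is actually published. How the broker forwards messages to XMPP depends on the gateway configuration and is not shown.

```python
import json
import time
from typing import Optional


def build_progress_message(job_id: str, stage: str, step: int,
                           total_steps: int,
                           timestamp: Optional[float] = None) -> str:
    """Serialise a job-progress report as JSON (field names are hypothetical)."""
    payload = {
        "job_id": job_id,
        "stage": stage,            # e.g. the MM5 or the CCTM stage
        "step": step,
        "total_steps": total_steps,
        "timestamp": time.time() if timestamp is None else timestamp,
    }
    return json.dumps(payload, sort_keys=True)


def publish(message: str, host: str, exchange: str, routing_key: str) -> None:
    """Send one message to an AMQP broker such as RabbitMQ."""
    import pika  # third-party client, assumed to be installed on the node

    connection = pika.BlockingConnection(pika.ConnectionParameters(host=host))
    try:
        channel = connection.channel()
        channel.basic_publish(exchange=exchange,
                              routing_key=routing_key,
                              body=message.encode("utf-8"))
    finally:
        connection.close()


if __name__ == "__main__":
    # Printing instead of publishing keeps the example runnable without a broker.
    print(build_progress_message("job-001", "CCTM", step=3, total_steps=24))
```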

MPI jobs run on 4, 8, 16 and 32 processors on BG03-NGCC, BG04-ACAD and BG01-IPP. The jobs take up to 12 hours of wall clock time.

When the MPI job is using only one worker node, we avoid using the shared NFS filesystem due to performance problems.

Information service

The BDII and the LFC are used for data management.

Problem management

The application had to overcome several types of problems:

  • Lack of proper MPI support – MPI support in SEE-GRID-SCI improved after the Guide for site administrators for MPI installation was implemented.
  • Random performance problems during MPI execution – these occur when the batch system sends the job to nodes where it shares resources with other jobs. Configuring the sites with an appropriate submit filter fixes this problem.
  • Slow start of MPI jobs – standing reservations have been implemented at the sites that pledge strong support for the environmental VO, so these jobs start immediately unless the site is heavily loaded with environmental jobs. In that case the developers can use the JTS to change their QoS level from the default to “high”, which gives them higher priority and/or additional options to speed up the job start.
  • Slow performance of NFS in some cases, when many small files are created (for example, the command tar zxf xxxx.tar.gz can take an unexpectedly long time when the archive contains a large number of files, even though the same command finishes in 1 s on local storage). This problem has been mitigated in two ways:

- Some duplication has been avoided by storing the most frequently used files in the environmental VO software area at the sites.

- The scratch space at the worker node (under /tmp) is used when the job runs on only one node, even if it is an MPI job.
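The second mitigation can be illustrated with a small sketch that picks the working directory from the number of worker nodes. The function name and the shared directory path are hypothetical; the only fact taken from this page is that local scratch space under /tmp is used for single-node jobs.

```python
import os
import tempfile


def choose_work_dir(node_count: int, shared_dir: str,
                    local_root: str = "/tmp") -> str:
    """Return the directory in which the job should unpack and run.

    Single-node jobs get a fresh directory on local scratch space, so that
    unpacking archives with many small files does not go through NFS.
    Multi-node jobs must use the shared filesystem, because every node
    needs to see the same files.
    """
    if node_count < 1:
        raise ValueError("node_count must be positive")
    if node_count == 1:
        return tempfile.mkdtemp(prefix="msacm_", dir=local_root)
    return shared_dir


if __name__ == "__main__":
    shared = os.path.join(os.path.expanduser("~"), "msacm_run")  # hypothetical
    print("1 node :", choose_work_dir(1, shared, local_root=tempfile.gettempdir()))
    print("4 nodes:", choose_work_dir(4, shared))
```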

We have tested the Lustre filesystem on another Bulgarian NGI cluster with InfiniBand, which is not yet part of the SEE-GRID-SCI infrastructure, and obtained up to a 10-fold improvement in the use cases outlined above. In general, we recommend deploying Lustre at the SEE-GRID-SCI sites that run a large number of I/O-intensive jobs.


Image:Msacm-pic1.jpg

Presentations

  • K. Ganev, “Multi-scale atmospheric composition modelling for the Balkan region”, poster at the 4th EGEE User Forum, 2-6 March 2009, Catania, Sicily, Italy;
  • K. Ganev, “Background Pollution Forecast over Bulgaria”, GRID&SEA workshop during LSSC’09 Conference, 3-8 June, Sozopol, Bulgaria;
  • D. Syrakov, “Climate Change Impact Assessment of Air Pollution Levels in Bulgaria”, GRID&SEA workshop during LSSC’09 Conference, 3-8 June, Sozopol, Bulgaria;
  • K. Ganev and Hr. Hristov, “Grid Computing for Air Quality and Environmental Studies in Bulgaria”, 23rd EnviroInfo 2009 Conference - Environmental Informatics and Industrial Environmental Protection: Concepts, Methods and Tools, Berlin, 9-11 September 2009;
  • K. Ganev and E. Atanassov, “Grid Applications for Air Quality Studies in Bulgaria”, EGEE09 Conference, Workshop “Earth Science Grid Highlights”, 21-25 Sept. 2009.

Papers

  • Angelina Todorova, Georgi Gadzhev, Georgi Jordanov, Dimiter Syrakov, Kostadin Ganev, Nikolai Miloshev, Maria Prodanova, 2009, Application of the US EPA Models 3 system for numerical simulations of high PM10 levels episode, 7th International Conference on Air Quality – Science and Application, 24-27 March 2009, Istanbul, Turkey;
  • Kostadin Ganev, Dimiter Syrakov, Maria Prodanova, Nikolay Miloshev, Georgi Jordanov, Georgi Gadjev, Angelina Todorova, Atmospheric composition modeling for the Balkan region, SEE-GRID-SCI User Forum, 6-11 December 2009, Istanbul, Turkey, pp. 77-85, ISBN: 978-975-403-510-0;
  • Angelina Todorova, Georgi Gadzhev, Georgi Jordanov, Dimiter Syrakov, Kostadin Ganev, Nikolai Miloshev, Maria Prodanova, 2009, Numerical Study of Some High PM10 Levels Episodes, to appear in I. Lirkov, S. Margenov, and J. Wasniewski (Eds.), LSSC2009, Lecture Notes in Computer Science.