Grid Interactivity, Pilot Jobs, and Job Pooling

From EGEE-see WIki

This Wiki page is a part of SEE-GRID Gridification Guide. It is contributed by Belgrade University Computer Centre.

Pilot jobs

Pilot jobs are becoming increasingly popular among groups of grid users (1). With pilot-based infrastructures, users submit their jobs to a centralized queue or job repository. These jobs are handled by grid resources executing asynchronously started pilot jobs. Pilot jobs communicate with a pilot aggregator, which allocates user jobs from the repository. A pilot job is not committed to any particular task, and may not even be bound to a particular user, which allows many users to exploit a single pilot infrastructure within one VO.

Once started, the pilot jobs get the actual work to be performed from a pilot aggregator. If there are no tasks waiting for the pilot, the job exits immediately. A pilot can run more than one user job if its maximum wall clock time limit is long enough, thus making full use of the allocated resource slot.
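The pilot life cycle described above can be sketched as a simple loop. This is only an illustration, not the code of any particular pilot framework: the names PilotLoop and Task, the use of an in-memory queue in place of the pilot aggregator, and the assumption that each task comes with a run time estimate are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch: a pilot that pulls tasks until the aggregator has none
// left or the remaining wall clock time is too short for the next task.
class PilotLoop {
    // A user task with an estimated run time in seconds (assumed to be known).
    static final class Task {
        final String name;
        final long estimatedSeconds;

        Task(String name, long estimatedSeconds) {
            this.name = name;
            this.estimatedSeconds = estimatedSeconds;
        }
    }

    // Returns the names of the tasks the pilot would execute, in order.
    // The queue stands in for the pilot aggregator.
    static List<String> run(Queue<Task> aggregator, long wallClockLimitSeconds) {
        List<String> executed = new ArrayList<>();
        long remaining = wallClockLimitSeconds;
        while (true) {
            Task next = aggregator.peek();
            if (next == null) {
                break; // no pending work: the pilot exits immediately
            }
            if (next.estimatedSeconds > remaining) {
                break; // not enough time left in the allocated slot
            }
            aggregator.poll();
            executed.add(next.name); // a real pilot would run the task here
            remaining -= next.estimatedSeconds;
        }
        return executed;
    }
}
```

With a 400 second limit and queued tasks of 100, 200 and 500 seconds, this sketch runs the first two and leaves the third in the queue for another pilot.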

The late binding of user jobs and resources performed by the pilot aggregator greatly improves the user experience. This is a consequence of the following benefits of the piloting concept:

  • Piloting hides broken grid resources from end users, as only the successfully started pilot jobs get an opportunity to start the actual work,
  • A pilot job can provide accurate information about available resources, which makes matchmaking much more reliable,
  • It is possible to prioritize between available resources and pending user jobs.

A pilot infrastructure can submit jobs according to the current load and behaviour of grid sites, resulting in a faster perceived start of jobs during steady operation.

Pilot jobs can be submitted on behalf of a final user, or the actual user being served can be decided when the pilot job starts. In the simplest scenario, all submitted jobs are executed on behalf of a single user. The actual user identity is then lost, along with all the benefits of grid authentication and authorization mechanisms, such as user-level access control and security policies, logging, and accounting. Alternatively, job submission can be done on behalf of actual users. Such a user pilot job would do processing only for its owner, for example by pulling only the owner's jobs from the repository, but this seriously limits intra-VO scheduling. Therefore, the ability to change the identity of the pilot job owner when the actual user is assigned is very beneficial in heavily used pilot infrastructures on a production grid. One proposed solution to this problem is the gLExec tool (2, 3). However, if intra-VO accounting is used, it must be aware of this late identity switching performed by pilot jobs.

Interactive jobs on the grid

There are many implementations of user interaction with jobs running on the grid (for example: Interactive jobs using bypass, Interactive jobs using rrs, Interactive jobs using ssh, Job output monitoring using job perusal, Job output monitoring using grid-stdout-mon). These solutions provide communication between the user and the job through a text terminal-like user interface to the job's standard input and output.

Since interactivity requires establishing and maintaining communication between the end user and the job, it is necessary to mediate between running client programs and server processes running on the grid. Therefore, merely supplying a pilot job with a work description may not be enough: after the two parties are paired, further communication between them should be facilitated. Sometimes it is sufficient to establish a bare TCP connection between the client and the job, but in other cases they may need to exchange messages using other protocols and formats, such as HTTP and XML, which may be required by more advanced clients that go beyond a text terminal user interface.

With all communication implementations, however, one problem persists. The start-up time for grid jobs ranges from a few tens of seconds to days, depending on the occupancy of job queues, the duration of transfers of job-related data, and initialization delays inherent to jobs. If a job does not respond within a short time, the user will quit the interactive session and will not be at the terminal when the job starts and begins asking for input. Although pilot jobs can alleviate this problem, interactive use requires immediate availability of jobs that can interact with the user. Some of the above-mentioned factors can be minimised, but the most suitable strategy is to perform the majority of preparatory activities before the actual interaction takes place. For that reason, it is justified to keep a number of pilot jobs waiting for some time for users to appear with their work.

The suitable size of the pool may vary significantly, as it depends on its repopulation rate, and even more on the current load and actual user arrival dynamics. In order to handle interaction with individual end users, interactive applications often need many processes for short periods of time in order to distribute the processing load among them. Such a usage pattern is extremely unsuitable for the current grid infrastructure, and requires a large pool in order to be adequately handled. Such interactive sessions can be managed by clients communicating with several jobs, or by letting individual jobs delegate subtasks to other jobs. In the latter case, the coordinating job can allocate subordinate jobs like any other client. This approach to job allocation may even be attractive for non-interactive applications, resulting in a significant performance gain for workflows that iteratively run short jobs. Additional advantages of this approach are the abstraction of the underlying job management infrastructure, as well as the ability to uniformly access jobs from several grid sites, if desired. Therefore, client programs can simply obtain one or several server processes and communicate with them without even being aware of jobs and grid infrastructure.

The pooling of pilot jobs, which are idle before obtaining work from the client, raises the issue of grid infrastructure usage efficiency. In usual pilot infrastructures, user jobs wait in the repository and pilot jobs exit quickly if there is no pending work, so unused pilots cause only a minor additional load on grid resources. On the other hand, keeping a pilot in a pool for some time wastes the grid node on which it executes. Therefore, the size of the pool of available jobs must be minimised in order to use the resources optimally.

We believe that high availability of ready jobs with a small overhead can be achieved by implementing an adaptive job submission and pool management policy. Such a policy combines several elements:

  • A single job pool should be shared among several applications and users. For the same pilot job availability, a single shared pool can be substantially smaller than a set of several application-bound pools.
  • If possible, a pilot job should be reused after finishing previously assigned work. This is crucial for interactive applications that are more likely to stochastically demand many jobs for short periods of time, release them, and then request processing resources again.
  • It is necessary to track the job response time for each grid site. With this information it is possible to estimate the start of a submitted job, avoid flooding computing element (CE) job queues, and choose the best candidates for quick generation of pilot jobs. This tracking can be combined with monitoring of CE queue sizes.
  • Since various grid sites differ in processing resources, level of support for the pilot infrastructure and supported applications, there should be a configuration for each supporting site, describing the maximum number of executed or scheduled jobs as well as supported applications.
  • The pilot infrastructure should be able to detect an increase in demand for pilot jobs that justifies increasing the pool size (the situation when new jobs need to be created quickly), as well as a decrease in demand that requires reduction of the pool, or even a complete absence of workload.
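The per-site tracking of job response time mentioned in the list above could be kept, for instance, as an exponential moving average of observed start delays. The sketch below is one possible approach and not the binder's actual bookkeeping; the class name StartDelayTracker, the smoothing factor alpha, and the fallback value for sites without history are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: exponential moving average of the delay between
// submitting a pilot to a CE and the pilot reporting that it has started.
class StartDelayTracker {
    private final double alpha; // weight of the newest sample, in (0, 1]
    private final Map<String, Double> averageSeconds = new HashMap<>();

    StartDelayTracker(double alpha) {
        this.alpha = alpha;
    }

    // Records one observed start delay for the given CE.
    void record(String ce, double delaySeconds) {
        // First sample is stored as-is; later samples are blended in.
        averageSeconds.merge(ce, delaySeconds,
                (old, sample) -> (1 - alpha) * old + alpha * sample);
    }

    // Estimated start delay for a CE, or the fallback if nothing is known yet.
    double estimate(String ce, double fallbackSeconds) {
        return averageSeconds.getOrDefault(ce, fallbackSeconds);
    }
}
```

A moving average favours recent behaviour of a site, which matches the intent of reacting to the current load; other estimators (for example a median over a sliding window) would fit the same interface.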

In order to meet the last requirement, we propose the introduction of two operational modes defining target pool sizes, and two methods of selecting computing elements for newly created pilot jobs.

  • In idle mode, there are no active users, that is, no clients that are requesting or using the pilot jobs, and the number of jobs available in the pool is kept at a predefined minimum agreed by the users and administrators of the grid infrastructure.
  • In active mode, pilot jobs are being used, the needed pool size is larger, and it can depend on the number of active jobs, the recent dynamics of client arrivals, and even the applications being executed. The actual target pool size can be calculated using configurable parameters, or a set of per-application rules.
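As an illustration of the two modes, the target pool size could be computed from a few configurable parameters. The linear formula below is purely an assumption chosen for the example (the text does not prescribe one), as are the names PoolSizePolicy, perActiveJob and perArrivalPerMinute.

```java
// Hypothetical sketch of a target pool size calculation for idle and active mode.
class PoolSizePolicy {
    private final int idleMinimum;            // agreed minimum kept in idle mode
    private final double perActiveJob;        // extra ready pilots per job in use
    private final double perArrivalPerMinute; // extra ready pilots per client arrival per minute
    private final int maximum;                // hard upper bound on the pool size

    PoolSizePolicy(int idleMinimum, double perActiveJob,
                   double perArrivalPerMinute, int maximum) {
        this.idleMinimum = idleMinimum;
        this.perActiveJob = perActiveJob;
        this.perArrivalPerMinute = perArrivalPerMinute;
        this.maximum = maximum;
    }

    int target(int activeJobs, double arrivalsPerMinute) {
        // Idle mode: nobody is requesting or using pilots.
        if (activeJobs == 0 && arrivalsPerMinute == 0.0) {
            return idleMinimum;
        }
        // Active mode: grow with current usage and recent arrivals (assumed formula).
        double raw = idleMinimum
                + perActiveJob * activeJobs
                + perArrivalPerMinute * arrivalsPerMinute;
        return (int) Math.min(maximum, Math.ceil(raw));
    }
}
```

Per-application rules could be layered on top by keeping one such policy object per application and summing or maximising their targets.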

In both operational modes, the grid sites (i.e. computing elements) receiving requests for new pilot jobs can be determined using either the regular or the full-throttle strategy.

  • The regular CE selection policy is used when the pool size is satisfactory. The jobs are randomly submitted to all supporting grid sites that have not reached the maximum of executed or scheduled jobs, in proportion to the number of remaining job slots.
  • The full-throttle strategy is activated when the system detects increased demand for new pilot jobs while the number of jobs in the pool is small. The jobs are submitted to the sites that, based on recent history, start submitted jobs quickly. This approach increases the chances of quickly populating the pool with new jobs, and thus of preventing the situation in which a client cannot be matched with a pilot job.
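The two selection strategies can be sketched as follows. This is a minimal illustration under stated assumptions: the Site record with its fields, and the reading of "quickly start submitted jobs" as "smallest recent average start delay", are hypothetical choices, not the binder's actual data model.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Random;

// Hypothetical sketch of regular and full-throttle CE selection.
class CeSelector {
    static final class Site {
        final String name;
        final int maxJobs;                 // configured maximum of executed or scheduled jobs
        final int currentJobs;             // jobs currently executed or scheduled
        final double avgStartDelaySeconds; // recent history of start delays

        Site(String name, int maxJobs, int currentJobs, double avgStartDelaySeconds) {
            this.name = name;
            this.maxJobs = maxJobs;
            this.currentJobs = currentJobs;
            this.avgStartDelaySeconds = avgStartDelaySeconds;
        }

        int remainingSlots() {
            return Math.max(0, maxJobs - currentJobs);
        }
    }

    // Regular strategy: random choice weighted by the number of remaining slots.
    static Optional<Site> regular(List<Site> sites, Random rng) {
        int total = 0;
        for (Site s : sites) {
            total += s.remainingSlots();
        }
        if (total == 0) {
            return Optional.empty(); // every site has reached its maximum
        }
        int pick = rng.nextInt(total);
        for (Site s : sites) {
            pick -= s.remainingSlots();
            if (pick < 0) {
                return Optional.of(s);
            }
        }
        return Optional.empty(); // not reached: pick is always below total
    }

    // Full-throttle strategy: the site with free slots that has started jobs fastest.
    static Optional<Site> fullThrottle(List<Site> sites) {
        return sites.stream()
                .filter(s -> s.remainingSlots() > 0)
                .min(Comparator.comparingDouble((Site s) -> s.avgStartDelaySeconds));
    }
}
```

A site that has already reached its configured maximum has zero weight in the regular strategy and is filtered out in the full-throttle strategy, so neither can overfill a CE queue.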

The strategy to use is decided using predefined thresholds for pool size and frequency of client arrivals. These thresholds should differ for idle and active mode. However, at some point it may become useful to make job submission strategies dependent on currently active applications, particularly with application-dependent pilot selection.

Finally, when a significant decrease in demand for pilot jobs is detected, some jobs in the pool must be ended explicitly. Although it is possible to wait for jobs to be purged as they approach their maximum wall clock time, it is better to forcefully terminate some pilots after detecting a large discrepancy between actual and target pool size, and thus reduce the overhead in resource usage. However, it is important not to terminate excess jobs prematurely, since some applications may cyclically allocate and release jobs. Another possibility is to combine interactive and batch-type user jobs within a single pilot infrastructure and, instead of simply discarding jobs from the pool, use them for non-interactive processing.
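One way to avoid premature termination is to require that the excess persists for a grace period before any pilot is ended. The sketch below illustrates that idea only; the class name PoolShrinker and the parameters tolerance and graceMillis are hypothetical, and suitable values would depend on how applications allocate and release jobs.

```java
// Hypothetical sketch: terminate excess pilots only after the discrepancy
// between actual and target pool size has lasted for a grace period.
class PoolShrinker {
    private final int tolerance;    // excess pilots tolerated without any action
    private final long graceMillis; // how long a larger excess must persist
    private long excessSince = -1;  // -1 means no excess is currently observed

    PoolShrinker(int tolerance, long graceMillis) {
        this.tolerance = tolerance;
        this.graceMillis = graceMillis;
    }

    // Returns how many pilots should be terminated now (possibly zero).
    int pilotsToTerminate(int actualSize, int targetSize, long nowMillis) {
        int excess = actualSize - targetSize;
        if (excess <= tolerance) {
            excessSince = -1; // discrepancy is small: reset the timer
            return 0;
        }
        if (excessSince < 0) {
            excessSince = nowMillis; // large excess seen for the first time
        }
        if (nowMillis - excessSince < graceMillis) {
            return 0; // wait: the application may request the jobs again
        }
        return excess; // shrink the pool down to its target size
    }
}
```

The same decision point could hand excess pilots to non-interactive work instead of terminating them, as suggested above.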

Generalized binder for interactive jobs

At Belgrade University Computer Centre, a generalized solution for submission and management of interactive jobs capable of supporting several applications was implemented. It is based on the TCP binder software developed as a part of the VIVE (Volumetric Image Visualization Environment) application, which was gridified during the SEE-GRID project.

The system described here consists of software components distributed in three tiers:

  • Various application-specific client programs that interact with the end user
  • Application-specific server programs running on the grid
  • A generalized binder providing job management and mediation between the above two.

The binder submits pilot jobs to the grid, maintains a pool of ready jobs, and mediates between clients and pilots. It pairs client and server programs from which it has received TCP connections. A small dispatcher program started by the pilot job then starts an application-dependent Java or C++ class or executable file installed on the grid site in use. Further communication between the client and server continues over the initial TCP connection or, if desired, an arbitrary direct communication channel.

In the initial conversation, both server and client connect to the binder. The server sends its job and CE identification, the applications it supports, and its remaining wall clock time; the client sends its selection criteria for server and CE, the application to be used, and the required server wall clock time. The access type and data for direct communication are exchanged as well. Since the binder accepts inbound connections from both clients and servers, there is no need for a special firewall arrangement allowing jobs to open server-type connections on grid worker nodes. When a client establishes a connection, the binder allocates an appropriate job from the pool of ready jobs and runs a thread which handles further communication between the server and client, exchanging the traffic through the established sockets. After the transfer of initial data common to all applications, client and server can exchange application-specific data, or switch to other means of communication. As clients communicate with pre-submitted and running server jobs, they get the almost immediate response and the robustness associated with pilot infrastructures.
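The traffic exchange between the two paired connections can be sketched as a pair of copy loops, one per direction. This is a generic illustration of proxying between two established sockets, not the binder's source code; the class and method names are hypothetical, and plain streams are used instead of sockets so that the example stays self-contained.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical sketch: relaying bytes between a client connection and a
// server (pilot job) connection that both arrived as inbound connections.
class Relay {
    // Copies bytes from in to out until end of stream; returns the byte count.
    static long pump(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            out.flush(); // interactive traffic should not sit in a buffer
            total += n;
        }
        return total;
    }

    // Starts one thread per direction and returns both threads.
    static Thread[] bridge(InputStream clientIn, OutputStream clientOut,
                           InputStream serverIn, OutputStream serverOut) {
        Thread clientToServer = new Thread(() -> pumpQuietly(clientIn, serverOut));
        Thread serverToClient = new Thread(() -> pumpQuietly(serverIn, clientOut));
        clientToServer.start();
        serverToClient.start();
        return new Thread[] { clientToServer, serverToClient };
    }

    private static void pumpQuietly(InputStream in, OutputStream out) {
        try {
            pump(in, out);
        } catch (IOException e) {
            // The peer closed the connection; this direction simply ends.
        }
    }
}
```

With real connections, the four streams would come from the `getInputStream()` and `getOutputStream()` methods of the two `java.net.Socket` objects accepted by the binder.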

The use of the binder as a proxy between client and server allows application-specific monitoring of communication, or even processing and filtering, to take place. For example, it is possible to plug in application-dependent persistence and logging of user requests, results, or communication and processing time. For this purpose, application-specific handlers can be integrated into the binder. If such a handler monitors the traffic, it can respond to some of the client's requests without propagating them to the server. It can also cache some responses that have a high chance of being repeated, or answer some persistence-related requests, such as saving and retrieving data describing user sessions.

The original application-specific implementation was made generic by providing the ability to plug in application-related code to both the server jobs and the binder itself. It supports the key features needed for pooling of interactive jobs (described in the section "Interactive jobs on the grid"): sharing of the pool by several applications, reuse of pilot jobs, tracking of job submission delay for each CE, and CE configuration. Dynamic resizing of the job pool, intended to optimize resource usage, will also be implemented soon.

Besides the above-mentioned architectural changes required to support several applications, the new binder implementation saves the data about submitted jobs and CE performance into a relational database, thus allowing smoother recovery after a restart of the binder process.

On the other hand, the implementation of pilot job owner identity changing and integration with intra-VO accounting will be postponed until the involved applications and supporting VOs request these features. Also, unlike most piloting infrastructures, our solution does not implement transfer of job-related executables or data files. This feature is not planned for incorporation into the core framework, but can be easily implemented at the application level.

The binder is currently being used experimentally within the AEGIS VO by the PBFS SEE-GRID-2 application and by Lizza-PAKP, an environmental application for analysis of underground waters. Both applications connect a Windows client to the server processes running on the grid. The client is used as a dashboard for application progress monitoring.

Client, binder and server related configuration code samples will be provided to help interested developers integrate their applications.

Integration of applications into server jobs

A simple dispatcher, suitable for use by pilot jobs started by the binder, was implemented in Java. It allows running an external executable or an application-dependent Java class using the already open socket towards the binder and client. Depending on the application and the behaviour requested by the client, processing can be continued by a server handler communicating through the provided socket, or by an external program, which must establish its own communication channel with the client. A configuration file used by the dispatcher contains a list of supported applications and, for each of them, the associated handler class or the pathname of an external program.
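The actual format of the dispatcher configuration file is not shown here. Purely as an illustration of the mapping it describes, the sketch below assumes a Java properties file with hypothetical keys: `applications` for the list of supported applications, and `<application>.handler` or `<application>.executable` for the associated handler class or external program. The application names, class name and path are invented for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.Properties;

// Hypothetical sketch of resolving an application to a handler class or an
// external program. The key names are assumptions, not the real file format.
class DispatcherConfigSketch {
    static Properties parse(String text) throws IOException {
        Properties config = new Properties();
        config.load(new StringReader(text));
        return config;
    }

    static boolean isSupported(Properties config, String application) {
        String list = config.getProperty("applications", "");
        return Arrays.asList(list.split("\\s*,\\s*")).contains(application);
    }

    // Returns "class:<name>" or "exec:<path>", or null for an unknown application.
    static String resolve(Properties config, String application) {
        if (!isSupported(config, application)) {
            return null;
        }
        String handler = config.getProperty(application + ".handler");
        if (handler != null) {
            return "class:" + handler;
        }
        String executable = config.getProperty(application + ".executable");
        if (executable != null) {
            return "exec:" + executable;
        }
        return null;
    }
}
```

Under these assumptions, a configuration could look like this:

```java
// applications=vive,monitor
// vive.handler=org.example.ViveHandler
// monitor.executable=/opt/monitor/bin/server
```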

Java server handlers just need to implement the following interface:

interface ServerHandler {
    public void run(SocketWrapper sw,
                    String jobID);
}

The handler should use the provided socket consistently with the expectations of the client and close it at the end of communication. The handler can also access application-specific properties defined in the dispatcher configuration by using the ServerDispatcher.getProperties() static method.
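A minimal handler following these rules might look like the sketch below. The real SocketWrapper class is not described here, so the example declares a stand-in interface with `getInputStream()`, `getOutputStream()` and `close()`; these three methods are assumptions and the actual class may differ. The line-based echo protocol is likewise invented for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// ASSUMPTION: stand-in for the binder's SocketWrapper, whose real API is not shown.
interface SocketWrapper {
    InputStream getInputStream() throws IOException;
    OutputStream getOutputStream() throws IOException;
    void close() throws IOException;
}

// The handler interface as given in the text.
interface ServerHandler {
    public void run(SocketWrapper sw, String jobID);
}

// Hypothetical handler: answers every line from the client with the job ID
// and the same line, then closes the socket when the client is done.
class EchoServerHandler implements ServerHandler {
    @Override
    public void run(SocketWrapper sw, String jobID) {
        try {
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(sw.getInputStream(), StandardCharsets.UTF_8));
            Writer out = new OutputStreamWriter(sw.getOutputStream(), StandardCharsets.UTF_8);
            String line;
            while ((line = in.readLine()) != null) {
                out.write(jobID + ": " + line + "\n");
                out.flush(); // answer immediately, the client is interactive
            }
        } catch (IOException e) {
            // Connection lost: nothing more can be sent to the client.
        } finally {
            try {
                sw.close(); // the handler closes the socket at the end of communication
            } catch (IOException ignored) {
                // Already closed or broken; nothing to clean up.
            }
        }
    }
}
```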

If an external executable is used instead, the dispatcher can pass the address and port information provided by the client through command line parameters.

Binder configuration

The binder configuration describes its general settings, such as the port to be used by clients and servers, the database connection, and general monitoring and job submission preferences, as well as per-site job submission and pool management policies.

By default, the traffic between the client and server is exchanged by a simple handler that acts as a plain proxy. An application-specific handler can be integrated into the binder, and thus intercept, filter, or monitor this communication, by implementing the following interface:

interface BinderApplicationHandler {
    public void run(SocketWrapper clientSW,
                    SocketWrapper serverSW);
}

If needed, the binder handler can also access application-specific properties defined in the dispatcher configuration by using the TCPBinderUtil.getProperties() static method.
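As an illustration of a binder-side handler that intercepts traffic, the sketch below caches server responses and answers repeated client requests without contacting the server. It rests on several assumptions that are not part of the binder itself: the stand-in SocketWrapper interface with stream accessors, a line-based protocol with exactly one response line per request line, and the class name CachingLineHandler.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// ASSUMPTION: stand-in for the binder's SocketWrapper, whose real API is not shown.
interface SocketWrapper {
    InputStream getInputStream() throws IOException;
    OutputStream getOutputStream() throws IOException;
    void close() throws IOException;
}

// The handler interface named in the text.
interface BinderApplicationHandler {
    public void run(SocketWrapper clientSW, SocketWrapper serverSW);
}

// Hypothetical handler for a one-line-request, one-line-response protocol.
class CachingLineHandler implements BinderApplicationHandler {
    private final Map<String, String> cache = new HashMap<>();
    int forwarded = 0; // number of requests that actually reached the server

    @Override
    public void run(SocketWrapper clientSW, SocketWrapper serverSW) {
        try {
            BufferedReader fromClient = reader(clientSW.getInputStream());
            Writer toClient = writer(clientSW.getOutputStream());
            BufferedReader fromServer = reader(serverSW.getInputStream());
            Writer toServer = writer(serverSW.getOutputStream());
            String request;
            while ((request = fromClient.readLine()) != null) {
                String response = cache.get(request);
                if (response == null) {
                    // Cache miss: forward the request and wait for the answer.
                    toServer.write(request + "\n");
                    toServer.flush();
                    response = fromServer.readLine();
                    if (response == null) {
                        break; // the server closed the connection
                    }
                    cache.put(request, response);
                    forwarded++;
                }
                toClient.write(response + "\n");
                toClient.flush();
            }
        } catch (IOException e) {
            // One side disconnected; the session ends.
        }
    }

    private static BufferedReader reader(InputStream in) {
        return new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    }

    private static Writer writer(OutputStream out) {
        return new OutputStreamWriter(out, StandardCharsets.UTF_8);
    }
}
```

Caching of this kind is only safe for requests whose answers do not change during a session; which requests qualify is an application-level decision.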
