SAGA BigJob: An extensible and interoperable Pilot-Job abstraction for distributed applications and systems

André Luckow, Lukasz Lacinski, Shantenu Jha

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

47 Citations (Scopus)

Abstract

The uptake of distributed infrastructures by scientific applications has been limited by the scarce availability of extensible, pervasive, and simple-to-use abstractions, which are required at multiple levels: the development, deployment, and execution stages of scientific applications. The Pilot-Job abstraction has been shown to address many requirements of scientific applications effectively. Specifically, Pilot-Jobs support the decoupling of workload submission from resource assignment; this results in a flexible execution model, which in turn enables the distributed scale-out of applications on multiple and possibly heterogeneous resources. Most Pilot-Job implementations, however, are tied to a specific infrastructure. In this paper, we describe the design and implementation of a SAGA-based Pilot-Job, which supports a wide range of application types and is usable over a broad range of infrastructures; that is, it is general-purpose and extensible and, as we will argue, also interoperable with Clouds. We discuss how the SAGA-based Pilot-Job is used for different application types and how it supports concurrent usage across multiple heterogeneous distributed infrastructures, including concurrent usage across Clouds and traditional Grids/Clusters. Further, we show how Pilot-Jobs can help to support dynamic execution models and thus introduce new opportunities for distributed applications. We also demonstrate, for the first time that we are aware of, the use of multiple Pilot-Job implementations to solve the same problem; specifically, we use the SAGA-based Pilot-Job on high-end resources such as the TeraGrid and the native Condor Pilot-Job (Glide-in) on Condor resources. Importantly, both are invoked via the same interface without changes at the development or deployment level; which one to use is only an execution (run-time) decision.
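
The decoupling of workload submission from resource assignment described in the abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions: `Pilot`, `PilotManager`, `add_pilot`, `submit`, `drain`, and `demo` are hypothetical names invented here and are not the SAGA BigJob API, and the tasks run in-process rather than on remote resources. Pilots are registered first as placeholders with a number of slots; tasks are queued without naming a resource and are bound to a pilot only when the queue is drained.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List, Tuple


@dataclass
class Pilot:
    """Hypothetical placeholder job holding `slots` on one resource."""
    resource: str
    slots: int


class PilotManager:
    """Toy manager: tasks are queued first and bound to pilots later."""

    def __init__(self) -> None:
        self._pilots: List[Pilot] = []
        self._queue: Deque[Tuple[str, Callable[[], object]]] = deque()

    def add_pilot(self, resource: str, slots: int) -> None:
        # Resource acquisition: happens independently of any task.
        if slots < 1:
            raise ValueError("a pilot needs at least one slot")
        self._pilots.append(Pilot(resource, slots))

    def submit(self, name: str, work: Callable[[], object]) -> None:
        # Workload submission: no resource is named here.
        self._queue.append((name, work))

    def drain(self) -> Dict[str, Tuple[str, object]]:
        # Late binding: each pilot takes up to `slots` tasks per round.
        if self._queue and not self._pilots:
            raise RuntimeError("no pilot available")
        placed: Dict[str, Tuple[str, object]] = {}
        while self._queue:
            for pilot in self._pilots:
                for _ in range(pilot.slots):
                    if not self._queue:
                        break
                    name, work = self._queue.popleft()
                    placed[name] = (pilot.resource, work())
        return placed


def demo() -> Dict[str, Tuple[str, object]]:
    manager = PilotManager()
    # Resource labels are illustrative strings only.
    manager.add_pilot("teragrid", slots=2)
    manager.add_pilot("condor", slots=1)
    for i in range(5):
        manager.submit(f"t{i}", lambda i=i: i * i)
    return manager.drain()
```

In this sketch the code that submits tasks never changes when pilots are added or swapped; only the `add_pilot` calls made at run time determine where each task lands, which mirrors the run-time decision the abstract describes.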

Original language: English (US)
Title of host publication: CCGrid 2010 - 10th IEEE/ACM International Conference on Cluster, Cloud, and Grid Computing
Pages: 135-144
Number of pages: 10
DOI: https://doi.org/10.1109/CCGRID.2010.91
State: Published - Jul 30 2010
Externally published: Yes
Event: 10th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2010 - Melbourne, VIC, Australia
Duration: May 17 2010 to May 20 2010

Fingerprint

Dynamic models
Availability

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Computer Science Applications
  • Computational Theory and Mathematics

Cite this

Luckow, A., Lacinski, L., & Jha, S. (2010). SAGA BigJob: An extensible and interoperable Pilot-Job abstraction for distributed applications and systems. In CCGrid 2010 - 10th IEEE/ACM International Conference on Cluster, Cloud, and Grid Computing (pp. 135-144). [5493486] https://doi.org/10.1109/CCGRID.2010.91
@inproceedings{5ef8705c5cc545569b43a96e90d56f5e,
title = "SAGA BigJob: An extensible and interoperable Pilot-Job abstraction for distributed applications and systems",
author = "Andr{\'e} Luckow and Lukasz Lacinski and Shantenu Jha",
year = "2010",
month = "7",
day = "30",
doi = "10.1109/CCGRID.2010.91",
language = "English (US)",
isbn = "9781424469871",
pages = "135--144",
booktitle = "CCGrid 2010 - 10th IEEE/ACM International Conference on Cluster, Cloud, and Grid Computing",

}
