TY - JOUR
T1 - Exascale workflow applications and middleware
T2 - An ExaWorks retrospective
AU - Alsaadi, Aymen
AU - Hategan-Marandiuc, Mihael
AU - Maheshwari, Ketan
AU - Merzky, Andre
AU - Titov, Mikhail
AU - Turilli, Matteo
AU - Wilke, Andreas
AU - Wozniak, Justin M.
AU - Chard, Kyle
AU - Ferreira da Silva, Rafael
AU - Jha, Shantenu
AU - Laney, Daniel
N1 - Publisher Copyright: © The Author(s) 2025
PY - 2025/7
Y1 - 2025/7
AB - Exascale computers offer transformative capabilities to combine data-driven and learning-based approaches with traditional simulation applications to accelerate scientific discovery and insight. However, these software combinations and integrations are difficult to achieve due to the challenges of coordinating and deploying heterogeneous software components on diverse and massive platforms. We present the ExaWorks project, which addresses many of these challenges. We developed a workflow Software Development Toolkit (SDK), a curated collection of workflow technologies that can be composed and interoperated through a common interface, engineered following current best practices, and specifically designed to work on HPC platforms. ExaWorks also developed PSI/J, a job management abstraction API, to simplify the construction of portable software components and applications that can be used over various HPC schedulers. The PSI/J API is a minimal interface for submitting and monitoring jobs and their execution state across multiple and commonly used HPC schedulers. We also describe several leading and innovative workflow examples of ExaWorks tools used on DOE leadership platforms. Furthermore, we discuss how our project is working with the workflow community, large computing facilities, and HPC platform vendors to address the requirements of workflows sustainably at the exascale.
KW - ECP
KW - Exascale
KW - HPC workflows
KW - SDK
KW - middleware building blocks
KW - workflow applications
KW - workflow community initiative
KW - workflow interoperability
UR - https://www.scopus.com/pages/publications/105003809305
U2 - 10.1177/10943420251331674
DO - 10.1177/10943420251331674
M3 - Article
SN - 1094-3420
VL - 39
SP - 579
EP - 593
JO - International Journal of High Performance Computing Applications
JF - International Journal of High Performance Computing Applications
IS - 4
ER -