CSR: Small: Collaborative Research: An Integrated Approach to Performance Modeling and Optimization of Big-Data Scientific Workflows

Description

Next-generation e-science is producing colossal amounts of data, commonly known as Big Data, on the order of terabytes at present and petabytes or even exabytes in the foreseeable future. These scientific applications typically feature data- and network-intensive workflows composed of computing modules with intricate inter-module dependencies. Application users often must manually configure their computing workflows in distributed environments in an ad hoc manner, which significantly limits the productivity of scientists and constrains the utilization of resources.

The end-to-end performance of big data scientific workflows depends on both the mapping scheme that determines module assignment and the scheduling policy that determines resource allocation. These two aspects of a workflow-based research process are traditionally treated as separate topics, and the interactions between them have not been fully explored. As the scale and complexity of scientific workflows and network environments rapidly increase, optimizing each aspect in isolation has had limited success. This research is an in-depth investigation into workflow execution dynamics in resource-sharing environments to explore the interactions between workflow mapping and node scheduling on a unified application-support platform. The idea is to build a three-layer workflow optimization architecture that seamlessly integrates three interrelated components based on rigorous algorithmic design, theoretical dynamics analysis, and real network implementation, deployment, and evaluation. The successful completion of this project will provide a solid mathematical foundation for the analysis and control of system dynamics of big data scientific workflows, produce a suite of cooperative mapping and scheduling optimization solutions to facilitate scientific collaborations, and add a level of intelligence to existing workflow engines widely adopted in current grid and cloud computing middleware. The resulting workflow optimization solutions will benefit a broad spectrum of workflow-based scientific applications.
Status: Finished
Effective start/end date: 8/28/15 to 9/30/17

Funding

  • National Science Foundation

Fingerprint

Scheduling
Grid computing
Cloud computing
Middleware
Dynamic analysis
Resource allocation
Dynamical systems
Productivity
Engines
Big data