Abstract
The explosive growth of data in various scientific, industrial, and business domains necessitates the use of big data processing systems, such as Hadoop, which are typically deployed in a physical or cloud-based cluster shared by many users running parallel jobs. As the user population and application scale increase, such systems are expanded from time to time with the addition of new nodes of different types, making the cluster highly heterogeneous. Job scheduling in such systems largely determines the performance of big data applications and remains a challenging problem. In this paper, we formulate a generic job scheduling problem for parallel processing of big data in heterogeneous clusters and design a k-means-based task scheduling algorithm, referred to as KMTS. Simulation results show that KMTS improves execution performance by 25% and 30% on average in single job scheduling and parallel job scheduling, respectively, over existing methods. The performance superiority is also confirmed by real experiments in high-performance computing environments.
Original language | English (US) |
---|---|
Title of host publication | 2019 International Conference on Computing, Networking and Communications, ICNC 2019 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 22-28 |
Number of pages | 7 |
ISBN (Electronic) | 9781538692233 |
DOIs | https://doi.org/10.1109/ICCNC.2019.8685520 |
State | Published - Apr 8 2019 |
Event | 2019 International Conference on Computing, Networking and Communications, ICNC 2019 - Honolulu, United States Duration: Feb 18 2019 → Feb 21 2019 |
Publication series
Name | 2019 International Conference on Computing, Networking and Communications, ICNC 2019 |
---|---|
Conference
Conference | 2019 International Conference on Computing, Networking and Communications, ICNC 2019 |
---|---|
Country | United States |
City | Honolulu |
Period | 2/18/19 → 2/21/19 |
All Science Journal Classification (ASJC) codes
- Software
- Hardware and Architecture
- Computer Networks and Communications
Keywords
- Task scheduling
- big data platform
- cluster manager
- heterogeneous clusters
Cite this
Intelligent Scheduling for Parallel Jobs in Big Data Processing Systems. / Xu, Mingrui; Wu, Chase; Hou, Aiqin; Wang, Yongqiang.
2019 International Conference on Computing, Networking and Communications, ICNC 2019. Institute of Electrical and Electronics Engineers Inc., 2019. p. 22-28 8685520 (2019 International Conference on Computing, Networking and Communications, ICNC 2019).
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution
TY - GEN
T1 - Intelligent Scheduling for Parallel Jobs in Big Data Processing Systems
AU - Xu, Mingrui
AU - Wu, Chase
AU - Hou, Aiqin
AU - Wang, Yongqiang
PY - 2019/4/8
Y1 - 2019/4/8
AB - The explosive growth of data in various scientific, industrial, and business domains necessitates the use of big data processing systems, such as Hadoop, which are typically deployed in a physical or cloud-based cluster shared by many users running parallel jobs. As the user population and application scale increase, such systems are expanded from time to time with the addition of new nodes of different types, making the cluster highly heterogeneous. Job scheduling in such systems largely determines the performance of big data applications and remains a challenging problem. In this paper, we formulate a generic job scheduling problem for parallel processing of big data in heterogeneous clusters and design a k-means-based task scheduling algorithm, referred to as KMTS. Simulation results show that KMTS improves execution performance by 25% and 30% on average in single job scheduling and parallel job scheduling, respectively, over existing methods. The performance superiority is also confirmed by real experiments in high-performance computing environments.
KW - Task scheduling
KW - big data platform
KW - cluster manager
KW - heterogeneous clusters
UR - http://www.scopus.com/inward/record.url?scp=85064975553&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85064975553&partnerID=8YFLogxK
U2 - https://doi.org/10.1109/ICCNC.2019.8685520
DO - https://doi.org/10.1109/ICCNC.2019.8685520
M3 - Conference contribution
T3 - 2019 International Conference on Computing, Networking and Communications, ICNC 2019
SP - 22
EP - 28
BT - 2019 International Conference on Computing, Networking and Communications, ICNC 2019
PB - Institute of Electrical and Electronics Engineers Inc.
ER -