Intelligent Scheduling for Parallel Jobs in Big Data Processing Systems

Mingrui Xu, Chase Wu, Aiqin Hou, Yongqiang Wang

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

1 Citation (Scopus)

Abstract

The explosive growth of data in various scientific, industrial, and business domains necessitates the use of big data processing systems, such as Hadoop, which are typically deployed in a physical or cloud-based cluster shared by many users running parallel jobs. As the user population and application scale increase, such systems are expanded from time to time with the addition of new nodes of different types, making the cluster highly heterogeneous. Job scheduling in such systems largely determines the performance of big data applications and remains a challenging problem. In this paper, we formulate a generic job scheduling problem for parallel processing of big data in heterogeneous clusters and design a k-means-based task scheduling algorithm, referred to as KMTS. Simulation results show that KMTS improves execution performance by 25% and 30% on average in single job scheduling and parallel job scheduling, respectively, over existing methods. The performance superiority is also confirmed by real experiments in high-performance computing environments.
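
The abstract does not describe the internals of KMTS, so the following is only a minimal illustrative sketch of how k-means clustering might be combined with task placement in a heterogeneous cluster; it is not the authors' algorithm. It assumes each task has a known size and each node a known processing speed, groups both with a one-dimensional k-means, and places larger task groups on faster node groups. The names `kmeans_1d`, `schedule`, `task_sizes`, and `node_speeds` are hypothetical.

```python
def kmeans_1d(values, k, iters=50):
    """Lloyd's k-means on scalars (requires len(values) >= k).

    Returns one label per value; labels are ranked so that
    label 0 is the cluster with the smallest centroid.
    """
    srt = sorted(values)
    # Deterministic initialisation: spread centroids over the sorted values.
    centroids = [srt[(2 * i + 1) * len(srt) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: nearest centroid for every value.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c]))
                  for v in values]
        # Update step: mean of each cluster (empty clusters keep their centroid).
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    order = sorted(range(k), key=lambda c: centroids[c])
    rank = {c: r for r, c in enumerate(order)}
    return [rank[lab] for lab in labels]


def schedule(task_sizes, node_speeds, k=2):
    """Assumed placement rule (not from the paper): a task in size group r
    runs on a node in speed group r, choosing the earliest finish time."""
    task_group = kmeans_1d(task_sizes, k)
    node_group = kmeans_1d(node_speeds, k)
    finish = [0.0] * len(node_speeds)  # time at which each node becomes idle
    assignment = {}
    # Place the largest tasks first.
    for t in sorted(range(len(task_sizes)), key=lambda i: -task_sizes[i]):
        candidates = [n for n, g in enumerate(node_group) if g == task_group[t]]
        if not candidates:  # fall back to every node if a group is empty
            candidates = list(range(len(node_speeds)))
        best = min(candidates,
                   key=lambda n: finish[n] + task_sizes[t] / node_speeds[n])
        finish[best] += task_sizes[t] / node_speeds[best]
        assignment[t] = best
    return assignment, max(finish)


if __name__ == "__main__":
    plan, makespan = schedule([8.0, 8.0, 1.0, 1.0], [1.0, 1.0, 4.0, 4.0])
    print(plan, makespan)
```

Matching size groups to speed groups is a deliberately simple rule chosen for the sketch; a real scheduler would also need to account for data locality, resource sharing among concurrent jobs, and task dependencies.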

Original language: English (US)
Title of host publication: 2019 International Conference on Computing, Networking and Communications, ICNC 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 22-28
Number of pages: 7
ISBN (Electronic): 9781538692233
DOI: https://doi.org/10.1109/ICCNC.2019.8685520
State: Published - Apr 8 2019
Event: 2019 International Conference on Computing, Networking and Communications, ICNC 2019 - Honolulu, United States
Duration: Feb 18 2019 to Feb 21 2019

Publication series

Name: 2019 International Conference on Computing, Networking and Communications, ICNC 2019

Conference

Conference: 2019 International Conference on Computing, Networking and Communications, ICNC 2019
Country: United States
City: Honolulu
Period: 2/18/19 to 2/21/19

Fingerprint

Scheduling
Scheduling algorithms
Big data
Processing
Industry
Experiments

All Science Journal Classification (ASJC) codes

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications

Keywords

  • Task scheduling
  • big data platform
  • cluster manager
  • heterogeneous clusters

Cite this

Xu, M., Wu, C., Hou, A., & Wang, Y. (2019). Intelligent Scheduling for Parallel Jobs in Big Data Processing Systems. In 2019 International Conference on Computing, Networking and Communications, ICNC 2019 (pp. 22-28). [8685520] (2019 International Conference on Computing, Networking and Communications, ICNC 2019). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICCNC.2019.8685520
@inproceedings{ce17748f07d14ae6917b1c29c5b3b258,
title = "Intelligent Scheduling for Parallel Jobs in Big Data Processing Systems",
keywords = "Task scheduling, big data platform, cluster manager, heterogeneous clusters",
author = "Mingrui Xu and Chase Wu and Aiqin Hou and Yongqiang Wang",
year = "2019",
month = "4",
day = "8",
doi = "10.1109/ICCNC.2019.8685520",
language = "English (US)",
series = "2019 International Conference on Computing, Networking and Communications, ICNC 2019",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "22--28",
booktitle = "2019 International Conference on Computing, Networking and Communications, ICNC 2019",
address = "United States",

}
