On a Dynamic Data Placement Strategy for Heterogeneous Hadoop Clusters

Yang Liu, Chase Wu, Meng Wang, Aiqin Hou, Yongqiang Wang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Hadoop is one of the most popular distributed systems for big data computing in both industry and science communities. The default data placement strategy of the Hadoop Distributed File System (HDFS), which was initially designed for homogeneous environments, may suffer from performance degradation when deployed in heterogeneous clusters composed of data nodes with disparate computing power and disk capacity, hence undermining the performance of MapReduce applications. In this paper, we use a Grey Forecast model to predict data hotness dynamically and determine an appropriate number of data block replicas on the fly. Based on such information, we further propose a dynamic data placement strategy (DDPS) to decide the best location for new replicas according to their hotness. The proposed method is able to dynamically adjust data replicas stored on each node in a heterogeneous Hadoop cluster and reduce the response time of big data applications. Experimental results on a heterogeneous Hadoop cluster show that DDPS together with the prediction model significantly increases application execution efficiency and improves MapReduce performance over the default HDFS configuration.
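
As a rough illustration of the forecasting step mentioned in the abstract, the Python sketch below fits a textbook GM(1,1) grey model to a short history and extrapolates it. The abstract does not say which Grey Forecast variant the authors use or how hotness is measured, so the choice of GM(1,1), the function name `gm11_forecast`, and the idea of feeding it per-period access counts for a data block are all assumptions made for illustration. It is not the paper's implementation, and it does not cover replica-count selection or placement.

```python
import math
from itertools import accumulate


def gm11_forecast(history, steps=1):
    """Forecast `steps` future values with a textbook GM(1,1) grey model.

    `history` is a hypothetical series of per-period access counts for one
    data block (an assumption; the abstract does not define the input).
    """
    n = len(history)
    if n < 4:
        raise ValueError("GM(1,1) is usually fitted on at least 4 samples")
    if any(v <= 0 for v in history):
        raise ValueError("this sketch assumes strictly positive samples")

    # 1-AGO: accumulated generating operation (running sum of the series).
    x1 = list(accumulate(history))

    # Background values: mean of consecutive accumulated values.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = history[1:]

    # Least-squares fit of the grey equation x0(k) + a * z(k) = b,
    # written as an ordinary linear regression of y on u = -z.
    m = len(z)
    u = [-v for v in z]
    su, sy = sum(u), sum(y)
    suu = sum(v * v for v in u)
    suy = sum(p * q for p, q in zip(u, y))
    a = (m * suy - su * sy) / (m * suu - su * su)
    b = (sy - a * su) / m

    if abs(a) < 1e-12:
        # Degenerate case: no growth or decay, so the series is flat at b.
        return [b] * steps

    def x1_hat(k):
        # Time response of the whitened equation dx1/dt + a * x1 = b.
        return (history[0] - b / a) * math.exp(-a * k) + b / a

    # Inverse AGO: difference the accumulated forecast to get raw values.
    return [x1_hat(k) - x1_hat(k - 1) for k in range(n, n + steps)]


# Hypothetical access counts growing by about 20% per period.
accesses = [10, 12, 14.4, 17.28, 20.736]
print(gm11_forecast(accesses, steps=2))
```

A predicted value like this could then be compared against thresholds to decide whether a block needs more or fewer replicas; how that mapping is defined is left to the paper itself.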

Original language: English (US)
Title of host publication: 2018 International Symposium on Networks, Computers and Communications, ISNCC 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781538637784
DOI: https://doi.org/10.1109/ISNCC.2018.8530970
State: Published - Nov 9 2018
Event: 2018 International Symposium on Networks, Computers and Communications, ISNCC 2018 - Rome, Italy
Duration: Jun 19 2018 to Jun 21 2018

Publication series

Name: 2018 International Symposium on Networks, Computers and Communications, ISNCC 2018

Other

Other: 2018 International Symposium on Networks, Computers and Communications, ISNCC 2018
Country: Italy
City: Rome
Period: 6/19/18 to 6/21/18

Fingerprint

Degradation
Industry
Big data

All Science Journal Classification (ASJC) codes

  • Safety, Risk, Reliability and Quality
  • Energy Engineering and Power Technology
  • Hardware and Architecture
  • Computer Networks and Communications
  • Computational Theory and Mathematics

Cite this

Liu, Y., Wu, C., Wang, M., Hou, A., & Wang, Y. (2018). On a Dynamic Data Placement Strategy for Heterogeneous Hadoop Clusters. In 2018 International Symposium on Networks, Computers and Communications, ISNCC 2018 [8530970] (2018 International Symposium on Networks, Computers and Communications, ISNCC 2018). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ISNCC.2018.8530970
@inproceedings{7a5168fba416444eab621d45affcdd37,
title = "On a Dynamic Data Placement Strategy for Heterogeneous Hadoop Clusters",
abstract = "Hadoop is one of the most popular distributed systems for big data computing in both industry and science communities. The default data placement strategy of the Hadoop Distributed File System (HDFS), which was initially designed for homogeneous environments, may suffer from performance degradation when deployed in heterogeneous clusters composed of data nodes with disparate computing power and disk capacity, hence undermining the performance of MapReduce applications. In this paper, we use a Grey Forecast model to predict data hotness dynamically and determine an appropriate number of data block replicas on the fly. Based on such information, we further propose a dynamic data placement strategy (DDPS) to decide the best location for new replicas according to their hotness. The proposed method is able to dynamically adjust data replicas stored on each node in a heterogeneous Hadoop cluster and reduce the response time of big data applications. Experimental results on a heterogeneous Hadoop cluster show that DDPS together with the prediction model significantly increases application execution efficiency and improves MapReduce performance over the default HDFS configuration.",
author = "Yang Liu and Chase Wu and Meng Wang and Aiqin Hou and Yongqiang Wang",
year = "2018",
month = "11",
day = "9",
doi = "10.1109/ISNCC.2018.8530970",
language = "English (US)",
series = "2018 International Symposium on Networks, Computers and Communications, ISNCC 2018",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
booktitle = "2018 International Symposium on Networks, Computers and Communications, ISNCC 2018",
address = "United States",
}

TY - GEN

T1 - On a Dynamic Data Placement Strategy for Heterogeneous Hadoop Clusters

AU - Liu, Yang

AU - Wu, Chase

AU - Wang, Meng

AU - Hou, Aiqin

AU - Wang, Yongqiang

PY - 2018/11/9

Y1 - 2018/11/9

AB - Hadoop is one of the most popular distributed systems for big data computing in both industry and science communities. The default data placement strategy of the Hadoop Distributed File System (HDFS), which was initially designed for homogeneous environments, may suffer from performance degradation when deployed in heterogeneous clusters composed of data nodes with disparate computing power and disk capacity, hence undermining the performance of MapReduce applications. In this paper, we use a Grey Forecast model to predict data hotness dynamically and determine an appropriate number of data block replicas on the fly. Based on such information, we further propose a dynamic data placement strategy (DDPS) to decide the best location for new replicas according to their hotness. The proposed method is able to dynamically adjust data replicas stored on each node in a heterogeneous Hadoop cluster and reduce the response time of big data applications. Experimental results on a heterogeneous Hadoop cluster show that DDPS together with the prediction model significantly increases application execution efficiency and improves MapReduce performance over the default HDFS configuration.

UR - http://www.scopus.com/inward/record.url?scp=85058473721&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85058473721&partnerID=8YFLogxK

U2 - https://doi.org/10.1109/ISNCC.2018.8530970

DO - https://doi.org/10.1109/ISNCC.2018.8530970

M3 - Conference contribution

T3 - 2018 International Symposium on Networks, Computers and Communications, ISNCC 2018

BT - 2018 International Symposium on Networks, Computers and Communications, ISNCC 2018

PB - Institute of Electrical and Electronics Engineers Inc.

ER -
