TY - GEN
T1 - Failure prediction in IBM BlueGene/L event logs
AU - Liang, Yinglung
AU - Zhang, Yanyong
AU - Xiong, Hui
AU - Sahoo, Ramendra
PY - 2007
Y1 - 2007
N2 - Frequent failures are becoming a serious concern to the community of high-end computing, especially when the applications and the underlying systems rapidly grow in size and complexity. In order to develop effective fault-tolerant strategies, there is a critical need to predict failure events. To this end, we have collected detailed event logs from IBM BlueGene/L, which has 128K processors, and is currently the fastest supercomputer in the world. In this study, we first show how the event records can be converted into a data set that is appropriate for running classification techniques. Then we apply classifiers on the data, including RIPPER (a rule-based classifier), Support Vector Machines (SVMs), a traditional Nearest Neighbor method, and a customized Nearest Neighbor method. We show that the customized nearest neighbor approach can outperform RIPPER and SVMs in terms of both coverage and precision. The results suggest that the customized nearest neighbor approach can be used to alleviate the impact of failures.
AB - Frequent failures are becoming a serious concern to the community of high-end computing, especially when the applications and the underlying systems rapidly grow in size and complexity. In order to develop effective fault-tolerant strategies, there is a critical need to predict failure events. To this end, we have collected detailed event logs from IBM BlueGene/L, which has 128K processors, and is currently the fastest supercomputer in the world. In this study, we first show how the event records can be converted into a data set that is appropriate for running classification techniques. Then we apply classifiers on the data, including RIPPER (a rule-based classifier), Support Vector Machines (SVMs), a traditional Nearest Neighbor method, and a customized Nearest Neighbor method. We show that the customized nearest neighbor approach can outperform RIPPER and SVMs in terms of both coverage and precision. The results suggest that the customized nearest neighbor approach can be used to alleviate the impact of failures.
UR - http://www.scopus.com/inward/record.url?scp=49749107565&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=49749107565&partnerID=8YFLogxK
U2 - 10.1109/ICDM.2007.46
DO - 10.1109/ICDM.2007.46
M3 - Conference contribution
SN - 0769530184
SN - 9780769530185
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 583
EP - 588
BT - Proceedings of the 7th IEEE International Conference on Data Mining, ICDM 2007
T2 - 7th IEEE International Conference on Data Mining, ICDM 2007
Y2 - 28 October 2007 through 31 October 2007
ER -