CoFlux

Robustly correlating KPIs by fluctuations for service troubleshooting

Ya Su, Youjian Zhao, Wentao Xia, Rong Liu, Jiahao Bu, Jing Zhu, Yuanpu Cao, Haibin Li, Chenhao Niu, Yiyin Zhang, Zhaogang Wang, Dan Pei

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Internet-based service companies monitor a large number of KPIs (Key Performance Indicators) to ensure their service quality and reliability. Correlating KPIs by fluctuations reveals interactions between KPIs under anomalous situations and can be extremely useful for service troubleshooting. However, such KPI flux-correlation has been little studied so far in the domain of Internet service operations management. A major challenge is how to automatically and accurately separate fluctuations from normal variations in KPIs with different structural characteristics (such as seasonal, trend and stationary) for a large number of KPIs. In this paper, we propose CoFlux, an unsupervised approach, to automatically (without manual selection of algorithm fitting and parameter tuning) determine whether two KPIs are correlated by fluctuations, in what temporal order they fluctuate, and whether they fluctuate in the same direction. CoFlux's robust feature engineering and robust correlation score computation enable it to work well against the diverse KPI characteristics. Our extensive experiments have demonstrated that CoFlux achieves the best F1-scores of 0.84 (0.90), 0.92 (0.95), 0.95 (0.99), in answering these three questions, in the two real datasets from a top global Internet company, respectively. Moreover, we showed that CoFlux is effective in assisting service troubleshooting through the applications of alert compression, recommending Top N causes, and constructing fluctuation propagation chains.

Original language: English (US)
Title of host publication: Proceedings of the International Symposium on Quality of Service, IWQoS 2019
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9781450367783
DOIs: https://doi.org/10.1145/3326285.3329048
State: Published - Jun 24 2019
Event: 2019 International Symposium on Quality of Service, IWQoS 2019 - Phoenix, United States
Duration: Jun 24 2019 to Jun 25 2019

Publication series

Name: Proceedings of the International Symposium on Quality of Service, IWQoS 2019

Conference

Conference: 2019 International Symposium on Quality of Service, IWQoS 2019
Country: United States
City: Phoenix
Period: 6/24/19 to 6/25/19

Fingerprint

Internet
Industry
Tuning
Fluctuations
Key performance indicators
Experiments
World Wide Web

All Science Journal Classification (ASJC) codes

  • Safety, Risk, Reliability and Quality
  • Management of Technology and Innovation
  • Computer Networks and Communications
  • Media Technology

Cite this

Su, Y., Zhao, Y., Xia, W., Liu, R., Bu, J., Zhu, J., ... Pei, D. (2019). CoFlux: Robustly correlating KPIs by fluctuations for service troubleshooting. In Proceedings of the International Symposium on Quality of Service, IWQoS 2019 [13] (Proceedings of the International Symposium on Quality of Service, IWQoS 2019). Association for Computing Machinery, Inc. https://doi.org/10.1145/3326285.3329048
Su, Ya ; Zhao, Youjian ; Xia, Wentao ; Liu, Rong ; Bu, Jiahao ; Zhu, Jing ; Cao, Yuanpu ; Li, Haibin ; Niu, Chenhao ; Zhang, Yiyin ; Wang, Zhaogang ; Pei, Dan. / CoFlux : Robustly correlating KPIs by fluctuations for service troubleshooting. Proceedings of the International Symposium on Quality of Service, IWQoS 2019. Association for Computing Machinery, Inc, 2019. (Proceedings of the International Symposium on Quality of Service, IWQoS 2019).
@inproceedings{23be4a0d1f614dba8622e62c10e86735,
title = "CoFlux: Robustly correlating KPIs by fluctuations for service troubleshooting",
abstract = "Internet-based service companies monitor a large number of KPIs (Key Performance Indicators) to ensure their service quality and reliability. Correlating KPIs by fluctuations reveals interactions between KPIs under anomalous situations and can be extremely useful for service troubleshooting. However, such KPI flux-correlation has been little studied so far in the domain of Internet service operations management. A major challenge is how to automatically and accurately separate fluctuations from normal variations in KPIs with different structural characteristics (such as seasonal, trend and stationary) for a large number of KPIs. In this paper, we propose CoFlux, an unsupervised approach, to automatically (without manual selection of algorithm fitting and parameter tuning) determine whether two KPIs are correlated by fluctuations, in what temporal order they fluctuate, and whether they fluctuate in the same direction. CoFlux's robust feature engineering and robust correlation score computation enable it to work well against the diverse KPI characteristics. Our extensive experiments have demonstrated that CoFlux achieves the best F1-scores of 0.84 (0.90), 0.92 (0.95), 0.95 (0.99), in answering these three questions, in the two real datasets from a top global Internet company, respectively. Moreover, we showed that CoFlux is effective in assisting service troubleshooting through the applications of alert compression, recommending Top N causes, and constructing fluctuation propagation chains.",
author = "Ya Su and Youjian Zhao and Wentao Xia and Rong Liu and Jiahao Bu and Jing Zhu and Yuanpu Cao and Haibin Li and Chenhao Niu and Yiyin Zhang and Zhaogang Wang and Dan Pei",
year = "2019",
month = "6",
day = "24",
doi = "10.1145/3326285.3329048",
language = "English (US)",
series = "Proceedings of the International Symposium on Quality of Service, IWQoS 2019",
publisher = "Association for Computing Machinery, Inc",
booktitle = "Proceedings of the International Symposium on Quality of Service, IWQoS 2019",

}

Su, Y, Zhao, Y, Xia, W, Liu, R, Bu, J, Zhu, J, Cao, Y, Li, H, Niu, C, Zhang, Y, Wang, Z & Pei, D 2019, CoFlux: Robustly correlating KPIs by fluctuations for service troubleshooting. in Proceedings of the International Symposium on Quality of Service, IWQoS 2019., 13, Proceedings of the International Symposium on Quality of Service, IWQoS 2019, Association for Computing Machinery, Inc, 2019 International Symposium on Quality of Service, IWQoS 2019, Phoenix, United States, 6/24/19. https://doi.org/10.1145/3326285.3329048

CoFlux : Robustly correlating KPIs by fluctuations for service troubleshooting. / Su, Ya; Zhao, Youjian; Xia, Wentao; Liu, Rong; Bu, Jiahao; Zhu, Jing; Cao, Yuanpu; Li, Haibin; Niu, Chenhao; Zhang, Yiyin; Wang, Zhaogang; Pei, Dan.

Proceedings of the International Symposium on Quality of Service, IWQoS 2019. Association for Computing Machinery, Inc, 2019. 13 (Proceedings of the International Symposium on Quality of Service, IWQoS 2019).

TY - GEN

T1 - CoFlux

T2 - Robustly correlating KPIs by fluctuations for service troubleshooting

AU - Su, Ya

AU - Zhao, Youjian

AU - Xia, Wentao

AU - Liu, Rong

AU - Bu, Jiahao

AU - Zhu, Jing

AU - Cao, Yuanpu

AU - Li, Haibin

AU - Niu, Chenhao

AU - Zhang, Yiyin

AU - Wang, Zhaogang

AU - Pei, Dan

PY - 2019/6/24

Y1 - 2019/6/24

AB - Internet-based service companies monitor a large number of KPIs (Key Performance Indicators) to ensure their service quality and reliability. Correlating KPIs by fluctuations reveals interactions between KPIs under anomalous situations and can be extremely useful for service troubleshooting. However, such KPI flux-correlation has been little studied so far in the domain of Internet service operations management. A major challenge is how to automatically and accurately separate fluctuations from normal variations in KPIs with different structural characteristics (such as seasonal, trend and stationary) for a large number of KPIs. In this paper, we propose CoFlux, an unsupervised approach, to automatically (without manual selection of algorithm fitting and parameter tuning) determine whether two KPIs are correlated by fluctuations, in what temporal order they fluctuate, and whether they fluctuate in the same direction. CoFlux's robust feature engineering and robust correlation score computation enable it to work well against the diverse KPI characteristics. Our extensive experiments have demonstrated that CoFlux achieves the best F1-scores of 0.84 (0.90), 0.92 (0.95), 0.95 (0.99), in answering these three questions, in the two real datasets from a top global Internet company, respectively. Moreover, we showed that CoFlux is effective in assisting service troubleshooting through the applications of alert compression, recommending Top N causes, and constructing fluctuation propagation chains.

UR - http://www.scopus.com/inward/record.url?scp=85069170467&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85069170467&partnerID=8YFLogxK

U2 - https://doi.org/10.1145/3326285.3329048

DO - https://doi.org/10.1145/3326285.3329048

M3 - Conference contribution

T3 - Proceedings of the International Symposium on Quality of Service, IWQoS 2019

BT - Proceedings of the International Symposium on Quality of Service, IWQoS 2019

PB - Association for Computing Machinery, Inc

ER -

Su Y, Zhao Y, Xia W, Liu R, Bu J, Zhu J et al. CoFlux: Robustly correlating KPIs by fluctuations for service troubleshooting. In Proceedings of the International Symposium on Quality of Service, IWQoS 2019. Association for Computing Machinery, Inc. 2019. 13. (Proceedings of the International Symposium on Quality of Service, IWQoS 2019). https://doi.org/10.1145/3326285.3329048