TY - JOUR
T1 - Checkpoint evolution for volatile correlation computing
AU - Zhou, Wenjun
AU - Xiong, Hui
N1 - Funding Information: Acknowledgements This research was partially supported by the National Science Foundation (NSF) via grant number CNS 0831186, the Rutgers CCC Green Computing Initiative, and the National Natural Science Foundation of China (70890080).
PY - 2011/4
Y1 - 2011/4
N2 - Given a set of data objects, the problem of correlation computing is concerned with efficient identification of strongly-related ones. Existing studies have been mainly focused on static data. However, as observed in many real-world scenarios, input data are often dynamic and analytical results have to be continually updated. Therefore, there is the critical need to develop a dynamic solution for volatile correlation computing. To this end, we develop a checkpoint scheme, which can help us capture dynamic correlation values by establishing an evolving computation buffer. In this paper, we first provide a theoretical analysis of the properties of the volatile correlation, and derive a tight upper bound. Such tight and evolving upper bound is used to identify a small list of candidate pairs, which are maintained as new transactions are added into the database. Once the total number of new transactions goes beyond the buffer size, the upper bound is re-computed according to the next checkpoint, and a new list of candidate pairs is identified. Based on such a scheme, a new algorithm named CHECK-POINT+ has been designed. Experimental results on real-world data sets show that CHECK-POINT+ can significantly reduce the computation cost in dynamic data environments, and has the advantage of compacting the use of memory space.
AB - Given a set of data objects, the problem of correlation computing is concerned with efficient identification of strongly-related ones. Existing studies have been mainly focused on static data. However, as observed in many real-world scenarios, input data are often dynamic and analytical results have to be continually updated. Therefore, there is the critical need to develop a dynamic solution for volatile correlation computing. To this end, we develop a checkpoint scheme, which can help us capture dynamic correlation values by establishing an evolving computation buffer. In this paper, we first provide a theoretical analysis of the properties of the volatile correlation, and derive a tight upper bound. Such tight and evolving upper bound is used to identify a small list of candidate pairs, which are maintained as new transactions are added into the database. Once the total number of new transactions goes beyond the buffer size, the upper bound is re-computed according to the next checkpoint, and a new list of candidate pairs is identified. Based on such a scheme, a new algorithm named CHECK-POINT+ has been designed. Experimental results on real-world data sets show that CHECK-POINT+ can significantly reduce the computation cost in dynamic data environments, and has the advantage of compacting the use of memory space.
KW - Checkpoint
KW - Correlation coefficient
KW - Pearson's correlation coefficient
KW - Volatile correlation computing
UR - http://www.scopus.com/inward/record.url?scp=79953225492&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79953225492&partnerID=8YFLogxK
U2 - 10.1007/s10994-010-5204-9
DO - 10.1007/s10994-010-5204-9
M3 - Article
SN - 0885-6125
VL - 83
SP - 103
EP - 131
JO - Machine Learning
JF - Machine Learning
IS - 1
ER -