Checkpoint evolution for volatile correlation computing

Wenjun Zhou, Hui Xiong

Research output: Contribution to journal › Article › peer-review

6 Scopus citations

Abstract

Given a set of data objects, correlation computing is concerned with efficiently identifying the strongly related ones. Existing studies have focused mainly on static data. However, in many real-world scenarios the input data are dynamic, and analytical results have to be updated continually. There is therefore a critical need for a dynamic solution to volatile correlation computing. To this end, we develop a checkpoint scheme that captures dynamic correlation values by establishing an evolving computation buffer. In this paper, we first provide a theoretical analysis of the properties of volatile correlation and derive a tight upper bound. This tight, evolving upper bound is used to identify a small list of candidate pairs, which is maintained as new transactions are added to the database. Once the total number of new transactions exceeds the buffer size, the upper bound is recomputed for the next checkpoint and a new list of candidate pairs is identified. Based on this scheme, we design a new algorithm named CHECK-POINT+. Experimental results on real-world data sets show that CHECK-POINT+ significantly reduces computation cost in dynamic data environments and has the advantage of more compact memory use.
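
The checkpoint idea in the abstract can be illustrated with a minimal Python sketch. This is not the CHECK-POINT+ algorithm and does not use the paper's tight bound. It assumes binary (market-basket) transactions, computes Pearson's correlation as the phi coefficient from support counts, and prunes with a simple conservative bound derived here for illustration only: with current counts `lo <= hi`, `n` transactions so far, and a buffer of `delta` new transactions, phi cannot exceed `sqrt((lo + delta) * (n + delta - hi) / (hi * (n - lo)))` before the next checkpoint. The names `CheckpointTracker`, `window_upper_bound`, `theta`, and `delta` are hypothetical, and the in-memory `history` list stands in for the transaction database.

```python
from itertools import combinations
from math import sqrt


def phi(n_a, n_b, n_ab, n):
    """Pearson's correlation (phi coefficient) of two binary items from counts."""
    denom = n_a * n_b * (n - n_a) * (n - n_b)
    if denom == 0:
        return 0.0  # undefined when an item is in no or all transactions
    return (n * n_ab - n_a * n_b) / sqrt(denom)


def window_upper_bound(n_a, n_b, n, delta):
    """Illustrative (not the paper's) bound on phi over the next `delta` transactions."""
    lo, hi = min(n_a, n_b), max(n_a, n_b)
    if lo + delta >= hi:
        return 1.0  # supports could become equal, so the pair cannot be pruned
    return min(1.0, sqrt((lo + delta) * (n + delta - hi) / (hi * (n - lo))))


class CheckpointTracker:
    """Tracks pairs with phi >= theta, re-selecting candidates every `delta` adds."""

    def __init__(self, items, theta, delta):
        self.items = sorted(items)
        self.theta, self.delta = theta, delta
        self.n = 0
        self.since_checkpoint = 0
        self.history = []  # stand-in for the transaction database
        self.count = {i: 0 for i in self.items}
        self.candidates = set(combinations(self.items, 2))  # nothing pruned yet
        self.pair_count = {p: 0 for p in self.candidates}

    def add(self, transaction):
        t = {i for i in transaction if i in self.count}
        self.history.append(t)
        self.n += 1
        for i in t:
            self.count[i] += 1
        # Only candidate pairs have their co-occurrence counts maintained.
        for a, b in self.candidates:
            if a in t and b in t:
                self.pair_count[(a, b)] += 1
        self.since_checkpoint += 1
        if self.since_checkpoint >= self.delta:
            self._checkpoint()

    def _checkpoint(self):
        """Recompute the bound for the next window and rebuild the candidate list."""
        self.since_checkpoint = 0
        fresh = {
            (a, b)
            for a, b in combinations(self.items, 2)
            if window_upper_bound(self.count[a], self.count[b], self.n, self.delta)
            >= self.theta
        }
        for a, b in fresh - self.candidates:  # newly admitted: count from history
            self.pair_count[(a, b)] = sum(1 for t in self.history if a in t and b in t)
        for pair in self.candidates - fresh:  # pruned: stop tracking
            del self.pair_count[pair]
        self.candidates = fresh

    def strong_pairs(self):
        """Pairs whose current phi is at least theta."""
        return {
            (a, b)
            for a, b in self.candidates
            if phi(self.count[a], self.count[b], self.pair_count[(a, b)], self.n)
            >= self.theta
        }
```

A caller would construct `CheckpointTracker(items, theta=0.3, delta=5)`, feed each incoming transaction to `add`, and read `strong_pairs()` at any time. In this sketch a larger `delta` means fewer checkpoints but a looser bound and therefore more candidate pairs to maintain, which mirrors the buffer-size trade-off the abstract alludes to.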

Original language: American English
Pages (from-to): 103-131
Number of pages: 29
Journal: Machine Learning
Volume: 83
Issue number: 1
State: Published - Apr 2011

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence

Keywords

  • Checkpoint
  • Correlation coefficient
  • Pearson's correlation coefficient
  • Volatile correlation computing
