Bayesian hierarchical cross-clustering

Dazhuo Li, Patrick Shafto

Research output: Contribution to journalConference articlepeer-review

4 Scopus citations

Abstract

Most clustering algorithms assume that all dimensions of the data can be described by a single structure. Cross-clustering (or multiview clustering) allows multiple structures, each applying to a subset of the dimensions. We present a novel approach to cross-clustering, based on approximating the solution to a Cross Dirichlet Process mixture (CDPM) model [Shafto et al., 2006, Mansinghka et al., 2009]. Our bottom-up, deterministic approach results in a hierarchical clustering of dimensions, and at each node, a hierarchical clustering of data points. We also present a randomized approximation, based on a truncated hierarchy, that scales linearly in the number of levels. Results on synthetic and real-world data sets demonstrate that the cross-clustering based algorithms perform as well or better than the clustering based algorithms, our deterministic approaches models perform as well as the MCMC-based CDPM, and the randomized approximation provides a remarkable speedup relative to the full deterministic approximation with minimal cost in predictive error.

Original languageEnglish (US)
Pages (from-to)443-451
Number of pages9
JournalJournal of Machine Learning Research
Volume15
StatePublished - 2011
Externally publishedYes
Event14th International Conference on Artificial Intelligence and Statistics, AISTATS 2011 - Fort Lauderdale, FL, United States
Duration: Apr 11 2011Apr 13 2011

All Science Journal Classification (ASJC) codes

  • Software
  • Artificial Intelligence
  • Control and Systems Engineering
  • Statistics and Probability

Fingerprint

Dive into the research topics of 'Bayesian hierarchical cross-clustering'. Together they form a unique fingerprint.

Cite this