A split-and-conquer approach for analysis of extraordinarily large data

Research output: Contribution to journal › Article › peer-review

Abstract

If a dataset is too large to fit into a single computer, or too expensive for a computationally intensive data analysis, what should we do? We propose a split-and-conquer approach and illustrate it using several computationally intensive penalized regression methods, along with theoretical support. We show that the split-and-conquer approach can substantially reduce computing time and computer memory requirements. The proposed methodology is illustrated numerically using both simulation and data examples.
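As a rough illustration only, the Python sketch below shows the general split-and-conquer pattern on a linear model. It splits the rows into K blocks and fits a lasso on each block by coordinate descent. It then keeps the variables selected in more than half of the blocks and combines per-block summaries X_kᵀX_k and X_kᵀy_k over the retained variables. The function names (`soft_threshold`, `lasso_cd`, `solve`, `split_and_conquer`), the penalty level, the interleaved split, and the majority-vote and refit rules are all assumptions made for this example. It is not the authors' code and does not reproduce the paper's exact combined estimator or its theory.

```python
import random

def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent, minimizing
    (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    r = list(y)  # current residual y - X b
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (b_j added back).
            rho = sum(X[i][j] * r[i] for i in range(n)) / n + col_sq[j] * b[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - b[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta
                b[j] = new
    return b

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    m = len(A)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            for k in range(c, m + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (M[r][m] - sum(M[r][k] * x[k] for k in range(r + 1, m))) / M[r][r]
    return x

def split_and_conquer(X, y, K, lam):
    """Hypothetical split-and-conquer sketch for a linear model."""
    n, p = len(X), len(X[0])
    blocks = [list(range(k, n, K)) for k in range(K)]  # interleaved row split
    # Step 1: fit a lasso on each block independently; count selections.
    votes = [0] * p
    for idx in blocks:
        b_k = lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam)
        for j in range(p):
            if b_k[j] != 0.0:
                votes[j] += 1
    # Step 2: keep variables selected by a strict majority of blocks.
    A = [j for j in range(p) if votes[j] > K / 2]
    m = len(A)
    # Step 3: each block contributes only X_kA^T X_kA and X_kA^T y_k;
    # summing them gives the least-squares refit on A over all rows.
    G = [[0.0] * m for _ in range(m)]
    h = [0.0] * m
    for idx in blocks:
        for i in idx:
            xa = [X[i][j] for j in A]
            for a in range(m):
                h[a] += xa[a] * y[i]
                for c in range(m):
                    G[a][c] += xa[a] * xa[c]
    beta_A = solve(G, h) if m else []
    beta = [0.0] * p
    for a, j in enumerate(A):
        beta[j] = beta_A[a]
    return A, beta

# Usage on simulated data (made-up true coefficients, for illustration only).
random.seed(0)
true_beta = [3.0, -2.0, 0.0, 0.0, 1.5, 0.0]
X = [[random.gauss(0, 1) for _ in true_beta] for _ in range(600)]
y = [sum(xj * bj for xj, bj in zip(row, true_beta)) + random.gauss(0, 0.5) for row in X]
selected, beta_hat = split_and_conquer(X, y, K=3, lam=0.1)
print(selected, [round(b, 2) for b in beta_hat])
```

Each block only sends an m×m matrix and an m-vector to the combining step, so that step needs far less memory than holding all rows at once. In this linear case, summing the block summaries yields exactly the ordinary least-squares fit on all rows restricted to the voted variable set.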

Original language: American English
Pages (from-to): 1655-1684
Number of pages: 30
Journal: Statistica Sinica
Volume: 24
Issue number: 4
State: Published - Oct 2014

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

Keywords

  • Big data
  • Combining results from independent analyses
  • Distributed computing
  • Generalized linear models
  • Large sample theory
  • Penalized regression
