Abstract
What should we do when a dataset is too large to fit into a single computer, or when a computationally intensive analysis of it is too expensive? We propose a split-and-conquer approach, give theoretical support for it, and illustrate it using several computationally intensive penalized regression methods. We show that the split-and-conquer approach can substantially reduce computing time and computer memory requirements. The proposed methodology is illustrated numerically using both simulations and data examples.
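The general idea can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the paper's estimator: the abstract does not specify the model or the combination rule, so the sketch assumes a one-predictor ridge regression (a penalized regression with a closed form) and combines the per-chunk estimates by a size-weighted average. The names `ridge_1d` and `split_and_conquer` are hypothetical.

```python
import random


def ridge_1d(xs, ys, lam):
    """Closed-form ridge estimate for y = beta * x + noise (one predictor, no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def split_and_conquer(xs, ys, n_chunks, lam):
    """Fit each chunk on its own, then combine the estimates.

    Assumption: the combination rule is a size-weighted average,
    chosen here only for illustration.
    """
    n = len(xs)
    size = (n + n_chunks - 1) // n_chunks  # ceiling division
    weighted_sum = 0.0
    for start in range(0, n, size):
        # Only one chunk needs to be held in memory at a time.
        chunk_x = xs[start:start + size]
        chunk_y = ys[start:start + size]
        weighted_sum += len(chunk_x) * ridge_1d(chunk_x, chunk_y, lam)
    return weighted_sum / n


# Simulated data with a known coefficient (illustrative values).
rng = random.Random(0)
true_beta = 2.0
xs = [rng.gauss(0, 1) for _ in range(10000)]
ys = [true_beta * x + rng.gauss(0, 0.5) for x in xs]

full_fit = ridge_1d(xs, ys, lam=1.0)
combined_fit = split_and_conquer(xs, ys, n_chunks=10, lam=1.0)
print(full_fit, combined_fit)
```

In this toy setting the combined estimate stays close to the full-data fit, while each chunk could be processed on a separate machine. Reusing the same penalty `lam` on every chunk shrinks each estimate slightly more than a single full-data fit would; how the penalty should scale with chunk size is left open here.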
| Original language | American English |
|---|---|
| Pages (from-to) | 1655-1684 |
| Number of pages | 30 |
| Journal | Statistica Sinica |
| Volume | 24 |
| Issue number | 4 |
| State | Published - Oct 2014 |
ASJC Scopus subject areas
- Statistics and Probability
- Statistics, Probability and Uncertainty
Keywords
- Big data
- Combining results from independent analyses
- Distributed computing
- Generalized linear models
- Large sample theory
- Penalized regression