Very sparse random projections

Ping Li, Trevor J. Hastie, Kenneth W. Church

Research output: Chapter in Book/Report/Conference proceedingConference contribution

269 Citations (Scopus)

Abstract

There has been considerable interest in random projections, an approximate algorithm for estimating distances between pairs of points in a high-dimensional vector space. Let A ∈ ℝn × D be our n points in D dimensions. The method multiplies A by a random matrix R ∈ ℝ D × k, reducing the D dimensions down to just k for speeding up the computation. R typically consists of entries of standard normal N(0, 1). It is well known that random projections preserve pairwise distances (in the expectation). Achlioptas proposed sparse random projections by replacing the N(0, 1) entries in R with entries in {-1, 0, 1} with probabilities {1/6, 2/3, 1/6}, achieving a threefold speedup in processing time. We recommend using R of entries in {-1, 0, 1} with probabilities {1/2√D, 1 - 1/√D, 1/2√D} for achieving a significant √D-fold speedup, with little loss in accuracy.

Original languageEnglish (US)
Title of host publicationKDD 2006
Subtitle of host publicationProceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Pages287-296
Number of pages10
StatePublished - Oct 16 2006
EventKDD 2006: 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - Philadelphia, PA, United States
Duration: Aug 20 2006Aug 23 2006

Publication series

NameProceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Volume2006

Other

OtherKDD 2006: 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
CountryUnited States
CityPhiladelphia, PA
Period8/20/068/23/06

Fingerprint

Random Projection
Speedup
Approximate Algorithm
Threefolds
Random Matrices
Research and Development
Vector space
Pairwise
Multiplication
Fold
High-dimensional

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems

Keywords

  • Random projections
  • Rates of convergence
  • Sampling

Cite this

Li, P., Hastie, T. J., & Church, K. W. (2006). Very sparse random projections. In KDD 2006: Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 287-296). (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Vol. 2006).
Li, Ping ; Hastie, Trevor J. ; Church, Kenneth W. / Very sparse random projections. KDD 2006: Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2006. pp. 287-296 (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining).
@inproceedings{e90587ef2b5e403daeea27d237fe120c,
title = "Very sparse random projections",
abstract = "There has been considerable interest in random projections, an approximate algorithm for estimating distances between pairs of points in a high-dimensional vector space. Let A ∈ ℝn × D be our n points in D dimensions. The method multiplies A by a random matrix R ∈ ℝ D × k, reducing the D dimensions down to just k for speeding up the computation. R typically consists of entries of standard normal N(0, 1). It is well known that random projections preserve pairwise distances (in the expectation). Achlioptas proposed sparse random projections by replacing the N(0, 1) entries in R with entries in {-1, 0, 1} with probabilities {1/6, 2/3, 1/6}, achieving a threefold speedup in processing time. We recommend using R of entries in {-1, 0, 1} with probabilities {1/2√D, 1 - 1/√D, 1/2√D} for achieving a significant √D-fold speedup, with little loss in accuracy.",
keywords = "Random projections, Rates of convergence, Sampling",
author = "Ping Li and Hastie, {Trevor J.} and Church, {Kenneth W.}",
year = "2006",
month = "10",
day = "16",
language = "English (US)",
isbn = "1595933395",
series = "Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",
pages = "287--296",
booktitle = "KDD 2006",

}

Li, P, Hastie, TJ & Church, KW 2006, Very sparse random projections. in KDD 2006: Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, vol. 2006, pp. 287-296, KDD 2006: 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, PA, United States, 8/20/06.

Very sparse random projections. / Li, Ping; Hastie, Trevor J.; Church, Kenneth W.

KDD 2006: Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2006. p. 287-296 (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Vol. 2006).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

TY - GEN

T1 - Very sparse random projections

AU - Li, Ping

AU - Hastie, Trevor J.

AU - Church, Kenneth W.

PY - 2006/10/16

Y1 - 2006/10/16

N2 - There has been considerable interest in random projections, an approximate algorithm for estimating distances between pairs of points in a high-dimensional vector space. Let A ∈ ℝn × D be our n points in D dimensions. The method multiplies A by a random matrix R ∈ ℝ D × k, reducing the D dimensions down to just k for speeding up the computation. R typically consists of entries of standard normal N(0, 1). It is well known that random projections preserve pairwise distances (in the expectation). Achlioptas proposed sparse random projections by replacing the N(0, 1) entries in R with entries in {-1, 0, 1} with probabilities {1/6, 2/3, 1/6}, achieving a threefold speedup in processing time. We recommend using R of entries in {-1, 0, 1} with probabilities {1/2√D, 1 - 1/√D, 1/2√D} for achieving a significant √D-fold speedup, with little loss in accuracy.

AB - There has been considerable interest in random projections, an approximate algorithm for estimating distances between pairs of points in a high-dimensional vector space. Let A ∈ ℝn × D be our n points in D dimensions. The method multiplies A by a random matrix R ∈ ℝ D × k, reducing the D dimensions down to just k for speeding up the computation. R typically consists of entries of standard normal N(0, 1). It is well known that random projections preserve pairwise distances (in the expectation). Achlioptas proposed sparse random projections by replacing the N(0, 1) entries in R with entries in {-1, 0, 1} with probabilities {1/6, 2/3, 1/6}, achieving a threefold speedup in processing time. We recommend using R of entries in {-1, 0, 1} with probabilities {1/2√D, 1 - 1/√D, 1/2√D} for achieving a significant √D-fold speedup, with little loss in accuracy.

KW - Random projections

KW - Rates of convergence

KW - Sampling

UR - http://www.scopus.com/inward/record.url?scp=33749573641&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33749573641&partnerID=8YFLogxK

M3 - Conference contribution

SN - 1595933395

SN - 9781595933393

T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

SP - 287

EP - 296

BT - KDD 2006

ER -

Li P, Hastie TJ, Church KW. Very sparse random projections. In KDD 2006: Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2006. p. 287-296. (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining).