Abstract
There has been considerable interest in random projections, an approximate algorithm for estimating distances between pairs of points in a high-dimensional vector space. Let A ∈ ℝn × D be our n points in D dimensions. The method multiplies A by a random matrix R ∈ ℝ D × k, reducing the D dimensions down to just k for speeding up the computation. R typically consists of entries of standard normal N(0, 1). It is well known that random projections preserve pairwise distances (in the expectation). Achlioptas proposed sparse random projections by replacing the N(0, 1) entries in R with entries in {-1, 0, 1} with probabilities {1/6, 2/3, 1/6}, achieving a threefold speedup in processing time. We recommend using R of entries in {-1, 0, 1} with probabilities {1/2√D, 1 - 1/√D, 1/2√D} for achieving a significant √D-fold speedup, with little loss in accuracy.
Original language | English (US) |
---|---|
Title of host publication | KDD 2006 |
Subtitle of host publication | Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining |
Pages | 287-296 |
Number of pages | 10 |
State | Published - Oct 16 2006 |
Event | KDD 2006: 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - Philadelphia, PA, United States Duration: Aug 20 2006 → Aug 23 2006 |
Publication series
Name | Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining |
---|---|
Volume | 2006 |
Other
Other | KDD 2006: 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining |
---|---|
Country | United States |
City | Philadelphia, PA |
Period | 8/20/06 → 8/23/06 |
Fingerprint
All Science Journal Classification (ASJC) codes
- Software
- Information Systems
Keywords
- Random projections
- Rates of convergence
- Sampling
Cite this
}
Very sparse random projections. / Li, Ping; Hastie, Trevor J.; Church, Kenneth W.
KDD 2006: Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2006. p. 287-296 (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Vol. 2006).Research output: Chapter in Book/Report/Conference proceeding › Conference contribution
TY - GEN
T1 - Very sparse random projections
AU - Li, Ping
AU - Hastie, Trevor J.
AU - Church, Kenneth W.
PY - 2006/10/16
Y1 - 2006/10/16
N2 - There has been considerable interest in random projections, an approximate algorithm for estimating distances between pairs of points in a high-dimensional vector space. Let A ∈ ℝn × D be our n points in D dimensions. The method multiplies A by a random matrix R ∈ ℝ D × k, reducing the D dimensions down to just k for speeding up the computation. R typically consists of entries of standard normal N(0, 1). It is well known that random projections preserve pairwise distances (in the expectation). Achlioptas proposed sparse random projections by replacing the N(0, 1) entries in R with entries in {-1, 0, 1} with probabilities {1/6, 2/3, 1/6}, achieving a threefold speedup in processing time. We recommend using R of entries in {-1, 0, 1} with probabilities {1/2√D, 1 - 1/√D, 1/2√D} for achieving a significant √D-fold speedup, with little loss in accuracy.
AB - There has been considerable interest in random projections, an approximate algorithm for estimating distances between pairs of points in a high-dimensional vector space. Let A ∈ ℝn × D be our n points in D dimensions. The method multiplies A by a random matrix R ∈ ℝ D × k, reducing the D dimensions down to just k for speeding up the computation. R typically consists of entries of standard normal N(0, 1). It is well known that random projections preserve pairwise distances (in the expectation). Achlioptas proposed sparse random projections by replacing the N(0, 1) entries in R with entries in {-1, 0, 1} with probabilities {1/6, 2/3, 1/6}, achieving a threefold speedup in processing time. We recommend using R of entries in {-1, 0, 1} with probabilities {1/2√D, 1 - 1/√D, 1/2√D} for achieving a significant √D-fold speedup, with little loss in accuracy.
KW - Random projections
KW - Rates of convergence
KW - Sampling
UR - http://www.scopus.com/inward/record.url?scp=33749573641&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33749573641&partnerID=8YFLogxK
M3 - Conference contribution
SN - 1595933395
SN - 9781595933393
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 287
EP - 296
BT - KDD 2006
ER -