Abstract
We develop Conditional Random Sampling (CRS), a technique particularly suitable for sparse data. In large-scale applications, the data are often highly sparse. CRS combines sketching and sampling in that it converts sketches of the data into conditional random samples online in the estimation stage, with the sample size determined retrospectively. This paper focuses on approximating pairwise l2 and l1 distances and comparing CRS with random projections. For Boolean (0/1) data, CRS is provably better than random projections. We show using real-world data that CRS often outperforms random projections. This technique can be applied in learning, data mining, information retrieval, and database query optimization.
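The abstract does not spell out the construction, so the following is only a minimal sketch of how "sketches converted into conditional random samples" might look. It assumes a permutation-based scheme: column IDs are randomly permuted, each row keeps its k nonzero entries with the smallest permuted IDs, and at estimation time two sketches are trimmed to the columns both of them fully cover. The function names (`build_sketch`, `coverage`, `estimate_distances`), the 0-based IDs, and the simple D / D_s scale-up are illustrative assumptions, not the authors' code or their exact estimators.

```python
import random

def build_sketch(row, perm, k):
    """Sketch one sparse row given as {column: nonzero value}.

    perm[c] is the permuted ID of column c (assumed scheme). The sketch
    keeps the k entries with the smallest permuted IDs (k >= 1), plus a
    flag that is True when the row had at most k nonzeros, in which case
    the sketch holds the whole row.
    """
    entries = sorted((perm[c], v) for c, v in row.items())
    return entries[:k], len(entries) <= k

def coverage(entries, complete, D):
    """Number of leading permuted columns the sketch fully describes."""
    return D if complete else entries[-1][0] + 1

def estimate_distances(sk1, sk2, D):
    """Return (l1 estimate, squared-l2 estimate, sample size D_s)."""
    # The sample size is only known now, from the pair of sketches.
    d_s = min(coverage(*sk1, D), coverage(*sk2, D))
    # Trim both sketches to the first d_s permuted columns.
    s1 = {i: v for i, v in sk1[0] if i < d_s}
    s2 = {i: v for i, v in sk2[0] if i < d_s}
    ids = set(s1) | set(s2)
    l1 = sum(abs(s1.get(i, 0.0) - s2.get(i, 0.0)) for i in ids)
    l2_sq = sum((s1.get(i, 0.0) - s2.get(i, 0.0)) ** 2 for i in ids)
    # Scale the sample sums from d_s columns up to all D columns.
    scale = D / d_s
    return scale * l1, scale * l2_sq, d_s

if __name__ == "__main__":
    rng = random.Random(0)
    D, k = 1000, 20
    perm = list(range(D))
    rng.shuffle(perm)
    u = {c: 1.0 for c in rng.sample(range(D), 60)}
    v = {c: 1.0 for c in rng.sample(range(D), 80)}
    print(estimate_distances(build_sketch(u, perm, k),
                             build_sketch(v, perm, k), D))
```

In this sketch the sample size D_s is not chosen in advance; it falls out of whichever of the two sketches reaches less far into the permuted column order, which is one way to read "determined retrospectively".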
Original language | English (US) |
---|---|
Title of host publication | Advances in Neural Information Processing Systems 19 - Proceedings of the 2006 Conference |
Pages | 873-880 |
Number of pages | 8 |
State | Published - Dec 1 2007 |
Event | 20th Annual Conference on Neural Information Processing Systems, NIPS 2006 - Vancouver, BC, Canada Duration: Dec 4 2006 → Dec 7 2006 |
Publication series
Name | Advances in Neural Information Processing Systems |
---|
Other
Other | 20th Annual Conference on Neural Information Processing Systems, NIPS 2006 |
---|---|
Country | Canada |
City | Vancouver, BC |
Period | 12/4/06 → 12/7/06 |
All Science Journal Classification (ASJC) codes
- Information Systems
- Signal Processing
- Computer Networks and Communications
Cite this
Conditional random sampling : A sketch-based sampling technique for sparse data. / Li, Ping; Church, Kenneth W.; Hastie, Trevor J.
Advances in Neural Information Processing Systems 19 - Proceedings of the 2006 Conference. 2007. p. 873-880 (Advances in Neural Information Processing Systems).
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution
TY - GEN
T1 - Conditional random sampling
T2 - A sketch-based sampling technique for sparse data
AU - Li, Ping
AU - Church, Kenneth W.
AU - Hastie, Trevor J.
PY - 2007/12/1
Y1 - 2007/12/1
AB - We develop Conditional Random Sampling (CRS), a technique particularly suitable for sparse data. In large-scale applications, the data are often highly sparse. CRS combines sketching and sampling in that it converts sketches of the data into conditional random samples online in the estimation stage, with the sample size determined retrospectively. This paper focuses on approximating pairwise l2 and l1 distances and comparing CRS with random projections. For Boolean (0/1) data, CRS is provably better than random projections. We show using real-world data that CRS often outperforms random projections. This technique can be applied in learning, data mining, information retrieval, and database query optimization.
UR - http://www.scopus.com/inward/record.url?scp=84864064770&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84864064770&partnerID=8YFLogxK
M3 - Conference contribution
SN - 9780262195683
T3 - Advances in Neural Information Processing Systems
SP - 873
EP - 880
BT - Advances in Neural Information Processing Systems 19 - Proceedings of the 2006 Conference
ER -