Conditional random sampling: A sketch-based sampling technique for sparse data

Ping Li, Kenneth W. Church, Trevor J. Hastie

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

32 Citations (Scopus)

Abstract

We develop Conditional Random Sampling (CRS), a technique particularly suitable for sparse data. In large-scale applications, the data are often highly sparse. CRS combines sketching and sampling in that it converts sketches of the data into conditional random samples online in the estimation stage, with the sample size determined retrospectively. This paper focuses on approximating pairwise l2 and l1 distances and comparing CRS with random projections. For boolean (0/1) data, CRS is provably better than random projections. We show using real-world data that CRS often outperforms random projections. This technique can be applied in learning, data mining, information retrieval, and database query optimizations.
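
The sketch-then-sample idea in the abstract can be illustrated with a small Python sketch. This is only a minimal illustration under stated assumptions, not the authors' implementation: it assumes each row is sketched by keeping the k nonzero entries with the smallest column IDs under one shared random permutation, and that the usable sample size is read off the two sketches at estimation time. The function names (`make_permutation`, `build_sketch`, `observed_prefix`, `estimate_l2_squared`) and the dict-based row format are hypothetical.

```python
import random


def make_permutation(D, seed=0):
    """Random permutation of the D column indices, shared by all rows."""
    perm = list(range(D))
    random.Random(seed).shuffle(perm)
    return perm


def build_sketch(row, perm, k):
    """Keep the k nonzero entries of `row` with the smallest permuted IDs.

    row  : dict {original column index: nonzero value}
    perm : list mapping original column index -> permuted ID (0-based)
    k    : sketch size, assumed to be >= 1
    """
    permuted = sorted((perm[j], v) for j, v in row.items())
    return permuted[:k]


def observed_prefix(sketch, k, D):
    """Number of leading permuted columns fully known from this sketch."""
    if len(sketch) < k:
        # Fewer than k nonzeros: the sketch holds the whole row.
        return D
    return sketch[-1][0] + 1


def estimate_l2_squared(sk1, sk2, k, D):
    """Estimate the squared l2 distance between two rows from their sketches."""
    # Sample size is determined retrospectively, from the sketches themselves.
    d_s = min(observed_prefix(sk1, k, D), observed_prefix(sk2, k, D))
    # Entries inside the common prefix; absent IDs are known zeros.
    a = {i: v for i, v in sk1 if i < d_s}
    b = {i: v for i, v in sk2 if i < d_s}
    sample_sum = sum((a.get(i, 0.0) - b.get(i, 0.0)) ** 2 for i in set(a) | set(b))
    # Scale the sample statistic up from d_s columns to all D columns.
    return D / d_s * sample_sum
```

In this sketch, when k is large enough to hold every nonzero entry of both rows, the common prefix is all D columns and the estimate equals the exact squared distance. With smaller k, only the first `d_s` permuted columns are used and the result is scaled by `D / d_s`.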

Original language: English (US)
Title of host publication: Advances in Neural Information Processing Systems 19 - Proceedings of the 2006 Conference
Pages: 873-880
Number of pages: 8
State: Published - Dec 1 2007
Event: 20th Annual Conference on Neural Information Processing Systems, NIPS 2006 - Vancouver, BC, Canada
Duration: Dec 4 2006 to Dec 7 2006

Publication series

Name: Advances in Neural Information Processing Systems

Other

Other: 20th Annual Conference on Neural Information Processing Systems, NIPS 2006
Country: Canada
City: Vancouver, BC
Period: 12/4/06 to 12/7/06

Fingerprint

Sampling
Information retrieval
Data mining

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Signal Processing
  • Computer Networks and Communications

Cite this

Li, P., Church, K. W., & Hastie, T. J. (2007). Conditional random sampling: A sketch-based sampling technique for sparse data. In Advances in Neural Information Processing Systems 19 - Proceedings of the 2006 Conference (pp. 873-880). (Advances in Neural Information Processing Systems).
Li, Ping ; Church, Kenneth W. ; Hastie, Trevor J. / Conditional random sampling : A sketch-based sampling technique for sparse data. Advances in Neural Information Processing Systems 19 - Proceedings of the 2006 Conference. 2007. pp. 873-880 (Advances in Neural Information Processing Systems).
@inproceedings{ec0aaa9bd4ae414fbb8942af609bf1f4,
title = "Conditional random sampling: A sketch-based sampling technique for sparse data",
abstract = "We develop Conditional Random Sampling (CRS), a technique particularly suitable for sparse data. In large-scale applications, the data are often highly sparse. CRS combines sketching and sampling in that it converts sketches of the data into conditional random samples online in the estimation stage, with the sample size determined retrospectively. This paper focuses on approximating pairwise l2 and l1 distances and comparing CRS with random projections. For boolean (0/1) data, CRS is provably better than random projections. We show using real-world data that CRS often outperforms random projections. This technique can be applied in learning, data mining, information retrieval, and database query optimizations.",
author = "Ping Li and Church, {Kenneth W.} and Hastie, {Trevor J.}",
year = "2007",
month = "12",
day = "1",
language = "English (US)",
isbn = "9780262195683",
series = "Advances in Neural Information Processing Systems",
pages = "873--880",
booktitle = "Advances in Neural Information Processing Systems 19 - Proceedings of the 2006 Conference",

}

Li, P, Church, KW & Hastie, TJ 2007, Conditional random sampling: A sketch-based sampling technique for sparse data. in Advances in Neural Information Processing Systems 19 - Proceedings of the 2006 Conference. Advances in Neural Information Processing Systems, pp. 873-880, 20th Annual Conference on Neural Information Processing Systems, NIPS 2006, Vancouver, BC, Canada, 12/4/06.

TY - GEN

T1 - Conditional random sampling

T2 - A sketch-based sampling technique for sparse data

AU - Li, Ping

AU - Church, Kenneth W.

AU - Hastie, Trevor J.

PY - 2007/12/1

Y1 - 2007/12/1

N2 - We develop Conditional Random Sampling (CRS), a technique particularly suitable for sparse data. In large-scale applications, the data are often highly sparse. CRS combines sketching and sampling in that it converts sketches of the data into conditional random samples online in the estimation stage, with the sample size determined retrospectively. This paper focuses on approximating pairwise l2 and l1 distances and comparing CRS with random projections. For boolean (0/1) data, CRS is provably better than random projections. We show using real-world data that CRS often outperforms random projections. This technique can be applied in learning, data mining, information retrieval, and database query optimizations.

UR - http://www.scopus.com/inward/record.url?scp=84864064770&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84864064770&partnerID=8YFLogxK

M3 - Conference contribution

SN - 9780262195683

T3 - Advances in Neural Information Processing Systems

SP - 873

EP - 880

BT - Advances in Neural Information Processing Systems 19 - Proceedings of the 2006 Conference

ER -

Li P, Church KW, Hastie TJ. Conditional random sampling: A sketch-based sampling technique for sparse data. In Advances in Neural Information Processing Systems 19 - Proceedings of the 2006 Conference. 2007. p. 873-880. (Advances in Neural Information Processing Systems).