Learning from data with heterogeneous noise using SGD

Shuang Song, Kamalika Chaudhuri, Anand D. Sarwate

Research output: Contribution to journal › Conference article › peer-review

9 Scopus citations

Abstract

We consider learning from data of variable quality that may be obtained from different heterogeneous sources. Addressing learning from heterogeneous data in its full generality is a challenging problem. In this paper, we adopt instead a model in which data is observed through heterogeneous noise, where the noise level reflects the quality of the data source. We study how to use stochastic gradient algorithms to learn in this model. Our study is motivated by two concrete examples where this problem arises naturally: learning with local differential privacy based on data from multiple sources with different privacy requirements, and learning from data with labels of variable quality. The main contribution of this paper is to identify how heterogeneous noise impacts performance. We show that given two datasets with heterogeneous noise, the order in which to use them in standard SGD depends on the learning rate. We propose a method for changing the learning rate as a function of the heterogeneity, and prove new regret bounds for our method in two cases of interest. Finally, we evaluate the performance of our algorithm on real data.
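
The claim that the preferred ordering of two datasets depends on the learning rate can be illustrated with a toy calculation. The sketch below is not the authors' algorithm or analysis; it assumes a one-dimensional quadratic objective f(w) = (lam/2)(w - w_star)^2, independent zero-mean gradient noise with a per-step standard deviation, and a step size of c / (lam * t). The function name and all parameters are hypothetical. Under those assumptions the expected squared error follows an exact recursion, so no random sampling is needed.

```python
def expected_squared_error(sigmas, c=1.0, lam=1.0, w0=0.0, w_star=1.0):
    """Exact E[(w_T - w_star)^2] for SGD on f(w) = (lam/2) * (w - w_star)^2.

    Assumptions (illustrative only): the gradient at step t is
    lam * (w - w_star) + n_t, where n_t is independent zero-mean noise
    with standard deviation sigmas[t - 1], and the step size is c / (lam * t).
    """
    err = (w0 - w_star) ** 2          # squared error before any update
    for t, sigma in enumerate(sigmas, start=1):
        eta = c / (lam * t)           # decaying step size, scaled by c
        shrink = 1.0 - eta * lam      # contraction applied to the old error
        # The noise is independent of the current iterate, so the cross
        # term vanishes and the squared error splits into two parts.
        err = shrink ** 2 * err + (eta * sigma) ** 2
    return err


clean = [0.1] * 50   # hypothetical low-noise source
noisy = [1.0] * 50   # hypothetical high-noise source

for c in (0.5, 1.0, 2.0):
    clean_first = expected_squared_error(clean + noisy, c=c)
    noisy_first = expected_squared_error(noisy + clean, c=c)
    print(f"c={c}: clean first {clean_first:.6f}, noisy first {noisy_first:.6f}")
```

In this toy setting, a small constant (c = 0.5) weights early samples more heavily, so using the clean data first gives the lower error, while a large constant (c = 2) weights late samples more heavily, so using the noisy data first is better. With c = 1 the final iterate is a plain average of the samples and the order makes no difference.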

Original language: English (US)
Pages (from-to): 894-902
Number of pages: 9
Journal: Journal of Machine Learning Research
Volume: 38
State: Published - 2015
Event: 18th International Conference on Artificial Intelligence and Statistics, AISTATS 2015 - San Diego, United States
Duration: May 9 2015 → May 12 2015

All Science Journal Classification (ASJC) codes

  • Software
  • Control and Systems Engineering
  • Statistics and Probability
  • Artificial Intelligence
