Linearized GMM Kernels and Normalized Random Fourier Features

Ping Li

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

17 Scopus citations

Abstract

The method of "random Fourier features (RFF)" has become a popular tool for approximating the "radial basis function (RBF)" kernel. The variance of RFF, however, is large. Interestingly, the variance can be substantially reduced by a simple normalization step, as we demonstrate theoretically. We name the improved scheme the "normalized RFF (NRFF)", and we provide a technical proof of the asymptotic variance of NRFF, validated by simulations. We also propose the "generalized min-max (GMM)" kernel as a measure of data similarity, where data vectors can have both positive and negative entries. GMM is positive definite, as evidenced by an associated hashing method named "generalized consistent weighted sampling (GCWS)" which linearizes this (nonlinear) kernel. We provide an extensive empirical evaluation of the RBF and GMM kernels on more than 50 datasets. For a majority of the datasets, the (tuning-free) GMM kernel outperforms the best-tuned RBF kernel. We then conduct extensive classification experiments comparing the linearized RBF kernel using NRFF with the linearized GMM kernel using GCWS. We observe that, in order to reach a similar accuracy, GCWS typically requires substantially fewer samples than NRFF, even on datasets where the original RBF kernel outperforms the original GMM kernel. As the training, storage, and processing costs are directly proportional to the sample size, our experiments help demonstrate that GCWS would be a more practical scheme for large-scale machine learning applications. The empirical success of GCWS (compared to NRFF) can also be explained theoretically, from at least two aspects. Firstly, the relative variance (normalized by the squared expectation) of GCWS is substantially smaller than that of NRFF, except in the very high similarity region (where the variances of both methods approach zero). Secondly, if we are allowed to make a general model assumption on the data, then we can show analytically that GCWS exhibits much smaller variance than NRFF for estimating the same object (e.g., the RBF kernel), except in the very high similarity region.
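
To make the two estimators concrete, below is a minimal Python/NumPy sketch (not from the paper; the function names, example data, and choice of gamma are ours) of RFF for the RBF kernel exp(-gamma * ||x - y||^2), and of the normalized variant NRFF, which rescales each feature vector to unit length before taking inner products:

    import numpy as np

    def rff(X, m, gamma, rng):
        # Sample frequencies from the spectral density of the RBF kernel
        # exp(-gamma * ||x - y||^2), a Gaussian with variance 2 * gamma.
        d = X.shape[1]
        W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, m))
        b = rng.uniform(0.0, 2.0 * np.pi, size=m)
        return np.sqrt(2.0 / m) * np.cos(X @ W + b)

    def nrff(X, m, gamma, rng):
        # NRFF: normalize each RFF vector to unit norm; inner products of
        # the normalized vectors estimate the kernel with smaller variance.
        Z = rff(X, m, gamma, rng)
        return Z / np.linalg.norm(Z, axis=1, keepdims=True)

    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 20))
    Z = nrff(X, m=1024, gamma=0.5, rng=rng)
    print(Z @ Z.T)  # entrywise approximation of the RBF kernel matrix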
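
Likewise, here is a hedged sketch of the GMM kernel and of a single GCWS hash, reading GCWS as Ioffe's (2010) consistent weighted sampling applied to the nonnegative "split" transform of the signed data (the helper names are ours, not the paper's):

    import numpy as np

    def split(x):
        # Map a signed vector to nonnegative coordinates: the positive
        # parts followed by the negated negative parts.
        return np.concatenate([np.maximum(x, 0.0), np.maximum(-x, 0.0)])

    def gmm(u, v):
        # Generalized min-max kernel: sum of coordinate-wise minima over
        # sum of coordinate-wise maxima of the split vectors.
        a, b = split(u), split(v)
        return np.minimum(a, b).sum() / np.maximum(a, b).sum()

    def gcws_hash(x, seed=0):
        # One consistent weighted sample (Ioffe 2010) of the split vector.
        # The per-coordinate randomness is derived from a shared seed, so
        # two vectors hashed with the same seed return the same (index, t)
        # pair with probability equal to their GMM similarity.
        w = split(x)
        rng = np.random.default_rng(seed)
        r = rng.gamma(2.0, 1.0, size=w.size)
        c = rng.gamma(2.0, 1.0, size=w.size)
        beta = rng.uniform(0.0, 1.0, size=w.size)
        idx = np.nonzero(w > 0)[0]
        t = np.floor(np.log(w[idx]) / r[idx] + beta[idx])
        y = np.exp(r[idx] * (t - beta[idx]))
        a = c[idx] / (y * np.exp(r[idx]))
        k = np.argmin(a)
        return idx[k], int(t[k])

Repeating gcws_hash with k different seeds gives k samples per vector; the fraction of matching (index, t) pairs estimates GMM(u, v), and one-hot encoding those pairs yields the linearized representation fed to a linear classifier.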

Original language: American English
Title of host publication: KDD 2017 - Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Publisher: Association for Computing Machinery
Pages: 315-324
Number of pages: 10
ISBN (Electronic): 9781450348874
DOIs
State: Published - Aug 13 2017
Event: 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2017 - Halifax, Canada
Duration: Aug 13 2017 – Aug 17 2017

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Volume: Part F129685

Conference

Conference: 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2017
Country/Territory: Canada
City: Halifax
Period: 8/13/17 – 8/17/17

ASJC Scopus subject areas

  • Software
  • Information Systems
