TY - JOUR
T1 - Testing the manifold hypothesis
AU - Fefferman, Charles
AU - Mitter, Sanjoy
AU - Narayanan, Hariharan
N1 - Funding Information: The first author was supported by NSF grant DMS 1265524, AFOSR grant FA9550-12-1-0425 and U.S.-Israel Binational Science Foundation grant 2014055. The second author was supported by NSF grant EECS-1135843. Publisher Copyright: © 2016 American Mathmatical Society.
PY - 2016/10
Y1 - 2016/10
N2 - The hypothesis that high dimensional data tend to lie in the vicinity of a low dimensional manifold is the basis of manifold learning. The goal of this paper is to develop an algorithm (with accompanying complexity guarantees) for testing the existence of a manifold that fits a probability distribution supported in a separable Hilbert space, only using i.i.d. samples from that distribution. More precisely, our setting is the following. Suppose that data are drawn independently at random from a probability distribution P supported on the unit ball of a separable Hilbert space H. Let G(d.V.T)be the set of submanifolds of the unit ball of H whose volume is at most Vand reach (which is the supremum of all rsuch that any point at a distance less than rhas a unique nearest point on the manifold) is at least T. Let L(M.P) denote the mean-squared distance of a random point from the probability distribution Pto M. We obtain an algorithm that tests the manifold hypothesis in the following sense. The algorithm takes i.i.d. random samples from Pas input and determines which of the following two is true (at least one must be): There exists M ∈ G(d,CV,T/C) such that L(M,P)≤C∈ There exists no M ∈ G(d,V/C,CT) such that L(M,P) ≤ ∈/C The answer is correct with probability at least 1-δ.
AB - The hypothesis that high dimensional data tend to lie in the vicinity of a low dimensional manifold is the basis of manifold learning. The goal of this paper is to develop an algorithm (with accompanying complexity guarantees) for testing the existence of a manifold that fits a probability distribution supported in a separable Hilbert space, only using i.i.d. samples from that distribution. More precisely, our setting is the following. Suppose that data are drawn independently at random from a probability distribution P supported on the unit ball of a separable Hilbert space H. Let G(d.V.T)be the set of submanifolds of the unit ball of H whose volume is at most Vand reach (which is the supremum of all rsuch that any point at a distance less than rhas a unique nearest point on the manifold) is at least T. Let L(M.P) denote the mean-squared distance of a random point from the probability distribution Pto M. We obtain an algorithm that tests the manifold hypothesis in the following sense. The algorithm takes i.i.d. random samples from Pas input and determines which of the following two is true (at least one must be): There exists M ∈ G(d,CV,T/C) such that L(M,P)≤C∈ There exists no M ∈ G(d,V/C,CT) such that L(M,P) ≤ ∈/C The answer is correct with probability at least 1-δ.
UR - http://www.scopus.com/inward/record.url?scp=84979774604&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84979774604&partnerID=8YFLogxK
U2 - 10.1090/jams/852
DO - 10.1090/jams/852
M3 - Article
SN - 0894-0347
VL - 29
SP - 983
EP - 1049
JO - Journal of the American Mathematical Society
JF - Journal of the American Mathematical Society
IS - 4
ER -