Clustering is semidefinitely not that hard

Nonnegative SDP for manifold disentangling

Mariano Tepper, Anirvan Sengupta, Dmitri Chklovskii

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

In solving hard computational problems, semidefinite program (SDP) relaxations often play an important role because they come with a guarantee of optimality. Here, we focus on a popular semidefinite relaxation of K-means clustering which, for well-segregated datasets, yields the same solution as the original non-convex formulation. We report an unexpected finding: when the data contain manifolds of dimension greater than zero, the SDP solution captures these geometrical structures. Unlike traditional manifold-embedding techniques, our approach does not rely on manually defining a kernel but instead enforces locality via a nonnegativity constraint. We thus call our approach NOnnegative MAnifold Disentangling, or NOMAD. To build an intuitive understanding of its manifold-learning capabilities, we develop a theoretical analysis of NOMAD on idealized datasets. While NOMAD is convex and its global optimum can be found by generic SDP solvers in polynomial time, such solvers are too slow for modern datasets. To address this problem, we analyze a non-convex heuristic and present a new algorithm, based on the conditional gradient method, that is convex yet efficient. Our results render NOMAD a versatile, understandable, and powerful tool for manifold learning.
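
To make the object under discussion concrete, the following is a minimal sketch of the nonnegative K-means SDP the abstract refers to, in the standard Peng-Wei-style form, written with CVXPY. The toy data, the variable names, and the choice of the generic SCS solver are illustrative assumptions, not the authors' implementation; indeed, the paper's point is that a conditional-gradient algorithm is needed precisely because generic solvers like this one do not scale.

import cvxpy as cp
import numpy as np

# Toy data: two well-segregated Gaussian blobs, small enough for a generic
# SDP solver (assumption: the paper targets far larger datasets).
rng = np.random.default_rng(0)
n, K = 40, 2
X = np.vstack([rng.standard_normal((n // 2, 2)),
               rng.standard_normal((n // 2, 2)) + 5.0])
D = X @ X.T  # Gram (similarity) matrix of the data

Q = cp.Variable((n, n), symmetric=True)
constraints = [
    Q >> 0,                  # positive semidefinite
    Q >= 0,                  # entrywise nonnegativity: the "NO" in NOMAD
    cp.sum(Q, axis=1) == 1,  # rows sum to one (doubly stochastic with symmetry)
    cp.trace(Q) == K,        # trace fixes the number of clusters
]
# Maximizing Tr(DQ) over this set is the SDP relaxation of K-means; for
# well-segregated clusters the optimal Q is the normalized cluster
# co-membership matrix (near block-diagonal here).
prob = cp.Problem(cp.Maximize(cp.trace(D @ Q)), constraints)
prob.solve(solver=cp.SCS)
print(np.round(Q.value, 2))

The single entrywise constraint Q >= 0 is the ingredient the abstract credits with enforcing locality: when the data lie on a manifold rather than in point-like clusters, it makes the solution Q act like a sparse, learned kernel rather than one defined by hand.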

Original language: English (US)
Journal: Journal of Machine Learning Research
Volume: 19
State: Published - Nov 1 2018

All Science Journal Classification (ASJC) codes

  • Software
  • Artificial Intelligence
  • Control and Systems Engineering
  • Statistics and Probability

Keywords

  • Conditional gradient method
  • K-means
  • Manifolds
  • Semidefinite programming

Cite this

@article{952275282e6e49918830380a89d6767b,
title = "Clustering is semidefinitely not that hard: Nonnegative SDP for manifold disentangling",
abstract = "In solving hard computational problems, semidefinite program (SDP) relaxations often play an important role because they come with a guarantee of optimality. Here, we focus on a popular semidefinite relaxation of K-means clustering which yields the same solution as the non-convex original formulation for well segregated datasets. We report an unexpected finding: when data contains (greater than zero-dimensional) manifolds, the SDP solution captures such geometrical structures. Unlike traditional manifold embedding techniques, our approach does not rely on manually defining a kernel but rather enforces locality via a nonnegativity constraint. We thus call our approach NOnnegative MAnifold Disentangling, or NOMAD. To build an intuitive understanding of its manifold learning capabilities, we develop a theoretical analysis of NOMAD on idealized datasets. While NOMAD is convex and the globally optimal solution can be found by generic SDP solvers with polynomial time complexity, they are too slow for modern datasets. To address this problem, we analyze a non-convex heuristic and present a new, convex and yet efficient, algorithm, based on the conditional gradient method. Our results render NOMAD a versatile, understandable, and powerful tool for manifold learning.",
keywords = "Conditional gradient method, K-means, Manifolds, Semidefinite programming",
author = "Mariano Tepper and Anirvan Sengupta and Dmitri Chklovskii",
year = "2018",
month = "11",
day = "1",
language = "English (US)",
volume = "19",
journal = "Journal of Machine Learning Research",
issn = "1532-4435",
publisher = "Microtome Publishing",

}
