Stability yields a PTAS for k-median and k-means clustering

Pranjal Awasthi, Avrim Blum, Or Sheffet

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

43 Scopus citations

Abstract

We consider k-median clustering in finite metric spaces and k-means clustering in Euclidean spaces, in the setting where k is part of the input (not a constant). For the k-means problem, Ostrovsky et al. [18] show that if the optimal (k-1)-means clustering of the input is more expensive than the optimal k-means clustering by a factor of 1/ε², then one can achieve a (1 + f(ε))-approximation to the k-means optimal in time polynomial in n and k by using a variant of Lloyd's algorithm. In this work we substantially improve this approximation guarantee. We show that given only the condition that the (k-1)-means optimal is more expensive than the k-means optimal by a factor of 1 + α for some constant α > 0, we can obtain a PTAS. In particular, under this assumption, for any ε > 0 we achieve a (1 + ε)-approximation to the k-means optimal in time polynomial in n and k, and exponential in 1/ε and 1/α. We thus decouple the strength of the assumption from the quality of the approximation ratio. We also give a PTAS for the k-median problem in finite metrics under the analogous assumption. For k-means, we in addition give a randomized algorithm with an improved running time of n^O(1) · (k log n)^poly(1/ε, 1/α). Our technique also obtains a PTAS under the assumption of Balcan et al. [4] that all (1 + α)-approximations are δ-close to a desired target clustering, in the case that all target clusters have size greater than δn and α > 0 is constant. Note that the motivation of Balcan et al. [4] is that for many clustering problems, the objective function is only a proxy for the true goal of getting close to the target. From this perspective, our improvement is that for k-means in Euclidean spaces we reduce the distance of the clustering found to the target from O(δ) to δ when all target clusters are large, and for k-median we improve the "largeness" condition needed in [4] to get exactly δ-close from O(δn) to δn. Our results are based on a new notion of clustering stability.
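The stability assumption in the abstract, that the optimal (k-1)-means cost is at least (1 + α) times the optimal k-means cost, can be checked directly on a toy instance. The sketch below is only an illustration of that condition, not the algorithm from the paper: it finds both optima by exhaustive search over all labelings, which is feasible only for a handful of points. The function names (`kmeans_cost`, `opt_kmeans_cost`, `stability_gap`, `satisfies_stability`) are hypothetical and chosen for this example.

```python
from itertools import product


def kmeans_cost(points, labels, k):
    """Sum of squared Euclidean distances from each point to its cluster mean."""
    cost = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue  # an empty cluster contributes nothing
        dim = len(members[0])
        mean = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        cost += sum((p[d] - mean[d]) ** 2 for p in members for d in range(dim))
    return cost


def opt_kmeans_cost(points, k):
    """Optimal k-means cost by brute force over all k^n labelings (tiny inputs only)."""
    n = len(points)
    return min(kmeans_cost(points, labels, k) for labels in product(range(k), repeat=n))


def stability_gap(points, k):
    """Ratio OPT_{k-1} / OPT_k (infinity if OPT_k is zero). Assumes k >= 2."""
    opt_k = opt_kmeans_cost(points, k)
    opt_k_minus_1 = opt_kmeans_cost(points, k - 1)
    return float("inf") if opt_k == 0 else opt_k_minus_1 / opt_k


def satisfies_stability(points, k, alpha):
    """True if OPT_{k-1} >= (1 + alpha) * OPT_k, the condition stated in the abstract."""
    return stability_gap(points, k) >= 1 + alpha


# Two tight, well-separated pairs: merging them into one cluster is very costly.
separated = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(stability_gap(separated, 2))             # large gap
print(satisfies_stability(separated, 2, 0.5))  # stable for alpha = 0.5

# Evenly spaced points on a line: going from 3 clusters to 2 costs much less.
line = [(0,), (1,), (2,), (3,)]
print(stability_gap(line, 3))
print(satisfies_stability(line, 3, 1.5))
```

In the first instance the two pairs are far apart, so the ratio is large and the condition holds for any moderate α. In the second the ratio is small, so the condition fails once α exceeds it minus one.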

Original language: English (US)
Title of host publication: Proceedings - 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, FOCS 2010
Pages: 309-318
Number of pages: 10
DOI: https://doi.org/10.1109/FOCS.2010.36
State: Published - 2010
Event: 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, FOCS 2010 - Las Vegas, NV, United States
Duration: Oct 23 2010 → Oct 26 2010

Publication series

Name: Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS

Other

Other: 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, FOCS 2010
Country: United States
City: Las Vegas, NV
Period: 10/23/10 → 10/26/10

All Science Journal Classification (ASJC) codes

  • Computer Science (all)

Cite this

Awasthi, P., Blum, A., & Sheffet, O. (2010). Stability yields a PTAS for k-median and k-means clustering. In Proceedings - 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, FOCS 2010 (pp. 309-318). [5671196] (Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS). https://doi.org/10.1109/FOCS.2010.36