A PROBABILISTIC ℓ₁ METHOD FOR CLUSTERING HIGH-DIMENSIONAL DATA

Tsvetan Asamov, Adi Ben-Israel

Research output: Contribution to journal › Article › peer-review

Abstract

In general, the clustering problem is NP-hard, and global optimality cannot be established for non-trivial instances. For high-dimensional data, distance-based methods for clustering or classification face an additional difficulty: the unreliability of distances in very high-dimensional spaces. We propose a probabilistic, distance-based, iterative method for clustering data in very high-dimensional space, using the ℓ₁-metric, which is less sensitive to high dimensionality than the Euclidean distance. For K clusters in ℝⁿ, the problem decomposes into K problems coupled by probabilities, and an iteration reduces to finding Kn weighted medians of points on a line. The complexity of the algorithm is linear in the dimension of the data space, and its performance was observed to improve significantly as the dimension increases.
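
The reduction to weighted medians can be illustrated with a minimal sketch. The abstract does not state the membership probabilities or the weights, so the code below assumes probabilities inversely proportional to the ℓ₁ distance to each center and weights equal to the squared probabilities; both choices, and the names `weighted_median` and `l1_iteration`, are illustrative assumptions rather than the authors' formulation. Under those assumptions, each of the K centers is updated one coordinate at a time, which gives Kn one-dimensional weighted-median problems per iteration.

```python
import numpy as np


def weighted_median(values, weights):
    """Return a minimizer m of sum_i weights[i] * |values[i] - m|."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cum = np.cumsum(w)
    # First sorted point at which the cumulative weight reaches half the total.
    idx = np.searchsorted(cum, 0.5 * cum[-1])
    return v[idx]


def l1_iteration(X, centers, eps=1e-12):
    """One illustrative iteration for N points in R^n and K centers.

    X has shape (N, n); centers has shape (K, n).
    ASSUMPTION: p_k(x) is proportional to 1 / d_k(x), d_k the l1 distance.
    ASSUMPTION: center k is a weighted median with weights p_k(x)^2.
    """
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # l1 distances from every point to every center, shape (N, K).
    D = np.abs(X[:, None, :] - centers[None, :, :]).sum(axis=2) + eps
    inv = 1.0 / D
    P = inv / inv.sum(axis=1, keepdims=True)  # each row sums to 1
    W = P ** 2
    K, n = centers.shape
    new_centers = np.empty((K, n), dtype=float)
    for k in range(K):          # K problems, coupled only through P
        for j in range(n):      # the l1 objective separates by coordinate
            new_centers[k, j] = weighted_median(X[:, j], W[:, k])
    return new_centers, P
```

Because the ℓ₁ distance is a sum over coordinates, the inner loop touches each coordinate once per cluster, which is consistent with a cost that grows linearly in the dimension n.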

Original language: English (US)
Journal: Probability in the Engineering and Informational Sciences
State: Accepted/In press - 2021
Externally published: Yes

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty
  • Management Science and Operations Research
  • Industrial and Manufacturing Engineering

Keywords

  • clustering
  • continuous location
  • high-dimensional data
  • ℓ₁-norm
