Optimal rates for community estimation in the weighted stochastic block model

Min Xu, Varun Jog, Po Ling Loh

Research output: Contribution to journal › Article › peer-review

Abstract

Community identification in a network is an important problem in fields such as social science, neuroscience and genetics. Over the past decade, stochastic block models (SBMs) have emerged as a popular statistical framework for this problem. However, SBMs have an important limitation in that they are suited only for networks with unweighted edges; in various scientific applications, disregarding the edge weights may result in a loss of valuable information. We study a weighted generalization of the SBM, in which observations are collected in the form of a weighted adjacency matrix and the weight of each edge is generated independently from an unknown probability density determined by the community membership of its endpoints. We characterize the optimal rate of misclustering error of the weighted SBM in terms of the Renyi divergence of order 1/2 between the weight distributions of within-community and between-community edges, substantially generalizing existing results for unweighted SBMs. Furthermore, we present a computationally tractable algorithm based on discretization that achieves the optimal error rate. Our method is adaptive in the sense that the algorithm, without assuming knowledge of the weight densities, performs as well as the best algorithm that knows the weight densities.
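For reference, the Renyi divergence of order 1/2 between distributions with densities p and q is D_{1/2}(P || Q) = -2 log ∫ sqrt(p(x) q(x)) dx. The sketch below evaluates the discrete analogue of this quantity, for example on histograms of within-community and between-community edge weights. It is an illustration only and not the authors' algorithm; the function name `renyi_half` and the choice of histogram inputs are assumptions introduced here.

```python
import math


def renyi_half(p, q):
    """Renyi divergence of order 1/2 between two discrete distributions.

    p and q are sequences of probabilities over the same bins (hypothetical
    inputs, e.g. normalized histograms of edge weights).
    Returns -2 * log(sum_i sqrt(p_i * q_i)).
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    # Bhattacharyya coefficient: overlap between the two distributions.
    overlap = sum(math.sqrt(a * b) for a, b in zip(p, q))
    if overlap <= 0.0:
        # Disjoint supports: the divergence is infinite.
        return math.inf
    return -2.0 * math.log(overlap)


# Illustrative values only, not data from the paper.
within = [0.1, 0.2, 0.7]
between = [0.6, 0.3, 0.1]
divergence = renyi_half(within, between)
```

The divergence is zero when the two weight distributions coincide and grows as they separate, which matches the abstract's statement that the optimal misclustering rate is governed by this quantity.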

Original language: American English
Pages (from-to): 183-204
Number of pages: 22
Journal: Annals of Statistics
Volume: 48
Issue number: 1
State: Published - 2020

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

Keywords

  • Network analysis
  • Nonparametric estimation
  • Optimal estimation rates
  • Renyi divergence
  • Stochastic block models
