Abstract
Community identification in a network is an important problem in fields such as social science, neuroscience and genetics. Over the past decade, stochastic block models (SBMs) have emerged as a popular statistical framework for this problem. However, SBMs have an important limitation in that they are suited only for networks with unweighted edges; in various scientific applications, disregarding the edge weights may result in a loss of valuable information. We study a weighted generalization of the SBM, in which observations are collected in the form of a weighted adjacency matrix and the weight of each edge is generated independently from an unknown probability density determined by the community membership of its endpoints. We characterize the optimal rate of misclustering error of the weighted SBM in terms of the Rényi divergence of order 1/2 between the weight distributions of within-community and between-community edges, substantially generalizing existing results for unweighted SBMs. Furthermore, we present a computationally tractable algorithm based on discretization that achieves the optimal error rate. Our method is adaptive in the sense that the algorithm, without assuming knowledge of the weight densities, performs as well as the best algorithm that knows the weight densities.
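As an illustration of the quantities named in the abstract, the following is a minimal sketch and not the paper's algorithm. It samples a weighted adjacency matrix in which each edge weight is drawn independently from a within-community or between-community distribution, discretizes the observed weights into bins, and evaluates the Rényi divergence of order 1/2 between the two binned distributions using the standard discrete formula $D_{1/2}(P \,\|\, Q) = -2 \log \sum_i \sqrt{p_i q_i}$. The function names, the Gaussian weight distributions, the bin range, and the use of the true labels to split the weights are all assumptions made for illustration; the method described in the abstract does not have access to the labels or the densities.

```python
import math
import random


def sample_weighted_sbm(labels, within, between, rng):
    """Return a symmetric weighted adjacency matrix (list of lists).

    Each edge weight is drawn independently from `within` if the two
    endpoints share a label, and from `between` otherwise.
    """
    n = len(labels)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            draw = within if labels[i] == labels[j] else between
            A[i][j] = A[j][i] = draw(rng)
    return A


def histogram(values, lo, hi, bins):
    """Discretize values into equal-width bins on [lo, hi].

    Out-of-range values are clamped to the first or last bin.
    Returns the fraction of values falling in each bin.
    """
    counts = [0] * bins
    for v in values:
        k = int((v - lo) / (hi - lo) * bins)
        counts[min(max(k, 0), bins - 1)] += 1
    return [c / len(values) for c in counts]


def renyi_half(p, q):
    """Renyi divergence of order 1/2 between two discrete distributions."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return math.inf if bc == 0.0 else -2.0 * math.log(bc)


# Hypothetical setup: two communities of 20 nodes, Gaussian weights.
rng = random.Random(0)
labels = [0] * 20 + [1] * 20
A = sample_weighted_sbm(
    labels,
    within=lambda r: r.gauss(1.0, 1.0),   # assumed within-community law
    between=lambda r: r.gauss(0.0, 1.0),  # assumed between-community law
    rng=rng,
)

# Split the observed weights using the true labels (illustration only).
n = len(labels)
pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
w_in = [A[i][j] for i, j in pairs if labels[i] == labels[j]]
w_out = [A[i][j] for i, j in pairs if labels[i] != labels[j]]

p = histogram(w_in, -4.0, 5.0, 12)
q = histogram(w_out, -4.0, 5.0, 12)
print("binned Renyi divergence of order 1/2:", renyi_half(p, q))
```

The divergence is zero when the two weight distributions coincide and grows as they separate, which matches the abstract's statement that it governs how accurately communities can be recovered.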
Original language | American English |
---|---|
Pages (from-to) | 183-204 |
Number of pages | 22 |
Journal | Annals of Statistics |
Volume | 48 |
Issue number | 1 |
State | Published - 2020 |
ASJC Scopus subject areas
- Statistics and Probability
- Statistics, Probability and Uncertainty
Keywords
- Network analysis
- Nonparametric estimation
- Optimal estimation rates
- Rényi divergence
- Stochastic block models