Inferring the Past on Markovian Models of Networks

Project Details

Description

A major challenge in statistics and data science is the analysis of network data. Network data describe interactions and relationships between individual entities. The most prominent example is social network data, but other important examples include internet hyperlink networks, protein interaction networks, air route networks between cities, and disease transmission networks between people. These interaction networks generally start with a few individuals and, as time goes on, attract, infect, or recruit more members and create more interactions. The goal of this project is to develop probabilistic models that accurately describe the growth process of real-world networks and to use these models to extract important information from large-scale network data. Algorithms and software packages will be developed that enable users to answer questions such as: which individuals were the earliest members of a social network, and does the network contain one growing community or several? The results of this project will have applications in public health, social science, computer science, and national security. The project also provides research training opportunities for graduate students.

The framework developed by the PI models a random network as a combination of a preferential attachment (PA) tree and Erdős-Rényi (ER) random edges. The PA tree describes the growth process of the network and may be regarded as the signal; the ER random edges can be interpreted as the noise. This framework includes many existing network models as special cases and allows practitioners to trade off model complexity against computational complexity. Scalable methodology based on Gibbs sampling will be developed to tackle inference problems such as constructing confidence sets for the root nodes or inferring the community membership of the nodes of a network. Theoretical analysis, based on existing probabilistic properties of preferential attachment models, will also be conducted to assess the quality of statistical inference as a function of the signal-to-noise ratio and to understand the information limits of these problems.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
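To make the signal-plus-noise structure described above concrete, the following is a minimal sketch of how such a network could be simulated. It is not the project's software: the function name `sample_pa_plus_er`, its parameters, and the specific modelling choices are assumptions made for illustration. In particular, it assumes a single root (node 0), linear preferential attachment in which each new node picks one parent with probability proportional to the parent's current tree degree, and noise edges added independently with probability `p` on every node pair that is not already a tree edge.

```python
import random
from itertools import combinations


def sample_pa_plus_er(n, p, seed=None):
    """Sample a PA tree on n nodes plus independent ER noise edges.

    Hypothetical helper for illustration only. Assumes linear
    preferential attachment with one parent per arriving node and
    noise probability p on each non-tree pair.
    Returns (tree_edges, noise_edges) as lists of (u, v) with u < v.
    """
    if n < 2:
        raise ValueError("need at least two nodes")
    rng = random.Random(seed)

    # Signal: node 0 is the root and node 1 attaches to it.
    tree_edges = [(0, 1)]
    # Each node appears in `endpoints` once per incident tree edge, so a
    # uniform draw from this list is a degree-proportional draw.
    endpoints = [0, 1]
    for new in range(2, n):
        parent = rng.choice(endpoints)
        tree_edges.append((parent, new))
        endpoints.extend([parent, new])

    # Noise: every non-tree pair becomes an edge independently with
    # probability p. This loop is O(n^2) and only suits small n.
    tree_set = set(tree_edges)
    noise_edges = [
        (u, v)
        for u, v in combinations(range(n), 2)
        if (u, v) not in tree_set and rng.random() < p
    ]
    return tree_edges, noise_edges


if __name__ == "__main__":
    tree, noise = sample_pa_plus_er(n=200, p=0.01, seed=0)
    print(len(tree), "tree edges;", len(noise), "noise edges")
```

Under these assumptions, the arrival order 0, 1, 2, ... is the hidden history that an inference procedure would try to recover from the unlabeled union of the two edge sets, and `p` controls the noise level relative to the tree signal.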
Status: Active
Effective start/end date: 7/1/21 to 6/30/25

Funding

  • National Science Foundation: $199,999.00
