Unravel machine learning blackboxes -- A general, effective and performance-guaranteed statistical framework for complex and irregular inference problems in data science

Project Details

Description

Despite tremendous growth in data analytics in recent years, data scientists continue to confront a diverse set of emerging challenges, including “black-box” problems, where machine learning methods may be empirically effective but lack interpretability. These “black-box” problems make machine learning results difficult to interpret and undermine trust in artificial intelligence outcomes, especially in health care domains. This research project will advance the foundations of interpretable data science and will develop new solutions for complex irregular statistical inference problems. The project will expand applications of statistics and uncertainty quantification in machine learning and data science. The research results will be integrated into course curricula to train the next generation of statisticians and data scientists. The project will pay particular attention to broadening participation in the statistical sciences at all educational levels, and the research findings will be disseminated in various interdisciplinary venues to bolster knowledge synthesis among different domains.

In more detail, this project will provide novel mathematical and computational developments to tackle irregular inference problems and will unravel the black boxes in several machine learning models in terms of their interpretability. Here, “irregular inference problems” refers to highly complex problems for which the existing regularity conditions of statistical inference and large-sample theories do not apply. The research agenda will focus specifically on providing valid and performance-guaranteed statistical inference for problems concerning discrete (numerical) or non-numerical parameters, and for problems involving non-traditional data (e.g., non-numerical data), tailoring the study to three popular machine learning models: random forests, deep neural networks, and graphical networks. The research initiative includes four subprojects: (i) uncertainty quantification of tree learning methods and random forests; (ii) performance-guaranteed architecture discovery of deep neural network models; (iii) statistical inference for generative graphical networks; and (iv) theoretical developments for solving irregular inference problems. The results of the project are expected to improve the interpretability of a broader class of machine learning tools.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Status: Active
Effective start/end date: 7/1/23 to 6/30/26
