The proposed research will develop new methodologies and algorithms for statistical inference on low-dimensional parameters with high-dimensional data. A low-dimensional projection estimator will be further developed for linear regression and extended to more general high-dimensional statistical models, including generalized linear models, the proportional hazards model, and large matrix models, among others. The project will investigate the consistency and asymptotic normality of the proposed estimators, tests of significance, confidence intervals and regions, their efficiency in terms of minimum Fisher information, and their tolerance to multiplicity adjustments. This research will directly connect the fields of semi-parametric methods and high-dimensional data, producing a locally uniform and efficient framework for statistical inference.
High-dimensional data is an area of intense current interest in statistical research and practice, owing to the rapid development of information technologies and their applications to modern scientific experiments. Important fields with an abundance of high-dimensional data include bioinformatics, signal processing, neural imaging, and communications networks, among others. In many such scientific and engineering applications, the number of unknowns, and thus the complexity of the problem, is a function of the number of features: genetic components in bioinformatics, brain regions or voxels in neural imaging, or computers and routers in the Internet. A longstanding challenge in high-dimensional data is statistical inference in situations where the number of features is far greater than the number of samples. Existing methodologies for testing the significance of a feature commonly rely on a uniform signal strength assumption: each feature has either no effect or an effect stronger than an inflated noise level after adjustments for the uncertainty of the set of effective features.
However, this uniform signal strength assumption is, unfortunately, seldom supported by either the data or the underlying science, especially in applications in biology, medicine, and communication and social networks. The proposed research will focus on a new approach to the above-mentioned longstanding problem of statistical inference with high-dimensional data. It will develop practical methods, efficient algorithms, statistical software, and solid theory for tests of significance and confidence regions for low-dimensional functions of features, even when the dimension of the data is high. The methodologies developed in the proposed research will be directly relevant to common applications where modern information technologies prosper.
Effective start/end date: 7/1/12 → 6/30/15
- National Science Foundation (NSF)