Project Details
Description
Enormous amounts of biomedical data are generated by hospitals, but most of this data is available only
after people become ill. For people with chronic diseases such as diabetes, though, many important events
happen outside of the medical system. Patient-generated health data (PGHD) can provide detailed insight
into an individual's health during daily life. With long-term continuous glucose data, activity data, and food
logs, we could develop personalized models of how factors affect blood glucose and deliver personalized
guidance to patients on how to better manage it. Transforming PGHD into information to guide decisions is
a highly general problem that applies to all forms of diabetes and to other chronic diseases. We specifically
focus on identifying dietary and lifestyle risk factors for gestational diabetes mellitus (GDM). GDM occurs in
9% of pregnancies and leads to a 7-fold increase in type 2 diabetes risk after birth, making it a significant
public health problem. Pregnancy provides an ideal test bed for methods designed to make use of PGHD
and uncover causes, as outcomes can be captured in a limited study duration. Motivated by trying to find
causes and effects of nutrition in pregnancy, we develop generalizable algorithms that address widespread
challenges in the use of PGHD for causal inference. First, existing causal inference methods assume we
have well-defined variables (e.g., body weight), but nutrition can be measured in many ways (calories,
macronutrients, food groups). This puts a large burden on users, and limits the potential for data-driven
inference. We introduce the first causal inference algorithm that automatically identifies optimal variable
granularity for each relationship, by leveraging ontologies. This allows identification of different effects
between, say, protein and specific meats on health outcomes, without users needing to specify such
hypotheses. Second, while individual-level data is essential for personalized inference, only limited data
may be available when a treatment decision must be made or when health status is changing over time,
such as during pregnancy. Leveraging population data can yield more accurate inferences, but existing
methods are unable to identify relevant data dynamically, and pregnant individuals may be more similar to
others at the same stage of pregnancy than to themselves in the recent past. We introduce new methods
for dynamic causal transfer learning that continually identify and adapt relevant population data for
personalized causal inference. We initially test our approach on publicly available ICU, diabetes, and
nutrition datasets, before collecting a unique dietary and activity dataset from 150 pregnant individuals.
Status | Finished |
---|---|
Effective start/end date | 8/1/19 → 4/30/23 |
Funding
- U.S. National Library of Medicine: $282,110.00
- U.S. National Library of Medicine: $1.00
- U.S. National Library of Medicine: $300,000.00