Using Machine Learning to Estimate Unobserved COVID-19 Infections in North America.
Journal Articles
Overview
abstract
BACKGROUND: The detection of coronavirus disease 2019 (COVID-19) cases remains a huge challenge. As of April 22, 2020, the COVID-19 pandemic continues to take its toll, with >2.6 million confirmed infections and >183,000 deaths. Dire projections are surfacing almost every day, and policymakers worldwide are using projections for critical decisions. Given this background, we modeled unobserved infections to examine the extent to which we might be grossly underestimating COVID-19 infections in North America.
METHODS: We developed a machine-learning model to uncover hidden patterns based on reported cases and to predict potential infections. First, our model relied on dimensionality reduction to identify parameters that were key to uncovering hidden patterns. Next, our predictive analysis used an unbiased hierarchical Bayesian estimator approach to infer past infections from current fatalities.
RESULTS: Our analysis indicates that, when we assumed a 13-day lag time from infection to death, the United States, as of April 22, 2020, likely had at least 1.3 million undetected infections. With a longer lag time (for example, 23 days), there could have been at least 1.7 million undetected infections. Given these assumptions, the number of undetected infections in Canada could have ranged from 60,000 to 80,000. Duarte's elegant unbiased estimator approach suggested that, as of April 22, 2020, the United States had up to >1.6 million undetected infections and Canada had at least 60,000 to 86,000 undetected infections. However, the Johns Hopkins University Center for Systems Science and Engineering data feed on April 22, 2020, reported only 840,476 and 41,650 confirmed cases for the United States and Canada, respectively.
CONCLUSIONS: We have identified 2 key findings: (1) as of April 22, 2020, the United States may have had 1.5 to 2.029 times as many undetected infections as reported infections, and Canada may have had 1.44 to 2.06 times as many; and (2) even if we assume that the fatality and growth rates in the unobservable population (undetected infections) are similar to those in the observable population (confirmed infections), the number of undetected infections may be within ranges similar to those described above. In summary, 2 different approaches indicated similar ranges of undetected infections in North America.
LEVEL OF EVIDENCE: Prognostic Level V. See Instructions for Authors for a complete description of levels of evidence.
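The multipliers in the conclusions can be checked against the counts quoted in the results. The sketch below is only an arithmetic illustration, not the authors' model. It assumes the multipliers are ratios of undetected to reported infections, and it takes the rounded figures from the abstract as inputs: 1.3 to 1.7 million for the United States and 60,000 to 86,000 for Canada. The function and variable names are hypothetical.

```python
# Figures quoted in the abstract (as of April 22, 2020).
# Reported cases come from the Johns Hopkins CSSE data feed cited above.
REPORTED = {"United States": 840_476, "Canada": 41_650}

# Assumption: rounded low/high undetected-infection estimates taken from the
# results text; the authors' unrounded estimates are not given in the abstract.
UNDETECTED_RANGE = {
    "United States": (1_300_000, 1_700_000),
    "Canada": (60_000, 86_000),
}


def undetected_to_reported(country):
    """Return (low, high) ratios of undetected to reported infections."""
    low, high = UNDETECTED_RANGE[country]
    reported = REPORTED[country]
    return low / reported, high / reported


if __name__ == "__main__":
    for country in REPORTED:
        low, high = undetected_to_reported(country)
        print(f"{country}: {low:.2f} to {high:.2f} times the reported count")
```

With these rounded inputs the ratios come out at roughly 1.55 to 2.02 for the United States and 1.44 to 2.06 for Canada. That is close to the ranges stated in the conclusions. The small differences for the United States are what one would expect if the published multipliers were computed from unrounded estimates.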