Infection Fatality Rate of COVID-19 Inferred from Seroprevalence Data
What is the IFR of COVID-19? I see this asked multiple times per day. I also see social media posts daily stating wild COVID IFR values.
My present recommended go-to-source for COVID IFR values is a peer-reviewed paper from John P.A. Ioannidis, C.F. Rehnborg Chair in Disease Prevention, Professor of Medicine, of Epidemiology and Population Health, and (by courtesy) of Biomedical Data Science, and of Statistics; co-Director, Meta-Research Innovation Center at Stanford University (METRICS).
The paper appeared in the WHO Bulletin last month and was posted here: https://www.who.int/bulletin/online_first/BLT.20.265892.pdf
However, for reasons unknown, it has repeatedly disappeared (404 server error) and reappeared over the past few days. To preserve it as published by the WHO, I’ve re-uploaded it in its native PDF format, Infection fatality rate of COVID-19 inferred from seroprevalence data, and converted the first 19 (core) pages to HTML below.
Update
March 26th, 2021
Peer-reviewed paper ‘Reconciling estimates of global spread and infection fatality rates of COVID-19: an overview of systematic evaluations – John P.A. Ioannidis’.
Concludes “… the available evidence suggests average global IFR of ~0.15% and ~1.5-2.0 billion infections by February 2021 with substantial differences in IFR and in infection spread across continents, countries, and locations.”
Infection fatality rate of COVID-19 inferred from seroprevalence data
This online first version has been peer-reviewed, accepted and edited,
but not formatted and finalized with corrections from authors and proofreaders
John P A Ioannidis (a)
(a) Meta-Research Innovation Center at Stanford (METRICS), Stanford University, 1265 Welch Road, Stanford, California 94305, United States of America.
Correspondence to John P A Ioannidis.
(Submitted: 13 May 2020 – Revised version received: 13 September 2020 – Accepted: 15 September 2020 – Published online: 14 October 2020)
Abstract
Objective To estimate the infection fatality rate of coronavirus disease 2019 (COVID-19) from seroprevalence data.
Methods I searched PubMed and preprint servers for COVID-19 seroprevalence studies with a sample size ≥ 500 as of 9 September 2020. I also retrieved additional results of national studies from preliminary press releases and reports. I assessed the studies for design features and seroprevalence estimates. I estimated the infection fatality rate for each study by dividing the number of COVID-19 deaths by the number of people estimated to be infected in each region. I corrected for the number of antibody types tested (immunoglobulin G (IgG), IgM, IgA).
Results I included 61 studies (74 estimates) and eight preliminary national estimates. Seroprevalence estimates ranged from 0.02% to 53.40%. Infection fatality rates ranged from 0.00% to 1.63%, corrected values from 0.00% to 1.54%. Across 51 locations, the median COVID-19 infection fatality rate was 0.27% (corrected 0.23%): the rate was 0.09% in locations with COVID-19 population mortality rates less than the global average (< 118 deaths/million), 0.20% in locations with 118–500 COVID-19 deaths/million people and 0.57% in locations with > 500 COVID-19 deaths/million people. In people < 70 years, infection fatality rates ranged from 0.00% to 0.31% with crude and corrected medians of 0.05%.
Conclusion The infection fatality rate of COVID-19 can vary substantially across different locations, and this may reflect differences in population age structure, the case-mix of infected and deceased patients and other factors. The inferred infection fatality rates tended to be much lower than estimates made earlier in the pandemic.
Introduction
The infection fatality rate, the probability of dying for a person who is infected, is one of the most important features of the coronavirus disease 2019 (COVID-19) pandemic. The expected total mortality burden of COVID-19 is directly related to the infection fatality rate. Moreover, justification for various non-pharmacological public health interventions depends on the infection fatality rate. Some stringent interventions that potentially also result in more noticeable collateral harms1 may be considered appropriate if the infection fatality rate is high. Conversely, the same measures may fall short of acceptable risk–benefit thresholds if the infection fatality rate is low.
Early data from China suggested a 3.4% case fatality rate2 and that asymptomatic infections were uncommon,3 so the case fatality rate and infection fatality rate would be about the same. Mathematical models have suggested that 40–81% of the world population could be infected,4,5 and have lowered the infection fatality rate to 1.0% or 0.9%.5,6 Since March 2020, many studies have estimated the spread of the virus causing COVID-19 – severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) – in various locations by evaluating seroprevalence. I used the prevalence data from these studies to infer estimates of the COVID-19 infection fatality rate.
The input data for calculations of infection fatality rate were studies on the seroprevalence of COVID-19 done in the general population, or in samples that might approximately represent the general population (e.g. with proper reweighting), that had been published in peer-reviewed journals or as preprints (irrespective of language) as of 9 September 2020. I considered only studies with at least 500 assessed samples because smaller data sets would result in large uncertainty for any calculations based on these data.
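To see why small samples add so much uncertainty, it helps to look at the confidence interval of a seroprevalence estimate. The sketch below uses the Wilson score interval, a standard binomial interval chosen here for illustration (the paper does not say which interval, if any, underlies the 500-sample threshold), for a hypothetical 2% observed seroprevalence at three sample sizes. Because the inferred infection fatality rate divides deaths by the estimated number infected, the ratio between the upper and lower seroprevalence bounds is also the spread of the implied fatality rate.

```python
import math


def wilson_interval(positives: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = positives / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical: the same 2% observed seroprevalence at three sample sizes.
for n in (100, 500, 2000):
    low, high = wilson_interval(round(0.02 * n), n)
    # IFR scales as 1/seroprevalence, so high/low is also the fold-range of the implied IFR.
    print(f"n={n:5d}: seroprevalence 95% CI {low:.2%} to {high:.2%} "
          f"(implied IFR spans a {high / low:.1f}-fold range)")
```

With 100 samples the implied fatality rate is uncertain by more than an order of magnitude. The interval narrows quickly as the sample grows.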
I included studies that made seroprevalence assessments at different time intervals if at least one time interval assessment had a sample size of at least 500 participants. If there were different eligible time intervals, I selected the one with the highest seroprevalence, since seroprevalence may decrease over time as antibody titres decrease. I excluded studies with data collected for more than a month that could not be broken into at least one eligible time interval of less than one month's duration, because it would not be possible to estimate a point seroprevalence reliably. Studies were eligible regardless of the exact age range of participants included, but I excluded studies with only children. I also examined results from national studies from preliminary press releases and reports whenever a country had no other data presented in published papers or preprints. This inclusion allowed these countries to be represented, but information was less complete than information in published papers or preprints and thus requires caution. I included studies on blood donors, although they may underestimate seroprevalence and overestimate infection fatality rate because of the healthy volunteer effect. I excluded studies on health-care workers, since this group is at a potentially high exposure risk, which may result in seroprevalence estimates much higher than in the general population and thus an improbably low infection fatality rate. Similarly, I excluded studies on communities (e.g. shelters or religious or other shared-living communities). Studies were eligible regardless of whether they aimed to evaluate seroprevalence in large or small regions, provided that the population of reference in the region was at least 5000 people. I searched PubMed® (LitCOVID), medRxiv, bioRxiv and Research Square using the terms “seroprevalence” OR “antibodies” with continuous updates.
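The eligibility rules above can be written as a filter. This is a minimal sketch, and the data model is hypothetical: `Interval`, `Study`, the setting labels and the 30-day reading of "less than one month" are all assumptions made for illustration, not structures from the paper. For each study, the filter returns the qualifying time interval with the highest seroprevalence, or `None` if the study is excluded.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical labels for study settings; the paper describes these groups in prose only.
EXCLUDED_SETTINGS = {"health_care_workers", "shared_living_community"}
MIN_SAMPLE_SIZE = 500            # per time interval
MIN_REFERENCE_POPULATION = 5000
MAX_INTERVAL_DAYS = 30           # assumption: "less than one month" approximated as <= 30 days


@dataclass
class Interval:
    days: int                    # length of the sampling window
    sample_size: int
    seroprevalence: float        # as a fraction, e.g. 0.05 for 5%


@dataclass
class Study:
    name: str
    reference_population: int
    setting: str = "general"     # e.g. "general", "blood_donors", "health_care_workers"
    children_only: bool = False
    intervals: list[Interval] = field(default_factory=list)


def select_interval(study: Study) -> Interval | None:
    """Return the eligible interval with the highest seroprevalence, or None if excluded."""
    if study.children_only or study.setting in EXCLUDED_SETTINGS:
        return None
    if study.reference_population < MIN_REFERENCE_POPULATION:
        return None
    eligible = [
        iv for iv in study.intervals
        if iv.days <= MAX_INTERVAL_DAYS and iv.sample_size >= MIN_SAMPLE_SIZE
    ]
    if not eligible:
        return None
    # Highest seroprevalence, because antibody titres (and measured seroprevalence) may wane.
    return max(eligible, key=lambda iv: iv.seroprevalence)


# Hypothetical survey with three sampling rounds.
survey = Study("hypothetical county survey", 250_000, intervals=[
    Interval(days=14, sample_size=800, seroprevalence=0.031),
    Interval(days=21, sample_size=650, seroprevalence=0.044),
    Interval(days=60, sample_size=3000, seroprevalence=0.052),  # too long for a point estimate
])
print(select_interval(survey))
```

Blood-donor studies pass the filter on purpose, mirroring the text's decision to include them despite the healthy volunteer effect.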
I made the first search in early May and did monthly updates, with the last update on 9 September, 2020. I contacted field experts to retrieve any important studies that may have been missed. From each study, I extracted information on location, recruitment and sampling strategy, dates of sample collection, sample size, types of antibody measured (immunoglobulin G (IgG), IgM and IgA), the estimated crude seroprevalence (positive samples divided by all samples tested), adjusted seroprevalence and the factors that the authors considered for adjustment. If a study did not cover an entire country, I collected information on the population of the relevant location from the paper or recent census data so as to approximate as closely as possible the relevant catchment area (e.g. region(s) or county(ies)). Some studies targeted specific age groups (e.g. excluding elderly people and/or excluding children) and some estimated numbers of people infected in the population based on specific age groups. For consistency, I used the entire population (all ages) and, separately, the population 0–70 years to estimate numbers of infected people. I assumed that the seroprevalence would be similar in different age groups, but I also recorded any significant differences in seroprevalence across age strata so as to examine the validity of this assumption. I calculated the number of infected people by multiplying the relevant population size by the adjusted estimate of seroprevalence. If a study did not give an adjusted seroprevalence estimate, I used the unadjusted seroprevalence instead. When seroprevalence estimates with different adjustments were available, I selected the analysis with the largest adjustment. The factors adjusted for included COVID-19 test performance, sampling design, and other factors such as age, sex, clustering effects or socioeconomic factors.
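The counting step is a single multiplication, with the crude estimate used as a fallback. The sketch also shows one common way a study might adjust crude seroprevalence for test performance, the Rogan-Gladen estimator. The paper does not say which adjustment each study used, so treat `rogan_gladen` as an illustrative assumption. All inputs below are hypothetical.

```python
from typing import Optional


def rogan_gladen(crude: float, sensitivity: float, specificity: float) -> float:
    """Adjust a crude positive fraction for imperfect assay sensitivity and specificity."""
    youden = sensitivity + specificity - 1
    if youden <= 0:
        raise ValueError("assay must beat chance: sensitivity + specificity > 1")
    # Clip to [0, 1]: a crude rate below the false-positive rate would otherwise go negative.
    return min(max((crude + specificity - 1) / youden, 0.0), 1.0)


def estimated_infected(population: float, crude: float, adjusted: Optional[float] = None) -> float:
    """Population size multiplied by seroprevalence, preferring the adjusted estimate if given."""
    prevalence = adjusted if adjusted is not None else crude
    return population * prevalence


# Hypothetical region: 1.5% crude positivity, assay sensitivity 80%, specificity 99.5%.
adjusted = rogan_gladen(0.015, sensitivity=0.80, specificity=0.995)
all_ages = estimated_infected(1_000_000, crude=0.015, adjusted=adjusted)
# Same seroprevalence assumed for the 0-70 population, as in the text.
under_70 = estimated_infected(850_000, crude=0.015, adjusted=adjusted)
print(f"adjusted seroprevalence {adjusted:.2%}; infected: {all_ages:,.0f} (all ages), "
      f"{under_70:,.0f} (0-70)")
```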
I did not adjust for specificity in test performance when positive antibody results were already validated by a different method. For the number of COVID-19 deaths, I chose the number of deaths accumulated until the date 1 week after the midpoint of the study period (or the date closest to this that had available data) – unless the authors of the study had strong arguments to choose some other time point or approach. The 1-week lag accounts for different delays in developing antibodies versus dying from infection. The number of deaths is an approximation because it is not known when exactly each patient who died was infected. The 1-week cut-off after the study midpoint may underestimate deaths in places where patients are in hospital for a long time before death, and may overestimate deaths in places where patients die soon because of poor or even inappropriate care. Whether or not the health system became overloaded may also affect the number of deaths. Moreover, because of imperfect diagnostic documentation, COVID-19 deaths may have been both overcounted and undercounted in different locations and at different time points. I calculated the inferred infection fatality rate by dividing the number of deaths by the number of infected people for the entire population, and separately for people < 70 years. I took the proportion of COVID-19 deaths that occurred in people < 70 years old from situational reports for the respective locations that I retrieved at the time I identified the seroprevalence studies. I also calculated a corrected infection fatality rate to try and account for the fact that only one or two types of antibodies (among IgG, IgM, IgA) might have been used. I corrected seroprevalence upwards (and inferred infection fatality rate downwards) by one tenth of its value if a study did not measure IgM and similarly if IgA was not measured. 
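The death cut-off, the crude and corrected fatality rates and the under-70 rate can be sketched as follows. Three assumptions go beyond the text, and each is flagged in the code. The study midpoint is rounded down to a whole day. The two one-tenth antibody corrections are added, giving a factor of 1.2 for an IgG-only study; the text does not say whether they compound. A pan-Ig assay is treated as covering all three antibody types. Function names and numbers are hypothetical.

```python
from __future__ import annotations

from datetime import date, timedelta


def death_count_date(start: date, end: date, lag_days: int = 7) -> date:
    """Cut-off for cumulative deaths: 1 week after the study midpoint (rounded down, an assumption)."""
    midpoint = start + timedelta(days=(end - start).days // 2)
    return midpoint + timedelta(days=lag_days)


def antibody_correction_factor(measured: set[str]) -> float:
    """Seroprevalence multiplier for antibody types that were not measured.

    Assumption: the one-tenth corrections for missing IgM and missing IgA are added
    (1.2 for an IgG-only study), and a pan-Ig assay counts as measuring all types.
    """
    if "pan-Ig" in measured:
        return 1.0
    factor = 1.0
    if "IgM" not in measured:
        factor += 0.1
    if "IgA" not in measured:
        factor += 0.1
    return factor


def inferred_ifr(deaths: float, infected: float, measured: set[str] | None = None) -> float:
    """Deaths divided by estimated infections; pass `measured` to get the corrected value."""
    if measured is not None:
        infected *= antibody_correction_factor(measured)
    return deaths / infected


# Hypothetical study sampled 20 April to 3 May 2020.
cutoff = death_count_date(date(2020, 4, 20), date(2020, 5, 3))  # count deaths up to this date
deaths, infected = 500, 200_000
crude = inferred_ifr(deaths, infected)
corrected = inferred_ifr(deaths, infected, measured={"IgG"})   # IgM and IgA both missing
# Under-70 IFR: the share of deaths under 70 divided by infections in the 0-70 population.
under_70 = inferred_ifr(deaths * 0.30, 170_000)
print(cutoff, f"{crude:.3%}", f"{corrected:.3%}", f"{under_70:.3%}")
```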
This correction is reasonable based on some early evidence,7 although there is uncertainty about the exact correction factor.
The estimates of the infection fatality rate across all locations showed great heterogeneity with I² exceeding 99.9%; thus, a meta-analysis would be inappropriate to report across all locations. Quantitative synthesis with meta-analysis across all locations would also be misleading since locations with high COVID-19 seroprevalence would tend to carry more weight than locations with low seroprevalence. Furthermore, locations with more studies (typically those that have attracted more attention because of high death tolls and thus high infection fatality rates) would be represented multiple times in the calculations. In addition, poorly conducted studies with fewer adjustments would get more weight because of spuriously narrower confidence intervals than more rigorous studies with more careful adjustments which allow for more uncertainty. Finally, with a highly skewed distribution of the infection fatality rate and with large between-study heterogeneity, typical random effects models would produce an incorrectly high summary infection fatality rate that approximates the mean of the study-specific estimates (also strongly influenced by high-mortality locations where more studies have been done); for such a skewed distribution, the median is more appropriate. Therefore, in a first step, I grouped estimates of the infection fatality rate from studies in the same country (or for the United States of America, the same state) together and calculated a single infection fatality rate for that location, weighting the study-specific infection fatality rates by the sample size of each study. This approach avoided inappropriately giving more weight to studies with higher seroprevalence estimates and those with seemingly narrower confidence intervals because of poor or no adjustments, while still giving more weight to larger studies.
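The two-step synthesis described above can be sketched as a sample-size-weighted average within each location, followed by a median across locations. This is not a random-effects meta-analysis. The input data are made up to show the text's argument: a high-mortality location studied twice pulls the mean well above the median, and pulls a naive mean over studies even higher because it is counted twice.

```python
from collections import defaultdict
from statistics import mean, median


def location_summaries(estimates: list[tuple[str, float, int]]) -> dict[str, float]:
    """Collapse study-level IFRs into one sample-size-weighted IFR per location.

    `estimates` holds (location, ifr, sample_size) tuples.
    """
    weighted_sum: dict[str, float] = defaultdict(float)
    total_n: dict[str, int] = defaultdict(int)
    for location, ifr, n in estimates:
        weighted_sum[location] += ifr * n
        total_n[location] += n
    return {loc: weighted_sum[loc] / total_n[loc] for loc in weighted_sum}


# Hypothetical inputs: location "D" is a high-mortality place studied twice.
studies = [
    ("A", 0.0010, 1000), ("A", 0.0014, 3000),
    ("B", 0.0005, 2000),
    ("C", 0.0020, 1500),
    ("D", 0.0120, 800), ("D", 0.0140, 1200),
]
per_location = location_summaries(studies)
values = list(per_location.values())
print("per location:", {k: round(v, 5) for k, v in per_location.items()})
print(f"median of locations {median(values):.3%}, mean of locations {mean(values):.3%}, "
      f"naive mean of studies {mean(ifr for _, ifr, _ in studies):.3%}")
```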
Then, I used the single summary estimate for each location to calculate the median of the distribution of location-specific infection fatality rate estimates. Finally, I explored whether the location-specific infection fatality rates were associated with the COVID-19 mortality rate in the population (COVID-19 deaths per million people) in each location as of 12 September 2020; this analysis allowed me to assess whether estimates of the infection fatality rate tended to be higher in locations with a higher burden of death from COVID-19.
I retrieved 61 studies with 74 eligible estimates published either in the peer-reviewed literature or as preprints as of 9 September 2020.8–68 I also considered another eight preliminary national estimates.69–76 This search yielded a total of 82 eligible estimates (Fig. 1). The studies varied substantially in sampling and recruitment designs (Table 1; available at: http://www.who.int/bulletin/volumes/##/##/##-#####). Of the 61 studies, 24 studies8,10,16,17,20,22,25,33,34,36,37,42,46–49,52–54,61,63,65,68 explicitly aimed for random sampling from the general population. In principle, random sampling is a stronger design. However, even then, people who cannot be reached (e.g. by email or telephone or even by visiting them at a house location) will not be recruited, and these vulnerable populations are likely to be missed. Moreover, several such studies8,10,16,37,42 focused on geographical locations with high numbers of deaths, higher than other locations in the same city or country, and this emphasis would tend to select for a higher infection fatality rate on average. Eleven studies assessed blood donors,12,15,18,24,28,31,41,44,45,55,60 which might underestimate COVID-19 seroprevalence in the general population.
For example, 200 blood donors in Oise, France, showed 3.00% seroprevalence, while the seroprevalence was 25.87% (171/661) in pupils, siblings, parents, teachers and staff at a high school with a cluster of cases in the same area; the true population seroprevalence may be between these two values.13 For other studies, healthy volunteer bias19 may underestimate seroprevalence, attracting people with symptoms26 may overestimate seroprevalence, and studies of employees,14,21,25,32,66 grocery store clients23 or patient cohorts11,14,27–30,36,38,40,50,51,56,59,62,64,67 risk sampling bias in an unpredictable direction. All the studies tested for IgG antibodies but only about half also assessed IgM and few assessed IgA. Only seven studies assessed all three types of antibodies and/or used pan-Ig antibodies. The ratio of people sampled versus the total population of the region was more than 1:1000 in 20 studies (Table 2; available at: http://www.who.int/bulletin/volumes/##/##/##-######).
Seroprevalence for the infection ranged from 0.02% to 53.40% (58.40% in the slum subpopulation in Mumbai; Table 3). Studies varied considerably depending on whether or not they tried to adjust their estimates for test performance, sampling (to get closer to a more representative sample), clustering (e.g. when including household members) and other factors. The adjusted seroprevalence occasionally differed substantially from the unadjusted value. In studies that used samples from multiple locations, between-location heterogeneity was seen (e.g. 0.00–25.00% across 133 Brazilian cities).25
Inferred infection fatality rate estimates varied from 0.00% to 1.63% (Table 4). Corrected values also varied considerably (0.00–1.54%). For 15 locations, more than one estimate of the infection fatality rate was available and thus I could compare the infection fatality rate from different studies evaluating the same location.
The estimates of infection fatality rate tended to be more homogeneous within each location, while they differed markedly across locations (Fig. 2). Within the same location, infection fatality rate estimates tend to have only small differences, even though it is possible that different areas within the same location may also have real differences in infection fatality rate. France is one exception where differences are large, but both estimates come from population studies of outbreaks from schools and thus may not provide good estimates of population seroprevalence and may lead to an underestimated infection fatality rate. I used summary estimates weighted for sample size to generate a single estimate for each location. Data were available for 51 different locations (including the inferred infection fatality rates from the eight preliminary additional national estimates in Table 5). The median infection fatality rate across all 51 locations was 0.27% (corrected 0.23%). For people < 70 years old, the infection fatality rate of COVID-19 across 40 locations with available data ranged from 0.00% to 0.31% (median 0.05%); the corrected values were similar.
The infection fatality rate is not a fixed physical constant and it can vary substantially across locations, depending on the population structure, the case-mix of infected and deceased individuals and other local factors. The studies analysed here represent 82 different estimates of the infection fatality rate of COVID-19, but they are not fully representative of all countries and locations around the world. Most of the studies are from locations with overall COVID-19 mortality rates that are higher than the global average. The inferred median infection fatality rate in locations with a COVID-19 mortality rate lower than the global average is low (0.09%).
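The stratified comparison can be sketched in a few lines. Each location is assigned to one of the three population-mortality groups used in the analysis (below the global average of 118 deaths per million, 118–500, and above 500), and the location-level fatality rates are summarized by their median within each group. Function names and input values are hypothetical and are not the paper's data.

```python
from statistics import median

GLOBAL_AVERAGE = 118   # COVID-19 deaths per million worldwide as of 12 September 2020 (from the text)
HIGH_MORTALITY = 500   # upper cut-off of the middle group; exactly 500 counts as "118-500"


def mortality_group(deaths_per_million: float) -> str:
    """Assign a location to one of the three population-mortality groups."""
    if deaths_per_million < GLOBAL_AVERAGE:
        return "<118"
    if deaths_per_million <= HIGH_MORTALITY:
        return "118-500"
    return ">500"


def median_ifr_by_group(locations: list[tuple[float, float]]) -> dict[str, float]:
    """`locations` holds one (ifr, deaths_per_million) pair per location."""
    groups: dict[str, list[float]] = {}
    for ifr, deaths_per_million in locations:
        groups.setdefault(mortality_group(deaths_per_million), []).append(ifr)
    return {group: median(ifrs) for group, ifrs in groups.items()}


# Hypothetical location-level estimates.
example = [(0.0010, 40), (0.0020, 90), (0.0015, 60), (0.0030, 200), (0.0060, 800)]
print(median_ifr_by_group(example))
```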
If one could sample equally from all locations globally, the median infection fatality rate might be even substantially lower than the 0.23% observed in my analysis. COVID-19 has a very steep age gradient for risk of death.80 Moreover, many, and in some cases most, deaths in European countries that have had large numbers of cases and deaths81 and in the USA82 occurred in nursing homes. Locations with many nursing home deaths may have high estimates of the infection fatality rate, but the infection fatality rate would still be low among non-elderly, non-debilitated people. Within China, the much higher infection fatality rate estimates in Wuhan compared with other areas of the country may reflect widespread nosocomial infections,83 as well as unfamiliarity with how to manage the infection as the first location that had to deal with COVID-19. The very many deaths in nursing homes, nosocomial infections and overwhelmed hospitals may also explain the high number of fatalities in specific locations in Italy84 and New York and neighbouring states.23,27,35,56 Poor decisions (e.g. sending COVID-19 patients to nursing homes), poor management (e.g. unnecessary mechanical ventilation) and hydroxychloroquine may also have contributed to worse outcomes. High levels of congestion (e.g. in busy public transport systems) may also have exposed many people to high infectious loads and, thus, perhaps more severe disease. A more aggressive viral clade has also been speculated.85 The infection fatality rate may be very high among disadvantaged populations and settings with a combination of factors predisposing to higher fatalities.37 Very low infection fatality rates seem common in Asian countries.8,11,29,48,49,51,59,61,67 A younger population in these countries (excluding Japan), previous immunity from exposure to other coronaviruses, genetic differences, hygiene etiquette, lower infectious load and other unknown factors may explain these low rates.
The infection fatality rate is also low in low-income countries in both Asia and Africa,44,49,66,67 perhaps reflecting the young age structure. However, comorbidities, poverty, frailty (e.g. malnutrition) and congested urban living circumstances may have an adverse effect on risk and thus increase the infection fatality rate. Antibody titres may decline with time10,28,32,86,87 and this would give falsely low prevalence estimates. I considered the maximum seroprevalence estimate when multiple repeated measurements at different time points were available, but even then some of this decline cannot be fully accounted for. With four exceptions,10,28,32,51 the maximum seroprevalence value was at the latest time point. Positive controls for the antibody assays used were typically symptomatic patients with positive polymerase chain reaction tests. Symptomatic patients may be more likely to develop antibodies.87–91 Since seroprevalence studies specifically try to reveal undiagnosed asymptomatic and mildly symptomatic infections, a lower sensitivity for these mild infections could lead to substantial underestimates of the number of infected people and overestimates of the inferred infection fatality rate. A main issue with seroprevalence studies is whether they offer a representative picture of the population in the assessed region. A generic problem is that vulnerable people at high risk of infection and/or death may be more difficult to recruit in survey-type studies. COVID-19 infection is particularly widespread and/or lethal in nursing homes, among homeless people, in prisons and in disadvantaged minorities.92 Most of these populations are very difficult, or even impossible, to reach and sample, and they are probably under-represented to various degrees (or even entirely missed) in surveys. This sampling obstacle would result in underestimating the seroprevalence and overestimating the infection fatality rate.
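A short derivation shows the direction of this bias. Assume, for illustration only, perfect specificity and a single effective sensitivity s (0 < s ≤ 1) among all infections in a population of N people. Let p be the true seroprevalence and D the number of deaths. The measured seroprevalence and the inferred fatality rate are then:

```latex
\hat{p} = s\,p,
\qquad
\widehat{\mathrm{IFR}} = \frac{D}{N\,\hat{p}} = \frac{D}{N\,s\,p} = \frac{\mathrm{IFR}}{s}.
```

If the assay detected only 80% of (mostly mild) infections, then s = 0.8, and the inferred rate would be inflated by a factor of 1/0.8 = 1.25. The same algebra applies when high-risk groups are missed by a survey: the sampled seroprevalence falls below p, and the inferred fatality rate rises.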
In principle, adjusted seroprevalence values may be closer to the true estimate, but the adjustments show that each study alone may have unavoidable uncertainty and fluctuation, depending on the type of analysis chosen. Furthermore, my corrected infection fatality rate estimates try to account for undercounting of infected people when not all three antibodies (IgG, IgM and IgA) were assessed. However, the magnitude of the correction is uncertain and may vary in different circumstances. An unknown proportion of people may have responded to the virus using immune mechanisms (mucosal, innate, cellular) without generating any serum antibodies.93–97 A limitation of this analysis is that several studies included have not yet been fully peer-reviewed and some are still ongoing. Moreover, despite efforts made by seroprevalence studies to generate estimates applicable to the general population, representativeness is difficult to ensure, even for the most rigorous studies and despite adjustments made. Estimating a single infection fatality rate value for a whole country or state can be misleading when there is often huge variation in the population mixing patterns and pockets of high or low mortality. Furthermore, many studies have evaluated people within restricted age ranges, and the age groups that are not included may differ in seroprevalence. Statistically significant, modest differences in seroprevalence across some age groups have been observed in several studies.10,13,15,23,27,36,38 Lower values have been seen in young children and higher values in adolescents and young adults, but these patterns are inconsistent and not strong enough to suggest major differences when extrapolating across age groups. Acknowledging these limitations, based on the currently available data, one may project that over half a billion people have been infected as of 12 September, 2020, far more than the approximately 29 million documented laboratory-confirmed cases.
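Simple arithmetic puts this projection in proportion. The sketch uses the two figures stated in the text and the global average of 118 COVID-19 deaths per million reported in the Results. The world population of about 7.8 billion is an outside assumption added here, not a figure from the paper. Because more than half a billion infections are projected, the implied global fatality rate is an upper bound.

```python
projected_infections = 500e6      # "over half a billion" (a lower bound in the text)
documented_cases = 29e6           # "approximately 29 million" laboratory-confirmed cases
global_deaths_per_million = 118   # global average as of 12 September 2020, from the Results
world_population = 7.8e9          # assumption: approximate 2020 world population

# How many infections correspond to each documented case.
infections_per_case = projected_infections / documented_cases
# Global deaths implied by the per-million average.
global_deaths = global_deaths_per_million * world_population / 1e6
# More infections than projected would mean a lower IFR, so this is an upper bound.
implied_ifr_upper = global_deaths / projected_infections

print(f"~{infections_per_case:.0f} infections per documented case")
print(f"~{global_deaths:,.0f} deaths -> implied global IFR <= {implied_ifr_upper:.2%}")
```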
Most locations probably have an infection fatality rate less than 0.20%, and with appropriate, precise non-pharmacological measures that selectively try to protect high-risk vulnerable populations and settings, the infection fatality rate may be brought even lower.
METRICS has been supported by a grant from the Laura and John Arnold Foundation.
I am a co-author (not principal investigator) of one of the seroprevalence studies.
Methods
Seroprevalence studies
Inferred infection fatality rate
Data synthesis
Results
Seroprevalence studies
Seroprevalence estimates
Inferred infection fatality rate
Most data came from locations with high death tolls from COVID-19, and 32 of the locations had a population mortality rate (COVID-19 deaths per million population) higher than the global average (118 deaths from COVID-19 per million as of 12 September 2020;79 Fig. 3). Uncorrected estimates of the infection fatality rate of COVID-19 ranged from 0.01% to 0.67% (median 0.10%) across the 19 locations with a population mortality rate for COVID-19 lower than the global average, from 0.07% to 0.73% (median 0.20%) across 17 locations with population mortality rate higher than the global average but lower than 500 COVID-19 deaths per million, and from 0.20% to 1.63% (median 0.71%) across 15 locations with more than 500 COVID-19 deaths per million. The corrected estimates of the median infection fatality rate were 0.09%, 0.20% and 0.57%, respectively, for the three location groups.
Discussion
Funding:
Competing interests:
References
[Russian]. Available from: https://www.interfax.ru/russia/712617 [cited 2020 Aug 12].