An in-depth look at four academic models of the Wuhan coronavirus outbreak’s spread

With the Wuhan coronavirus epidemic still spreading throughout China and to an increasing number of countries around the world, epidemiologists are racing against time to simulate the disease’s spread. These simulations should not only yield more accurate predictions of future events, but also give public health authorities helpful guidance on how to check that spread.

I’ve taken a deep dive into four different modeling approaches, from universities in Switzerland, the United States, and Lancaster, England, and from the WHO team in London, to compare their methods and results and give an early window into what they might mean.

Althaus et al: stochastic simulations from Switzerland

The first paper we’ll discuss is from Julien Riou and Christian Althaus at the Institute of Social and Preventive Medicine at the University of Bern, Switzerland.

This team uses stochastic simulation: they model the disease at the level of individual cases across millions of simulation runs, with each case behaving independently within the parameters of the model. For each set of parameters, they then find the percentage of simulations that fit the observed outcomes.

Their model only addresses cases within Wuhan itself, and does not separately model different cities or countries.

All of their models make the following assumptions:

  1. The number of initial cases infected zoonotically in the Huanan market was between one and fifty
  2. Hosts were infected between November 20 and December 4
  3. Each case takes between 7 and 14 days to transmit to other people (i.e. the latent period and incubation period are treated as identical, and vary between 7 and 14 days).

They validate the models by checking whether the number of cases on January 18 (both symptomatic and not) is between 1,000 and 9,700, whereas only 121 cases were known on that date. In other words, the validation embeds the assumption that as of January 18 only between roughly 1.2% and 12% of cases had been detected (a fraction known as the ascertainment rate). This assumption reflects WHO estimates based on their own epidemiological modeling.

The model attempts to estimate two parameters:

  • Ro: the basic reproductive number for an average case
  • k: the dispersion parameter, which is the inverse of the variability in the number of transmissions per person.

A high value of k means that every patient infects approximately the same number of people, while very low values of k mean that a small number of “superspreaders” infect a disproportionate number of people.
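To make the effect of k concrete, here is a minimal sketch. It assumes, as is common in the superspreading literature, that the number of people each case infects follows a negative binomial distribution with mean Ro and dispersion k. Under that assumption, the share of cases that infect nobody at all is (1 + Ro/k)^(-k). The function name and the example values are illustrative and not taken from the paper.

```python
def share_infecting_nobody(r0, k):
    """Probability that a case causes zero secondary infections, assuming a
    negative binomial offspring distribution with mean r0 and dispersion k."""
    return (1 + r0 / k) ** (-k)


# Illustrative values only: the same average transmission, different dispersion.
for k in (0.1, 1.0, 10.0):
    share = share_infecting_nobody(2.5, k)
    print(f"k = {k:>4}: {share:.0%} of cases infect nobody")
```

With the same Ro, a small k leaves a large majority of cases infecting nobody, so the average has to be carried by a few cases that each infect many people. With a large k, only about one case in ten infects nobody.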

One superspreader is believed to be responsible for infecting 14 healthcare workers in a single hospital in Wuhan, one of whom, an elderly doctor, recently died.

The Swiss team finds that the percentage of simulations whose results match the estimated case numbers is highest for a broad range of Ro figures, from 1.4 to 3.8. This is one of the widest published ranges of Ro figures, and the one (along with Read, discussed below) on which we originally reported.

These values could place the transmissibility of 2019-nCoV nearly anywhere in the broad range also occupied by SARS and the 1918 Spanish flu, but definitely above MERS. They also find a very wide variety of values of k consistent with their observation, and are unable to rule out superspreader events entirely, but are able to rule out extremely low values associated with primarily super-spreader driven epidemiology.
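The simulate-and-check procedure can be sketched in a few lines. This is a heavily simplified illustration, not the authors' code: it assumes negative binomial transmission (a gamma-distributed infectiousness with Poisson-distributed contacts), replaces the paper's day-by-day timing with a fixed, hypothetical number of transmission generations, and uses far fewer runs. Only the 1 to 50 initial cases and the 1,000 to 9,700 target window come from the description above; every function name and default value is an assumption.

```python
import math
import random


def poisson(lam, rng):
    """Poisson draw: Knuth's method for small means, normal approximation above 50."""
    if lam > 50:
        return max(0, round(rng.gauss(lam, math.sqrt(lam))))
    threshold, count, product = math.exp(-lam), 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_outbreak(r0, k, seeds, generations, cap, rng):
    """Cumulative cases after a number of transmission generations."""
    active = total = seeds
    for _ in range(generations):
        # Each case has gamma-distributed infectiousness (shape k, mean r0) and
        # causes a Poisson number of new cases: a negative binomial overall.
        # Summing over `active` independent cases gives shape active * k.
        combined_rate = rng.gammavariate(active * k, r0 / k)
        active = poisson(combined_rate, rng)
        total += active
        if active == 0 or total > cap:
            break  # outbreak died out, or already overshot the window
    return total


def acceptance_fraction(r0, k, n_sims=2000, low=1000, high=9700,
                        generations=5, seed=0):
    """Share of simulated outbreaks whose final size lands in [low, high]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        seeds = rng.randint(1, 50)  # initial zoonotic cases
        total = simulate_outbreak(r0, k, seeds, generations, high, rng)
        hits += low <= total <= high
    return hits / n_sims


# Hypothetical parameter grid: which Ro values are compatible with the window?
for r0 in (1.0, 2.0, 3.0, 4.0):
    print(r0, acceptance_fraction(r0, k=0.5))
```

Parameter combinations with a high acceptance fraction are the ones the data cannot rule out, which is why a single validation window can leave a wide range of Ro and k values standing.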

Notable strengths of this paper are its attempt to model k, which no other paper seems to have done, and its acknowledgement of uncertainty in the date of onset, which neither Read nor Majumder does. Notable weaknesses of this paper relative to Read and Majumder, discussed below, include that it only fits to one data point from the entire time series, only models the epidemic in Wuhan itself, and does not account for the effectiveness of any efforts to fight the outbreak.

Majumder et al: a simple statistical model fit from Harvard

The second paper we’ll discuss is from the group of Maimuna S. Majumder at Boston Children’s Hospital and Harvard Medical School. It takes the entire time series of case data from Wuhan alone, and fits it to a simple single function called the IDEA model.

This single function, basically an exponential whose rate of growth falls as the epidemic progresses, is an empirical fit to the trajectories of other viral outbreaks like Zika and SARS, and a common but simple epidemiology model.

There are only two parameters for the model: the Ro value, and a value called d that sums up the effectiveness of epidemic control over time.
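For readers who want to see the shape of the curve, the IDEA model is usually written as I(t) = (Ro / (1 + d)^t)^t, where t counts transmission generations rather than days. The sketch below evaluates that formula; the function name and the example values of Ro and d are illustrative and are not the paper's fitted values.

```python
def idea_incidence(r0, d, t):
    """New cases in generation t under the IDEA model: (r0 / (1 + d)**t) ** t."""
    return (r0 / (1 + d) ** t) ** t


# Compare uncontrolled growth (d = 0) with modest control (d = 0.05).
for t in range(1, 6):
    uncontrolled = idea_incidence(2.5, 0.0, t)
    controlled = idea_incidence(2.5, 0.05, t)
    print(f"generation {t}: {uncontrolled:8.1f} vs {controlled:8.1f}")
```

When d is zero the formula reduces to plain exponential growth, Ro^t, which is why a fitted d near zero means no measurable slowdown.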

The model contains the assumption that the ascertainment rate is constant, and their model is not capable of estimating it (unlike Read’s model). They also assume that the latent and incubation periods are both uniform for all cases and somewhere between 6 and 10 days.

The result of fitting this model to the case data from Wuhan is an estimate that Ro lies between 2.0, if the latency period of the virus is only 6 days, and 3.3, if the latency were as long as 10 days. For a given rate of growth, longer latencies correspond to higher Ro values, and shorter latencies, to lower Ro values. In all cases, their model fit finds d equal to almost zero, i.e. nothing that the Chinese government and public did since the beginning of the outbreak has yet reduced the rate of growth.
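The link between latency and Ro can be checked with a back-of-the-envelope calculation. The sketch below is not the IDEA fit itself: it assumes simple exponential growth at rate r per day and that every transmission happens exactly one generation interval T after infection, so that Ro = exp(r × T). The function name is an assumption.

```python
import math


def r0_from_growth(daily_growth_rate, generation_days):
    """Ro implied by a growth rate, assuming a fixed generation interval."""
    return math.exp(daily_growth_rate * generation_days)


# Growth rate implied by Ro = 2.0 with a 6-day interval (figures from the text).
r = math.log(2.0) / 6

for days in (6, 8, 10):
    print(f"{days}-day interval: Ro = {r0_from_growth(r, days):.2f}")
```

Holding the observed growth rate fixed, stretching the interval from 6 to 10 days raises the implied Ro from 2.0 to about 3.2, in the same neighborhood as the paper's upper estimate.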

This model makes no attempt to estimate the ascertainment rate, but does assume that it has remained constant over the course of the epidemic so far.

Read et al: a diffusion model from England

The third paper we’ll discuss comes from the group of Jonathan M. Read in the Centre for Health Informatics, Computing and Statistics at Lancaster Medical School in Lancaster, England, and collaborators in Florida and Glasgow.

This model is possibly the most sophisticated of the four, and makes an effort to estimate diffusion to other cities in China and countries around the world, as well as estimate the ascertainment rate.

This model has produced by far the bleakest results of any public work, and some of its scariest figures have gone viral on social media, e.g. an Ro value as high as 4.0, the topline estimate that Wuhan might face up to a quarter million cases by February 4, and that the USA might face five new importations of the coronavirus per day within the same timeframe.

Since publication, Dr. Read has announced that an update of the model with new data (and possibly improved methods) has painted a much less catastrophic picture. However, it still has not been published, so we are analyzing the current version of the paper. Please bear in mind it does not represent Dr. Read’s team’s latest work, and may generally be considered the most pessimistic assessment of any responsible academic group so far.

The paper is based on a diffusion model, which simulates separate growth of the epidemic in every city in China and every foreign country, based on an assumption of a simple epidemiological growth model in each city, and a frequency of diffusion of population between them based on travel statistics from 2017, assuming business-as-usual travel and sick people being no more or less likely to travel than anyone else.

The paper makes a lot of assumptions:

  • A four day incubation period
  • A four day latent period
  • 24 zoonotic infections from the Huanan market on exactly January 1
  • No human-human transmission prior to January 1
  • No new zoonotic infections since January 1
  • A fixed rate of transmission over a fixed contagious period per case
  • A fixed ascertainment rate
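The assumptions above, combined with travel between cities, can be sketched as a small deterministic model. This is an illustration of the general technique and not the paper's implementation: it uses two cities instead of hundreds, a textbook susceptible/exposed/infectious/recovered model in daily steps, and a made-up travel fraction. The four-day periods and the 24 seed infections come from the list above; the populations, the transmission rate, and all names are hypothetical.

```python
def step(state, beta, sigma, gamma, travel):
    """Advance every city by one day.

    state:  one [S, E, I, R] list per city
    travel: travel[a][b] is the daily fraction of city a's people who move to b
    """
    # 1. Local epidemic growth inside each city.
    after_local = []
    for s, e, i, r in state:
        population = s + e + i + r
        new_exposed = beta * s * i / population
        new_infectious = sigma * e
        new_recovered = gamma * i
        after_local.append([
            s - new_exposed,
            e + new_exposed - new_infectious,
            i + new_infectious - new_recovered,
            r + new_recovered,
        ])
    # 2. Travel: sick and healthy people move at the same rate.
    result = [row[:] for row in after_local]
    for src, row in enumerate(after_local):
        for dst in range(len(state)):
            if dst == src:
                continue
            for compartment, people in enumerate(row):
                moved = travel[src][dst] * people
                result[src][compartment] -= moved
                result[dst][compartment] += moved
    return result


def simulate(days, beta=0.95, latent_days=4.0, infectious_days=4.0):
    """Run the two-city example and return the final state."""
    state = [
        [11_000_000 - 24, 24.0, 0.0, 0.0],  # origin city, 24 seed infections
        [24_000_000.0, 0.0, 0.0, 0.0],      # destination city, no cases yet
    ]
    travel = [[0.0, 0.001], [0.001, 0.0]]   # hypothetical daily travel fractions
    for _ in range(days):
        state = step(state, beta, 1 / latent_days, 1 / infectious_days, travel)
    return state


origin, destination = simulate(30)
print("origin infectious:", round(origin[2]))
print("destination infectious:", round(destination[2]))
```

Even with no cases of its own at the start, the destination city acquires infections purely through travel, which is the mechanism that lets case counts outside Wuhan inform the estimate of what is happening inside it.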

They fit the published confirmed case numbers from all cities/countries for all days to estimate the parameters of the model, and from this find the combination of parameters (Ro, ascertainment rate, and infectious rate and period) that maximize fit.

What they find is astonishing: an Ro value of 3.6-4.0, the highest estimate of any group, even with the shortest assumed latent period of any group.

This leads to a projection of a very high rate of growth. Their model estimates that after becoming infectious, patients remain infectious for only four days, and infect almost one new person per day in the absence of protective measures.
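The "almost one new person per day" figure follows directly from the numbers above, if one assumes that transmission is spread evenly over the infectious period, so that Ro is the daily transmission rate multiplied by the number of infectious days. A quick check, with an illustrative function name:

```python
def daily_transmissions(r0, infectious_days):
    """Average new infections per infectious day, assuming an even spread."""
    return r0 / infectious_days


for r0 in (3.6, 4.0):
    print(f"Ro = {r0}: {daily_transmissions(r0, 4):.2f} new infections per day")
```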

They also estimate that the ascertainment rate is only 5%, i.e. that 95% of cases are still unaccounted for and that Wuhan has almost 30,000 cases now. This estimate is near the center of the range from the WHO simulation.

In sum, the Read group found the highest-end Ro number of any group, despite assuming the shortest latent period. Their simulation shows amazingly rapid growth of the epidemic.

Accordingly, they project that if this is true, by February 4, Wuhan may have a quarter of a million cases, while Chengdu, Guangzhou, Shanghai, and Shenzhen may have hundreds of cases each. Meanwhile, they project that countries like the USA, South Korea, the UK and Japan may each be facing between 5 and 15 new importations of the virus each day.

By contrast, both Majumder and Althaus find lower Ro values with a longer latent period than the (old) Read paper. So, the simple model fit in Majumder and the stochastic simulation in Althaus both show a much rosier scenario than the diffusion model in Read.

It could be that the Read paper, by taking into account the sudden diffusion of cases to many areas, is shining a light on a mass of undiscovered cases in Wuhan. Or it could be that by using business-as-usual travel numbers from an annual average, Read is underestimating the contribution to the virus’s spread that is made by the heightened Lunar New Year travel season, infected people fleeing Wuhan in panic, or some combination of these factors.

Regardless of this, the Read group’s use of data from all cities and countries, its attempt to model the disease time course (even if it assumes a fixed value for the latency period), and its attempt to model the ascertainment rate (even if it assumes constancy) are all notable methodological strengths. It will be extremely interesting to see the updates in the forthcoming new paper from the Read group.

The WHO model: Stochastic simulations estimating Ro, k, and number of zoonotic infections

The fourth model comes from the WHO team at Imperial College in London, led by Neil M. Ferguson.

They use a stochastic simulation model similar to the approach from Althaus to estimate Ro, k, and the number of zoonotic infections which generate case numbers corresponding to WHO estimates. They assume a latent and incubation period similar to the SARS virus, modeling two possibilities of 6.8 and 8.4 days, which they consider lower and upper estimates, and which both yield similar conclusions.

They find that the lowest number of zoonotic cases is most consistent with the evidence (fewer than 40 cases), while k estimates appear broadly compatible with SARS (as Althaus also reported), and Ro may range from 1.7 all the way to 4.3, but with likely values around 2.6 to 3.1.

This analysis, fitting the WHO’s public health purposes, seems mainly geared toward estimating the level of reduction in transmission that will be necessary to control the epidemic, and providing a toolkit for the authorities to use in coming weeks to estimate the effectiveness of existing measures.
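The standard rule of thumb behind such estimates is that an epidemic stops growing once each case infects fewer than one other person on average, which requires cutting transmission by more than 1 - 1/Ro. The sketch below applies that textbook threshold to the Ro values quoted above; it is not the Imperial College team's own calculation, and the function name is an assumption.

```python
def required_reduction(r0):
    """Fraction of transmission that must be blocked to bring Ro below 1."""
    return 1 - 1 / r0


for r0 in (1.7, 2.6, 3.1, 4.3):
    print(f"Ro = {r0}: block more than {required_reduction(r0):.0%} of transmission")
```

Across the likely range of 2.6 to 3.1, this works out to blocking somewhat more than 60% of transmission.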

Overall summary: no academic consensus on Ro, no attempt to model the latent period

Other than the brightness and dedication of these researchers, two things are clear from these models, taken as a whole.

First, there is still not an academic consensus on the basic reproductive number of the Wuhan coronavirus. Models range from finding an Ro of 1.4 after assuming a latent period of 14 days, to finding one of 4.0 after assuming only 4 days.

Second, the modeling community as a whole has not yet seriously begun attempting to model the latent period or incubation period of the virus, or the relationship between them. These are extremely important parameters that will influence the whole course of the epidemic. Models are largely taking them as fixed, and they are using a wide variety of values.

We will wait eagerly for further contributions from these and other groups around the world, as new data arrives and new methods and conceptual contributions are brought to bear on this urgent life and death issue.