Stochastic epidemiological models

24 August 2023

Julien Arino

Department of Mathematics & Data Science Nexus
University of Manitoba*

Canadian Centre for Disease Modelling
PHAC-EMNID / REMMI-ASPC
NSERC-PHAC EID Modelling Consortium (CANMOD, MfPH, OMNI/RÉUNIS)

*The University of Manitoba campuses are located on original lands of Anishinaabeg, Cree, Oji-Cree, Dakota and Dene peoples, and on the homeland of the Métis Nation. We respect the Treaties that were made on these territories, we acknowledge the harms and mistakes of the past, and we dedicate ourselves to move forward in partnership with Indigenous communities in a spirit of reconciliation and collaboration.

Outline

  • Why stochasticity matters
  • Side note: sojourn / residence times
  • Discrete time Markov chains
  • Continuous time Markov chains

Remarks / Resources

This is a user-oriented course: I barely touch on the algorithms; instead, I focus on how to use them

Code is available in my subfolder in the course Github repo in the CODE directory

Some of the slides are inspired from slides given to me by Linda Allen (Texas Tech) and Frank Ball (University of Nottingham). I recommend books and articles by Linda for more detail

Why stochasticity matters

Running example - SIS model without demography

Constant total population

Basic reproduction number:

In the deterministic world, rules the world

  • If , the disease dies out (disease-free equilibrium)
  • If , it becomes established at an endemic equilibrium


    Next slides: K, , (and )

In stochastic world, make that '' rules-ish'' ()

center

When , extinctions happen quite frequently

Types of stochastic systems discussed today

  • Discrete-time Markov chains (DTMC)
  • Continuous-time Markov chains (CTMC)

But there are many others. Of note:

  • Branching processes (BP)
  • Stochastic differential equations (SDE)

Side note: sojourn / residence times

Some probability theory

Suppose that a system can be in two states, and

  • At time , system is in state
  • An event happens at some time , which triggers the switch from state to state

A random variable is a variable that takes random values, that is, a mapping from random experiments to numbers

Let us call the random variable

time spent in state before switching into state

States can be anything:

  • : working, : broken;
  • : infected, : recovered;
  • : alive, : dead;

We take a collection of objects or individuals in state and want some law for the distribution of the times spent in , i.e., a law for

For example, we make light bulbs and would like to tell our customers that on average, our light bulbs last 200 years..

For this, we conduct an infinite number of experiments, and observe the time that it takes, in every experiment, to switch between and

From this, deduce a model, which in this context is called a probability distribution

Discrete vs continuous random variables

We assume that is a continuous random variable, that is, takes continuous values. Examples of continuous r.v.:

  • height or age of a person (if measured very precisely)
  • distance
  • time

Another type of random variables are discrete random variables, which take values in a denumerable set. Examples of discrete r.v.:

  • heads or tails on a coin toss
  • the number rolled on a dice
  • height of a person, if expressed rounded without subunits, age of a person in years (without subunits)

Probability density function

Assume continuous; it has a continuous probability density function

center

Cumulative distribution function

The cumulative distribution function (c.d.f.) is a function that characterizes the distribution of , and defined by

center

Properties of the c.d.f.

  • Since is a nonnegative function, is nondecreasing
  • Since is a probability density function, , and thus

center

Mean value

For a continuous random variable with probability density function , the mean value of , denoted or , is given by

Survival function

Another characterization of the distribution of the random variable is through the survival (or sojourn) function

The survival function of state is given by

This gives a description of the sojourn time of a system in a particular state (the time spent in the state)

is a nonincreasing function (since with a c.d.f.), and (since is a positive random variable)

The average sojourn time is

Since ,

Expected future lifetime

Hazard (or failure) rate

The hazard rate (or failure rate) is

Gives probability of failure between and , given survival to

We have

The exponential distribution

The exponential distribution

The random variable has an exponential distribution if its probability density function takes the form

with . Then the survival function for state is of the form , for , and the average sojourn time in state is

The Dirac distribution

The Dirac distribution

If on the other hand, for some constant ,

which means that has a Dirac delta distribution , then the average sojourn time is

A cohort model

A model for a cohort with one cause of death

We consider a population consisting of individuals born at the same time (a cohort), for example, the same year

We suppose

  • At time , there are initially individuals
  • All causes of death are compounded together
  • The time until death, for a given individual, is a random variable , with continuous probability density distribution and survival function

The model

Denote the population at time . Then

gives the proportion of , the initial population, that is still alive at time

Case where is exponentially distributed

Suppose that has an exponential distribution with mean (or parameter ), . Then the survival function is and takes the form

Now note that

with

The ODE makes the assumption that the life expectancy at birth is exponentially distributed

Case where has a Dirac delta distribution

Suppose that has a Dirac delta distribution at , giving the survival function

Then takes the form

All individuals survive until time , then they all die at time

Here, we have everywhere except at , where it is undefined

Sojourn times in an SIS disease transmission model

An SIS with tweaked recovery

Traditional ODE models assume recovery from disease at per capita rate (often denoted )

Here, assume that, of the individuals who have become infective at time , a fraction remain infective at time

Thus, considered for , the function is a survival function

Reducing the dimension of the problem

We have

is constant (equal total population at time ), so we can deduce the value of , once we know , from the equation

Model for infectious individuals

Integral equation for the number of infective individuals:

  • number of individuals who were infective at time and still are at time
    • is nonnegative, nonincreasing, and such that
  • proportion of individuals who became infective at time and
    who still are at time
  • is with , from the reduction of dimension

Expression under the integral

Integral equation for the number of infective individuals:

The term

  • is the rate at which new infectives are created, at time ,
  • multiplying by gives the proportion of those who became infectives at time and who still are at time

Summing over gives the number of infective individuals at time

Case of an exponentially distributed time to recovery

Suppose that is such that the sojourn time in the infective state has an exponential distribution with mean , i.e.,

Then the initial condition function takes the form

with the number of infective individuals at time . This is obtained by considering the cohort of initially infectious individuals, giving a model such as

Equation becomes

Taking the time derivative of yields

which is the classical logistic type ordinary differential equation (ODE) for in an SIS model without vital dynamics (no birth or death)

Case of a step function survival function

Consider case where the time spent infected has survival function

i.e., the sojourn time in the infective state is a constant

In this case becomes

Here, it is more difficult to obtain an expression for . It is however assumed that vanishes for

When differentiated, gives, for

Since vanishes for , this gives the delay differential equation (DDE)

What we know this far

  • The time of sojourn in compartments plays an important role in determining the type of model that we deal with
  • All ODE compartmental models, when they use terms of the form , make the assumption that the time of sojourn in compartments is exponentially distributed with mean
  • At the other end of the spectrum, delay differential with discrete delay make the assumption of a constant sojourn time , equal for all individuals
  • Both can be true sometimes.. but reality is often somewhere in between

Survival function, , for exponential distrib. with mean 80 years

center

The problems with the exponential distribution

  • Survival drops quickly: in previous graph, 20% mortality of a cohort at age 20 years
  • Survival extends way past the mean: in previous graph, almost 25% survival to age 120 years
  • Acceptable if what matters is mean duration of sojourn over long time period
  • Less so if interested in short term dynamics
  • Exponential distribution with parameter has mean and variance , i.e., one parameter controls both the mean and dispersion

An COVID-19 model : "making Erlangs"

JA & Portet. A simple model for COVID-19. Infectious Disease Modelling 5:309-315 (2020)

Simple way to "fix" sojourn times: sums of exponential distributions

  • Exponential distribution of sojourn times is acceptable if what matters is mean duration of sojourn over long time period
  • For COVID-19, were trying to give "predictions" over 2-4 weeks period, so we need more than the mean

Use a property of exponential distributions, namely, that the sum of i.i.d. (independent and identically distributed) exponential distributions is Erlang distributed

Sum of exponential distributions

and independent exponential r.v. with rate parameters and . Then the p.d.f. of is the convolution

The Erlang distribution

P.d.f. of the Erlang distribution

shape parameter, rate parameter (sometimes use scale parameter )

So, if , has distribution

i.e., an Erlang distribution with shape parameter and rate parameter

Continuing..

, , be exponential i.i.d. random variables with parameter

Then Erlang distributed with rate parameter and shape parameter

So use multiple compartments

Discrete time Markov chains

Discrete-time Markov chains

: probability vector, with describing the probability that at time , the system is in state ,

for all , of course

State evolution governed by

where is a stochastic matrix (row sums all equal 1), the transition matrix, with entry , where sequence of random variables describing the state

If constant, homogeneous DTMC

Time often ''recast'' so that

Important remark

The DTMC world lives at the interface between probabilists, who like to think of as a row vector, as a column-stochastic matrix and thus write

and linear algebraists, who prefer column vectors and row-stochastic matrices,

So check the direction to understand whether you are using or

Advantages of DTMC

As a teacher of modelling: base theory of DTMC uses a lot of linear algebra and graph theory; usually really appreciated by students

Regular DTMC (with primitive transition matrices) allow to consider equilibrium distribution of probability of being in various states

Absorbing DTMC (with reducible transition matrices) allow the consideration of time to absorption, mean first passage time, etc.

DTMC for example SIS system

Since , consider only the infected. To simulate as DTMC, consider a random walk on ( Gambler's ruin problem)

Denote , and

Absorbing states, absorbing chains

A state in a Markov chain is absorbing if whenever it occurs on the generation of the experiment, it then occurs on every subsequent step. In other words, is absorbing if and for

A Markov chain is absorbing if it has at least one absorbing state, and if from every state it is possible to go to an absorbing state

In an absorbing Markov chain, a state that is not absorbing is called transient

Some questions on absorbing chains

Suppose we have a chain like the following

center

  1. Does the process eventually reach an absorbing state?
  2. Average # of times spent in a transient state, if starting in a transient state?
  3. Average # of steps before entering an absorbing state?
  4. Probability of being absorbed by a given absorbing state, when there are more than one, when starting in a given transient state?

Reaching an absorbing state

Answer to question 1:

Markov chain absorbing probability of reaching an absorbing state is 1

Standard form of the transition matrix

For an absorbing chain with absorbing states and transient states, the transition matrix can be written as

Absorbing states Transient states
Absorbing states
Transient states

the identity, , ,

The matrix is invertible. Let

  • be the fundamental matrix of the Markov chain
  • be the sum of the entries on row of

Answers to our remaining questions:

  1. is the average number of times the process is in the th transient state if it starts in the th transient state
  2. is the average number of steps before the process enters an absorbing state if it starts in the th transient state
  3. is the probability of eventually entering the th absorbing state if the process starts in the th transient state

See for instance book of Kemeny and Snell

Return to DTMC for example SIS system

Since , consider only the infected. To simulate as DTMC, consider a random walk on ( Gambler's ruin problem)


Denote , and

Transition matrix

# Make the transition matrix
T = mat.or.vec(nr = (Pop+1), nc = (Pop+1))
for (row in 2:Pop) {
  I = row-1
  mv_right = gamma*I*Delta_t # Recoveries
  mv_left = beta*I*(Pop-I)*Delta_t # Infections
  T[row,(row-1)] = mv_right
  T[row,(row+1)] = mv_left
}
# Last row only has move left
T[(Pop+1),Pop] = gamma*(Pop)*Delta_t
# Check that we don't have too large values
if (max(rowSums(T))>1) {
  T = T/max(rowSums(T))
}
diag(T) = 1-rowSums(T)

Simulating a DTMC

library(DTMCPack)
sol = MultDTMC(nchains = 20, tmat = T, io = IC, n = t_f)

gives 20 realisations of a DTMC with transition matrix T, initial conditions IC (a vector of initial probabilities of being in the different states) and final time t_f

See code on Github

Going a bit further

DTMCPack is great for obtaining realisations of a DTMC, but to study it in more detail, markovchain is much more comprehensive

library(markovchain)
mcSIS <- new("markovchain", 
             states = sprintf("I_%d", 0:Pop),
             transitionMatrix = T,
             name = "SIS")

Note that interestingly, markovchain overrides the weird default "* is Hadamard, %*% is usual" R matrix product rule, so mcSIS*mcSIS does , not

> summary(mcSIS)
SIS  Markov chain that is composed by: 
Closed classes: 
I_0 
Recurrent classes: 
{I_0}
Transient classes: 
{I_1,I_2,I_3,I_4,I_5,I_6,I_7,I_8,I_9,I_10,I_11,I_12,I_13,I_14,I_15,
I_16,I_17,I_18,I_19,I_20,I_21,I_22,I_23,I_24,I_25,I_26,I_27,I_28,
I_29,I_30,I_31,I_32,I_33,I_34,I_35,I_36,I_37,I_38,I_39,I_40,I_41,
I_42,I_43,I_44,I_45,I_46,I_47,I_48,I_49,I_50,I_51,I_52,I_53,I_54,
I_55,I_56,I_57,I_58,I_59,I_60,I_61,I_62,I_63,I_64,I_65,I_66,I_67,
I_68,I_69,I_70,I_71,I_72,I_73,I_74,I_75,I_76,I_77,I_78,I_79,I_80,
I_81,I_82,I_83,I_84,I_85,I_86,I_87,I_88,I_89,I_90,I_91,I_92,I_93,
I_94,I_95,I_96,I_97,I_98,I_99,I_100}
The Markov chain is not irreducible 
The absorbing states are: I_0
Function Role
absorbingStates absorbing states of the transition matrix, if any
steadyStates the vector(s) of steady state(s) in matrix form
meanFirstPassageTime matrix or vector of mean first passage times
meanRecurrenceTime vector of mean number of steps to return to each recurrent state
hittingProbabilities matrix of hitting probabilities for a Markov chain
meanAbsorptionTime expected number of steps for a transient state to be absorbed by any recurrent class
absorptionProbabilities transient states being absorbed by each recurrent state
canonicForm canonic form of transition matrix
period the period of an irreducible DTMC
summary DTMC summary

Continuous time Markov chains

Continuous-time Markov chains

CTMC similar to DTMC except in way they handle time between events (transitions)

DTMC: transitions occur each

CTMC: and transition times follow an exponential distribution parametrised by the state of the system

CTMC are roughly equivalent to ODE

Weight Transition Effect
, new infection of a susceptible
, recovery of an infectious

Will use and omit dynamics

Gillespie's algorithm (SIS model with only I eq.)

set and
while {}

  • Draw from
  • Draw from
  • Find such that
  • switch {}
    • 1: New infection,
    • 2: End of infectious period,

You can also use a package

  • You can implement Gillespie's algorithm yourself, it is a good exercise..
  • But in R there also exists a few packages allowing you to do that easily
  • They have the advantage of implementing tau-leaping (more on this later)

Simulating a CTMC

library(GillespieSSA2)
IC <- c(S = (Pop-I_0), I = I_0)
params <- c(gamma = gamma, beta = beta)
reactions <- list(
  reaction("beta*S*I", c(S=-1,I=+1), "new_infection"),
  reaction("gamma*I", c(S=+1,I=-1), "recovery")
)
set.seed(NULL)
sol <- ssa(
    initial_state = IC,
    reactions = reactions,
    params = params,
    method = ssa_exact(),
    final_time = t_f,
)
plot(sol$time, sol$state[,"I"], type = "l",
     xlab = "Time (days)", ylab = "Number infectious")

Sometimes in a CTMC things go bad

  • Recall that the inter-event time is exponentially distributed
  • Critical step of the Gillespie algorithm:
    • weight of all possible events (propensity)
    • Draw from
  • So the inter-event time if becomes very large for some
  • This can cause the simulation to grind to a halt

Example: a birth and death process

  • Individuals born at per capita rate
  • Individuals die at per capita rate
  • Let's implement this using classic Gillespie

Gillespie's algorithm (birth-death model)

set and
while {}

  • Draw from
  • Draw from
  • Find such that
  • switch {}
    • 1: Birth,
    • 2: Death,
b = 0.01   # Birth rate
d = 0.01   # Death rate
t_0 = 0    # Initial time
N_0 = 100  # Initial population

# Vectors to store time and state. Initialise with initial condition.
t = t_0
N = N_0

t_f = 1000  # Final time

# We'll track the current time and state (could also just check last entry in t
# and N, but will take more operations)
t_curr = t_0
N_curr = N_0
while (t_curr<=t_f) {
  xi_t = (b+d)*N_curr
  # The exponential number generator does not like a rate of 0 (when the 
  # population crashes), so we check if we need to quit
  if (N_curr == 0) {
    break
  }
  tau_t = rexp(1, rate = xi_t)
  t_curr = t_curr+tau_t
  v = c(b*N_curr, xi_t)/xi_t
  zeta_t = runif(n = 1)
  pos = findInterval(zeta_t, v)+1
  switch(pos,
         { 
           # Birth
           N_curr = N_curr+1
         },
         {
           # Death
           N_curr = N_curr-1
         })
  N = c(N, N_curr)
  t = c(t, t_curr)
}

Drawing at random from an exponential distribution

If you do not have an exponential distribution random number generator.. We want from , i.e., has probability density function

Use cumulative distribution function

which has values in . So draw from and solve for

Last one did not go well

  • Wanted 1000 time units (days?)
  • Interrupted around 500 () because I lost patience
  • (Slide before: the sim stopped because the population went extinct, I did not stop it!)
  • At stop time
    • (and as well, of course!)
    • time was moving slowly
> tail(diff(t))
[1] 1.357242e-04 1.291839e-04 5.926044e-05 7.344020e-05 1.401148e-04 4.423529e-04

Tau-leaping to the rescue!

  • Will not go into details
  • Approximation method (compared to classic Gillespie, which is exact)
  • Roughly: consider "groups" of events instead of individual events
  • Good news: GillespieSSA2 (which we saw earlier) and adaptivetau

Parallelisation

To see multiple realisations: good idea to parallelise, then interpolate results. Write a function, e.g., run_one_sim that .. runs one simulation, then..

no_cores <- detectCores()-1
cl <- makeCluster(no_cores)
clusterEvalQ(cl,{
  library(GillespieSSA2)
})
clusterExport(cl,
              c("params",
                "run_one_sim"),
              envir = .GlobalEnv)
SIMS = parLapply(cl = cl, 
                 X = 1:params$number_sims, 
                 fun =  function(x) run_one_sim(params))
stopCluster(cl)

See simulate_CTMC_parallel.R on Github

Benefit of parallelisation

Run the parallel code for 100 sims between tictoc::tic() and tictoc::toc(), giving 66.958 sec elapsed, then the sequential version

tictoc::tic()
SIMS = lapply(X = 1:params$number_sims, 
              FUN =  function(x) run_one_sim(params))
tictoc::toc()

which gives 318.141 sec elapsed on a 6C/12T Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz (4.75 faster) or 12.067 sec elapsed versus 258.985 sec elapsed on a 32C/64T AMD Ryzen Threadripper 3970X 32-Core Processor (21.46 faster !)

Merci / Miigwech / Thank you