rm(list=ls())
library("deSolve")
# Aerobic degradation model using the generic standard notation
derivatives <- function(time, vars, pars) {
with(as.list(c(vars, pars)), { # 'dissolve' input vectors
# Vector of process rates
proc <- c(
degradation = kd * DO / (DO + h) * OM,
reaeration = ka * (DO_sat - DO)
)
# Stoichiometry matrix
stoi <- rbind(
degradation = c(OM= -1, DO= -s)[names(vars)],
reaeration = c(OM= 0, DO= 1)[names(vars)]
)
stoi <- stoi[names(proc),]
# Note how we use "names" to ensure that rows and columns appear
# in the proper order under all circumstances in the lines above.
# Return derivatives, process rates as extra output
list(
derivs= proc %*% stoi, # derivatives, note the %*% operator
rates=proc # extra output; reason for returning a list
)
}) # end of 'with' block
}
# Solve and visualize
initial <- c(OM=1, DO=8/32)
parameters <- c(kd=0.1, ka=0.1, s=1, h=1/32, DO_sat=8/32)
times <- 0:72
x <- lsoda(y=initial, times=times, func=derivatives, parms=parameters)
plot(x)System analysis: Efficient implementation of ODE-based models (R package rodeo)
1 Motivation
Real-world ecological or technical systems need to consider many interacting state variables affected by a large number of processes. Then, the corresponding models become more complex as well and their implementation becomes more demanding, because …
in spite of more variables and equations, the corresponding program code must remain transparent, easy to maintain, and reusable.
computational efficiency matters, especially for long-term simulations, parameter estimation, or uncertainty analysis.
In this session, we will learn advanced techniques to master these challenges.
2 A generic standard notation for simultaneous ODE
2.1 A concept adopted from chemical engineering
Fortunately, transparency and computational efficiency can achieved at the same time by adopting a standard notation for simultaneous ODE. The latter is used in chemical engineering for decades already. It became more popular in the biological sciences through activated sludge modeling (e.g., Henze et al., 1999), river water quality modeling (e.g., Reichert et al., 2001) or ecological lake modeling (e.g., Omlin et al., 2001).
2.2 Example: Aerobic degradation
Let’s consider an earlier example of the aerobic degradation of organic matter, now amended with a term accounting for oxygen limitation involving the half saturation constant \(h\) (Equation 1).
\[ \begin{aligned} \tfrac{d}{dt}OM &= -k_d \cdot \frac{DO}{DO + h} \cdot OM \\ \tfrac{d}{dt}DO &= -k_d \cdot \frac{DO}{DO + h} \cdot OM \cdot s + k_a \cdot (DO_{sat} - DO) \end{aligned} \tag{1}\]
The fundamental idea is to rewrite Equation 1 as a vector-matrix multiplication as in Equation 2. Note the superscript \(T\) which indicates transposition of the two involved vectors.
\[ \begin{bmatrix} \tfrac{d}{dt}OM \\ \tfrac{d}{dt}DO \end{bmatrix}^T = \begin{bmatrix} k_d \cdot \frac{DO}{DO + h} \cdot OM \\ k_a \cdot (DO_{sat} - DO) \end{bmatrix}^T \cdot \begin{bmatrix} -1, & -s \\ 0, & 1 \end{bmatrix} \tag{2}\]
As usual, the state variables’ derivatives appear on the left hand side, but now as a vector. The vector forming the first factor on the right hand side typically represents the individual processes. The second factor is known as the stoichiometry matrix. The latter connects the processes with the state variables derivatives. Each column of the matrix corresponds to a state variable, each row is associated with a process.
The signs of the stoichiometric coefficients allow for a simple interpretation: If a coefficient at position [\(i,k\)] is positive, the \(i\)-th process triggers an increase in the value of the \(k\)-th state variable. Likewise, negative coefficients indicate a decline of the variable’s value in response to the action of the process. Finally, a coefficient of zero indicates that the process does not have a direct impact on a variable (which does not exclude indirect effects).
2.3 Advantages of the notation
So why should we prefer the matrix-based notation of Equation 2 over Equation 1? There are several good reasons:
the matrix-based notation eliminates redundant terms. In the above example, the term \(-k_d \cdot \frac{DO}{DO + h} \cdot OM\) appears twice in Equation 1 but only once in Equation 2. First of all, the equations are simpler. Second, there may be a gain in computation speed as well.
the stoichiometry matrix helps to validate the model because it summarizes all interactions between processes and variables. It also helps to communicate the model concept (see, e.g., Fig. 2 of this paper).
the standard notation allows the model equations to be stored in tables (e.g. spreadsheets). This simplifies model documentation and allows for the automatic translation into source code. Hence, one can implement models of simultaneous ODE without actual programming (more on that later).
2.4 Using the notation with deSolve
The code below illustrates how the matrix-based notation can be used together with the numerical solvers from the deSolve package. Of particular importance is a small detail: the %*% operator. It is the key to matrix multiplication in R. The code’s output is displayed in Figure 1.
2.5 Summary of the notation
The above example should have made clear that the matrix-based standard notation is simple but helpful. I recommend to always use it because it has proven superiority over traditional, loose collections of ODE in almost all applications. The notation is summarized again in generic form in Equation 3 with processes denoted \(a\), \(b\), … \(z\) and state variables distinguished by numeric indices. Note again the transpose operators (\(T\)) allowing for a more compact notation with column vectors instead of row vectors.
\[ \begin{align} \begin{bmatrix} \tfrac{d}{dt}y_1 \\ \tfrac{d}{dt}y_2 \\ \ldots \\ \tfrac{d}{dt}y_n \end{bmatrix} ^T & = & \begin{bmatrix} a \\ b \\ \ldots \\ z \end{bmatrix} ^T & \cdot & \begin{bmatrix} s_{a,1}, & s_{a,2}, & \ldots, & s_{a,n} \\ s_{b,1}, & s_{b,2}, & \ldots, & s_{b,n} \\ \ldots, & \ldots, & \ldots, & \ldots \\ s_{z,1}, & s_{z,2}, & \ldots, & s_{z,n} \\ \end{bmatrix} \\ \text{Derivatives} & & \text{Process rates} & & \text{Stoichiometry matrix} \end{align} \tag{3}\]
3 Exploiting automatic code generation
3.1 Focus on ecology, not on programming!
Thanks to the deSolvepackage, implementing and solving ODE models in R is straightforward and convenient. However, we still need to care about many quite technical details, especially if we strive for safe and reusable code. Wouldn’t it be great if we could get rid of the programming overhead and spent more time on understanding the system’s behavior? It is possible!
One option is to go for simulation software with a graphical user interface allowing system models to be built in a visual manner. The AQUASIM software written by Peter Reichert at EAWAG is a great but legacy example.
Here, we will use a different approach known as code generation. With this concept, the model developer only provides the set of simultaneous ODE in standardized format (Equation 3). The implementation in computer code is the fully automatized.
3.2 The rodeo package
In this course, we will use the code generator implemented in the R package rodeo. In brief, this package allows all ‘ingredients’ of an ODE model to be imported in tabular format (using text files or spreadsheets. Among these ‘ingredients’ are the process rates and the stoichiometry matrix (see Equation 3) as well as the necessary declarations of symbols. All imported information is checked for integrity and processed into source code compatible with the dynamic and steady-state solvers from deSolve and rootSolve.
Since the rodeo version available on CRAN is typically a bit outdated, I recommend installing the latest snapshot from the project’s github repository. This can be done from within R / RStudio like so:
install.packages("remotes") # helper package for installation
remotes::install_github("dkneis/rodeo") # installs "rodeo" from github
library("rodeo") # loads the packageThe package provides both high level methods for teaching purposes and low level methods for advanced applications. In this course, we’ll focus on the high level methods.
If you want to get a better understanding of how the rodeo package can facilitate the implementation of a wide range of simultaneous ODE models, you can look at this article or study the package vignette which contains more examples (PDF file available here). The package has previously been applied, for example, to simulate bacterial plasmid transfer or predator-prey interactions.
3.3 Selected rodeo functions explained
3.3.1 Building a model
Building a model is equivalent to importing the ODE system in matrix format, together with a declaration of variables, parameters, and functions involved. After validation, the equations are translated into source code.
The rodeo packages provides two high-level functions for that purpose: buildFromWorkbook and buildFromText. The two differ with regard to the input file format only. buildFromWorkbook is made for reading spreadsheets while buildFromText imports all information from delimited text files.
?rodeo::buildFromWorkbook # opens the help page
?rodeo::buildFromTextA successful call to these functions returns a rodeo object. Unlike a classical variable, this objects contains data (equations, values of parameters, etc.) but also methods (functions to query and modify the data).
The format of the input files for buildFromWorkbook or buildFromText is best understood by looking at demonstration examples. One example is built into the rodeo package itself. It implements a common type of epidemiological model known as SEIR (see more in Section 4.1). You can use the following command to find the corresponding inputs for buildFromWorkbook or buildFromText on your local computer.
system.file("models", package="rodeo") # folder with example files3.3.2 Setting initial values and parameters
When either buildFromWorkbook or buildFromText is run with default arguments, parameters and state variables are initialized automatically. However, it is often desired to change those values later without rebuilding the model object.
Since the initial values of state variables and parameter values are an integral part of a rodeo object, they must be set using the respective class methods setVars and setPars. In typical applications, both methods require named numeric vectors as inputs.
?rodeo::setVars # opens the help page
?rodeo::setPars 3.3.3 Running a dynamic simulation
Dynamic simulations can be run by directly calling a solver like lsoda from the deSolve package. However, it is often simpler to call the dedicated class method dynamics. The latter is essentially a wrapper around lsoda that provides useful defaults for all arguments and hides more technical details from the normal user.
?rodeo::dynamics # opens the help page4 Sample model implemented with rodeo
4.1 Epidemic model of the SEIR family
The term SEIR model denotes a particular family of epidemiological models. They describe the dynamics of an infectious disease within a population consisting of four sub-groups (Table 1, Figure 2).
| Group | Variable symbol | Description |
|---|---|---|
| Susceptible group | S | Individuals who did not acquire the disease yet and are thus considered susceptible |
| Exposed group | E | Individuals who acquired the infection but who are not yet infectious. In those people, the pathogen undergoes incubation. |
| Infected group | I | Individuals in an advance state of infection who can transmit the pathogen to others. |
| Recovered group | R | Recovered individuals. They are assumed to be (and remain) immune against recurrent infection. |
The basic assumptions of behind SEIR models are described, e.g., in this article but also in many educational texts. Note that there are many variants of epidemiological models in addition to the simple SEIR. They either neglect aspects (e.g. the \(E\) group) or emphasize certain additional features like a gradual loss of immunity in the \(R\) group.
4.2 Model specification
Using the notation from Equation 3, the model is fully specified by the Table 2 and Table 3.
| type | name | unit | description | default |
|---|---|---|---|---|
| variable | S | individuals | susceptible | 8e+07 |
| variable | E | individuals | carrier, not yet infectious | 8e+05 |
| variable | I | individuals | infectious | 0e+00 |
| variable | R | individuals | recovered, immune | 0e+00 |
| parameter | t | 1/time | transmission rate constant | 1e+00 |
| parameter | i | 1/time | incubation rate constant | 2e-01 |
| parameter | r | 1/time | recovery rate constant | 4e-01 |
| name | unit | description | rate | S | E | I | R |
|---|---|---|---|---|---|---|---|
| transm | ind./day | transmission | t * I/(S+E+I+R) * S | -1 | 1 | 0 | 0 |
| incub | ind./day | incubation | i * E | 0 | -1 | 1 | 0 |
| recover | ind./day | recovery | r * I | 0 | 0 | -1 | 1 |
4.3 Storing the model in a spreadsheet workbook
Create a new workbook using spreadsheet software of your choice.
Add a sheet
declarationsto the workbook and fill in the contents of Table 2 (using copy-and-paste).Add a sheet
equationsto the workbook and fill in the contents of Table 3 (using copy-and-paste).Safe the workbook as
SEIR.xlsxin a folder of your choice.
Again, you could use the alternative workflow outlined above if you prefer plain text over spreadsheets.
4.4 Simulation of system dynamics
The R code below triggers a dynamic simulation of the SEIR model using the default parameters and initial values. Note that visualization (Figure 3) is based on a few lines of custom code to arrange all variables in a single plot.
rm(list=ls())
library("rodeo") # load package
workbook <- "models/SEIR.xlsx" # adjust as necessary
m <- buildFromWorkbook(workbook) # assemble model
x <- m$dynamics(times=0:90) # simulation with defaults
varnames <- m$namesVars() # custom plotting
plot(range(x[,"time"]), range(x[,varnames]),
type="n", xlab="Days", ylab="Individuals")
for (vn in varnames) {
lines(x[,"time"], x[,vn], col=match(vn, varnames))
}
legend("right", bty="n", lty=1, col=1:length(varnames),
legend=varnames)As illustrated by Figure 3, the SEIR model as implemented above predicts an end of the epidemic once most of the individuals had been infected and went through recovery. However, we know by experience that immunity acquired in response to an infection does not necessarily last forever.
4.5 Extending the model
In fact, we can easily modify the above system of ODE to account for a gradual loss of immunity in the group of recovered individuals. The extended version, which accounts for the possibility of recurrent infections is known as SEIRS (note the final ‘S’).
To switch from SEIR to SEIRS, we need to take the following actions:
We add a process to the model representing a transition of individuals from the pool of recovered (
R) back to the susceptible (S). For this, we append an additional row to theequationssheet.A suitable formulation of the process rate is
s * R. This is a classical first order term with the newly introduced rate constantscontrolling how fast immunity is lost.Since individuals are lost from the
Rpool and join theSpool, the corresponding stoichiometry factors are-1and+1.For completeness, the new parameter
smust be declared on the sheetdeclarations.
Thats it! A sample output generated with the extended model is displayed in Figure 4. In this example, the values of some of the rate constants have been adjusted to make the epidemic a bit more dynamic. Apparently, when immunity is lost after some time, the system converges to a new steady state where the numbers of exposed (E) and the infected (I) are non-zero. In addition, the transition to steady state is subject to damped oscillations.
workbook <- "models/SEIRS.xlsx" # adjust as necessary
m <- buildFromWorkbook(workbook) # assemble model
x <- m$scenarios(times=0:365, # scenarios illustrating
scenarios=list(SEIR=c(s=0), SEIRS=c(s=0.01)), # SEIR vs. SEIRS
plot.vars=TRUE, leg="bottomright")