rm(list=ls())
library("deSolve")
# Aerobic degradation model using the generic standard notation
<- function(time, vars, pars) {
derivatives with(as.list(c(vars, pars)), { # 'dissolve' input vectors
# Vector of process rates
<- c(
proc degradation = kd * DO / (DO + h) * OM,
reaeration = ka * (DO_sat - DO)
)
# Stoichiometry matrix
<- rbind(
stoi degradation = c(OM= -1, DO= -s)[names(vars)],
reaeration = c(OM= 0, DO= 1)[names(vars)]
)<- stoi[names(proc),]
stoi # Note how we use "names" to ensure that rows and columns appear
# in the proper order under all circumstances in the lines above.
# Return derivatives, process rates as extra output
list(
derivs= proc %*% stoi, # derivatives, note the %*% operator
rates=proc # extra output; reason for returning a list
)
# end of 'with' block
})
}
# Solve and visualize
<- c(OM=1, DO=8/32)
initial <- c(kd=0.1, ka=0.1, s=1, h=1/32, DO_sat=8/32)
parameters <- 0:72
times
<- lsoda(y=initial, times=times, func=derivatives, parms=parameters)
x plot(x)
System analysis: Efficient implementation of ODE-based models
\(\leftarrow\) BACK TO TEACHING SECTION OF MY HOMEPAGE
1 Motivation
Real-world ecological or technical systems need to consider many interacting state variables affected by a large number of processes. Then, the corresponding models become more complex as well and their implementation becomes more demanding, because …
in spite of more variables and equations, the corresponding program code must remain transparent, easy to maintain, and reusable.
computational efficiency matters, especially for long-term simulations, parameter estimation, or uncertainty analysis.
In this session, we will learn advanced techniques to master these challenges.
2 A generic standard notation for simultaneous ODE
2.1 A concept adopted from chemical engineering
Fortunately, transparency and computational efficiency can achieved at the same time by adopting a standard notation for simultaneous ODE. The latter is used in chemical engineering for decades already. It became more popular in the biological sciences through activated sludge modeling (e.g., Henze et al., 1999), river water quality modeling (e.g., Reichert et al., 2001) or ecological lake modeling (e.g., Omlin et al., 2001).
2.2 Example: Aerobic degradation
Let’s consider an earlier example of the aerobic degradation of organic matter, now amended with a term accounting for oxygen limitation involving the half saturation constant \(h\) (Equation 1).
\[ \begin{aligned} \tfrac{d}{dt}OM &= -k_d \cdot \frac{DO}{DO + h} \cdot OM \\ \tfrac{d}{dt}DO &= -k_d \cdot \frac{DO}{DO + h} \cdot OM \cdot s + k_a \cdot (DO_{sat} - DO) \end{aligned} \tag{1}\]
The fundamental idea is to rewrite Equation 1 as a vector-matrix multiplication as in Equation 2. Note the superscript \(T\) which indicates transposition of the two involved vectors.
\[ \begin{bmatrix} \tfrac{d}{dt}OM \\ \tfrac{d}{dt}DO \end{bmatrix}^T = \begin{bmatrix} k_d \cdot \frac{DO}{DO + h} \cdot OM \\ k_a \cdot (DO_{sat} - DO) \end{bmatrix}^T \cdot \begin{bmatrix} -1, & -s \\ 0, & 1 \end{bmatrix} \tag{2}\]
As usual, the state variables’ derivatives appear on the left hand side, but now as a vector. The vector forming the first factor on the right hand side typically represents the individual processes. The second factor is known as the stoichiometry matrix. The latter connects the processes with the state variables derivatives. Each column of the matrix corresponds to a state variable, each row is associated with a process.
The signs of the stoichiometric coefficients allow for a simple interpretation: If a coefficient at position [\(i,k\)] is positive, the \(i\)-th process triggers an increase in the value of the \(k\)-th state variable. Likewise, negative coefficients indicate a decline of the variable’s value in response to the action of the process. Finally, a coefficient of zero indicates that the process does not have a direct impact on a variable (which does not exclude indirect effects).
2.3 Advantages of the notation
So why should we prefer the matrix-based notation of Equation 2 over Equation 1? There are several good reasons:
the matrix-based notation eliminates redundant terms. In the above example, the term \(-k_d \cdot \frac{DO}{DO + h} \cdot OM\) appears twice in Equation 1 but only once in Equation 2. First of all, the equations are simpler. Second, there may be a gain in computation speed as well.
the stoichiometry matrix helps to validate the model because it summarizes all interactions between processes and variables. It also helps to communicate the model concept (see, e.g., Fig. 2 of this paper).
the standard notation allows the model equations to be stored in tables (e.g. spreadsheets). This simplifies model documentation and allows for the automatic translation into source code. Hence, one can implement models of simultaneous ODE without actual programming (more on that later).
2.4 Using the notation with deSolve
The code below illustrates how the matrix-based notation can be used together with the numerical solvers from the deSolve
package. Of particular importance is a small detail: the %*%
operator. It is the key to matrix multiplication in R. The code’s output is displayed in Figure 1.
2.5 Summary of the notation
The above example should have made clear that the matrix-based standard notation is simple but helpful. I recommend to always use it because it has proven superiority over traditional, loose collections of ODE in almost all applications. The notation is summarized again in generic form in Equation 3 with processes denoted \(a\), \(b\), … \(z\) and state variables distinguished by numeric indices. Note again the transpose operators (\(T\)) allowing for a more compact notation with column vectors instead of row vectors.
\[ \begin{align} \begin{bmatrix} \tfrac{d}{dt}y_1 \\ \tfrac{d}{dt}y_2 \\ \ldots \\ \tfrac{d}{dt}y_n \end{bmatrix} ^T & = & \begin{bmatrix} a \\ b \\ \ldots \\ z \end{bmatrix} ^T & \cdot & \begin{bmatrix} s_{a,1}, & s_{a,2}, & \ldots, & s_{a,n} \\ s_{b,1}, & s_{b,2}, & \ldots, & s_{b,n} \\ \ldots, & \ldots, & \ldots, & \ldots \\ s_{z,1}, & s_{z,2}, & \ldots, & s_{z,n} \\ \end{bmatrix} \\ \text{Derivatives} & & \text{Process rates} & & \text{Stoichiometry matrix} \end{align} \tag{3}\]
3 Exploiting automatic code generation
3.1 Focus on ecology, not on programming!
Thanks to the deSolve
package, implementing and solving ODE models in R is straightforward and convenient. However, we still need to care about many quite technical details, especially if we strive for safe and reusable code. Wouldn’t it be great if we could get rid of the programming overhead and spent more time on understanding the system’s behavior? It is possible!
One option is to go for simulation software with a graphical user interface allowing system models to be built in a visual manner. The AQUASIM software written by Peter Reichert at EAWAG is a great but legacy example.
Here, we will use a different approach known as code generation. With this concept, the model developer only provides the set of simultaneous ODE in standardized format (Equation 3). The implementation in computer code is the fully automatized.
3.2 The rodeo
package
In this course, we will use the code generator implemented in the R package rodeo. In brief, this package allows all ‘ingredients’ of an ODE model to be imported in tabular format (using text files or spreadsheets. Among these ‘ingredients’ are the process rates and the stoichiometry matrix (see Equation 3) as well as the necessary declarations of symbols. All imported information is checked for integrity and processed into source code compatible with the dynamic and steady-state solvers from deSolve and rootSolve.
Since the rodeo
version available on CRAN is typically a bit outdated, I recommend installing the latest snapshot from the project’s github repository. This can be done from within R / RStudio like so:
install.packages("remotes") # helper package for installation
::install_github("dkneis/rodeo") # installs "rodeo" from github
remoteslibrary("rodeo") # loads the package
The package provides both high level methods for teaching purposes and low level methods for advanced applications. In this course, we’ll focus on the high level methods.
If you want to get a better understanding of how the rodeo
package can facilitate the implementation of a wide range of simultaneous ODE models, you can look at this article or study the package vignette which contains more examples (PDF file available here). The package has previously been applied, for example, to simulate bacterial plasmid transfer or predator-prey interactions.
3.3 Selected rodeo
functions explained
3.3.1 Building a model
Building a model is equivalent to importing the ODE system in matrix format, together with a declaration of variables, parameters, and functions involved. After validation, the equations are translated into source code.
The rodeo
packages provides two high-level functions for that purpose: buildFromWorkbook
and buildFromText
. The two differ with regard to the input file format only. buildFromWorkbook
is made for reading spreadsheets while buildFromText
imports all information from delimited text files.
::buildFromWorkbook # opens the help page
?rodeo::buildFromText ?rodeo
A successful call to these functions returns a rodeo
object. Unlike a classical variable, this objects contains data (equations, values of parameters, etc.) but also methods (functions to query and modify the data).
The format of the input files for buildFromWorkbook
or buildFromText
is best understood by looking at demonstration examples. One example is built into the rodeo
package itself. It implements a common type of epidemiological model known as SEIR (see more in Section 4.2). You can use the following command to find the corresponding inputs for buildFromWorkbook
or buildFromText
on your local computer.
system.file("models", package="rodeo") # folder with example files
3.3.2 Setting initial values and parameters
When either buildFromWorkbook
or buildFromText
is run with default arguments, parameters and state variables are initialized automatically. However, it is often desired to change those values later without rebuilding the model object.
Since the initial values of state variables and parameter values are an integral part of a rodeo
object, they must be set using the respective class methods setVars
and setPars
. In typical applications, both methods require named numeric vectors as inputs.
::setVars # opens the help page
?rodeo::setPars ?rodeo
3.3.3 Running a dynamic simulation
Dynamic simulations can be run by directly calling a solver like lsoda
from the deSolve
package. However, it is often simpler to call the dedicated class method dynamics
. The latter is essentially a wrapper around lsoda
that provides useful defaults for all arguments and hides more technical details from the normal user.
::dynamics # opens the help page ?rodeo
4 Sample models implemented with rodeo
4.1 Resource-limited growth in a CFSTR
4.1.1 Background
In this example, we consider a continuous bioreactor where heterotrophic E. coli bacteria grow on a rich organic medium known as LB. The system is inoculated with bacteria once and thereafter fed with sterile medium only.
4.1.2 Model specification
Using the notation from Equation 3, the model is fully specified by the Table 1 and Table 2.
Table 1 does not need much explanation. It declares the two state variables and provides a set of default initial values. Very similar for the parameters. All other information is primarily meant as a documentation. A model without documentation is essentially useless!
type | name | unit | description | default |
---|---|---|---|---|
variable | LB | mg / mL | medium concentration | 1.5e+01 |
variable | Ecoli | cells / mL | bacterial density | 1.0e+05 |
parameter | mu_max | 1 / h | max. growth rate | 2.0e+00 |
parameter | h | mg / mL | half saturation const. | 1.8e-01 |
parameter | y | cells / mg | yield of Ecoli per LB | 1.0e+06 |
parameter | q | mL / h | flow rate | 1.0e+01 |
parameter | v | mL | reactor volume | 1.0e+02 |
parameter | LB_in | mg / mL | input medium conc. | 1.5e+01 |
Table 2 implements the ODE right hand sides. Essentially, the column with name rates
holds the vector of process rates (cf. Equation 3). The columns named like the two state variables make up the stoichiometry matrix. A minimum documentation is provided through columns description
and unit
. The name
column is primarily needed to extract the value of the process rates from simulation output for, e.g., the purpose of plotting.
name | unit | description | rate | Ecoli | LB |
---|---|---|---|---|---|
flow | 1 / h | im-/export | q / v | -Ecoli | LB_in - LB |
growth | cells / mL / h | growth | mu_max * LB / (LB + h) * Ecoli | 1 | -1 / y |
4.1.3 Organize the model as a workbook
Create a new workbook using spreadsheet software of your choice.
Add a sheet
declarations
to the workbook and fill in the contents of Table 1 (using copy-and-paste).Add a sheet
equations
to the workbook and fill in the contents of Table 2 (using copy-and-paste).Safe the workbook as
Ecoli.xlsx
in a folder of your choice.
Alternative workflow: Save the contents of Table 1 and Table 2 as two separate text files named, e.g., declarations.txt
and equations.txt
, respectively. In the following code sections, replace any calls to buildFromWorkbook
by buildFromText
and provide the two text files as inputs.
4.1.4 Simulation of system dynamics
Now, let’s run a dynamic simulation with that model. Unless we take special action, the defaults from sheet declarations
(Table 1) will automatically be used as initial values and parameters.
rm(list=ls())
library("rodeo") # load package
<- "models/Ecoli.xlsx" # adjust as necessary
workbook <- buildFromWorkbook(workbook) # assemble model
m
<- seq(0, 12, 0.1)
times <- m$dynamics(times=times) # run simulation
x
plot(x, log="y", las=2) # default plots
Comparing the few lines of code above against previous code blocks hopefully convinces you of the benefit of the matrix-based notation and subsequent automatic code generation.
4.1.5 Comparison of alternative dynamics
Very often, we want to compare predictions of a model for different initial values or parameters. For this purpose, the rodeo
package provides as class method scenarios
. This method can trigger a series of dynamic simulations where, in each run, the values of particular parameters and/or initial values are modified. All values not being set explicitly will remain unchanged. The scenarios
method is demonstrated below. The method attempts to produce graphics for a quick visual comparison of the scenarios (Figure 3). These graphics are not necessarily ideal with regard to scaling, legend position, etc., and you typically want to use the returned data frame for making your own custom plots.
# scenarios must be defined as a named list
<- list(
sc default = c(m$getPars(), m$getVars()), # unchanged
clean_start = c(LB=0, Ecoli=1e3), # altered initial values
high_flow = c(q=10 * m$getPars()[["q"]]) # altered parameter
)<- m$scenarios(times=times, scenarios=sc) x
4.1.6 Steady-state solutions
4.1.6.1 Relevance
The control of biomass production in a continuous flow bio-reactor is best understood by looking at steady-state solutions. Then, by definition, biomass and resource concentration remain constant and so do the reactor’s output loads.
Imagine you are responsible for the operation of the bioreactor. Specifically, you are supposed to maximize steady-state production (i.e. the output load of bacteria) by choosing optimum values for the flow rate and the resource concentration in the input. How can you do that? Instead of going by trial-and-error, you certainly want support from modeling.
4.1.6.2 Solution using long-term integration
Since we have the technology ready to predict the dynamics of bacteria in the bioreactor, we can exploit it for the calculation of steady-state conditions. All we need to do is to run the model with constant external forcings (flow, input concentration) over a very long time interval. While this strategy is not very elegant, it is essentially fail-proof.
# computes the output load in steady-state by long-term simulation
# as a function of the flow rate (q) and the input concentration (LB_in)
<- function(q, LB_in, model) {
out_stdy <- seq(from=0, to=24*365*10, length.out=10) # 10 years !
times <- model$getPars() # get current parameters
pars "q"] <- q # overwrite
pars["LB_in"] <- LB_in # overwrite
pars[$setPars(pars) # make updates effective
model<- model$dynamics(times=times) # simulation
x nrow(x), "Ecoli"] * q # final output load
x[
}
# define scenarios of interest
<- c(0.5, 1, 5, 10, 20, 50, 100, 150, 200, 250)
q <- c(1, 15)
LB_in
# simulate and visualize
par(mfrow=c(1, length(LB_in)))
for (Li in LB_in) { # step through individual values of LB_in
<- sapply(q, out_stdy, # get results for all values of q
loads LB_in=Li, model=m)
plot(q, loads, xlab="q (mL/h)", # visualize output load over q
ylab="Output load (cells/h)")
mtext(side=3, paste0("LB_in: ",Li,
", max. output: ",signif(max(loads),2)))
}par(mfrow=c(1, 1))
In agreement with expectation, biomass production increases as the concentration of the resources in the reactor’s inflow is increased (compare y-axes in the two plots of Figure 4).
Over a certain range of flow rates (lower axes of Figure 4), we observe a similar trend towards higher biomass production. However, when the flow rate is chosen too large, biomass production obviously breaks down. What is the cause of this crash?
Simple answer: As the flow rate increases, the input load of resources also increases which is reflected by the rise in bacterial biomass. However, a higher flow rate also means higher losses as bacteria are flushed from the reactor in shorter time. Beyond a certain flow rate, losses can no longer be compensated by growth as bacteria are already growing at their intrinsic maximum rate. At this point, the system switches from resource-limited to loss-controlled. To study this aspect in more detail, it is convenient to look at a direct analytic solution (see below).
4.1.6.3 Direct analytic calculation of the steady-state
Reverting the information from Table 1 into plain math notation yields the following set of ODE (Equation 4) where \(q/v\) was replaced by the inverse of the residence time, \(1/\tau\), for convenience. In fact, the steady-state solution for this system can be found using simple algebra.
\[ \begin{aligned} \tfrac{d}{dt}Ecoli &= & \mu_{max} \cdot \frac{LB}{LB + h} \cdot Ecoli & - \frac{1}{\tau} \cdot Ecoli \\ \tfrac{d}{dt}LB &= -\frac{1}{y} \cdot & \mu_{max} \cdot \frac{LB}{LB + h} \cdot Ecoli & + \frac{1}{\tau} \cdot (LB_{in} - LB) \end{aligned} \tag{4}\]
After setting the left-hand sides to zero (steady-state), the first ODE can be divided by \(Ecoli\) and then solved for the concentration of the medium (\(LB\)). The resulting expression is (Equation 5):
\[ LB = \frac{h}{\tau \cdot \mu_{max} - 1} \tag{5}\]
Similarly, the second ODE can be rearranged to find the steady-state concentration of \(Ecoli\) (Equation 6).
\[ Ecoli = \dfrac{y \cdot \left( LB_{in} - LB \right)}{\tau \cdot \mu_{max} \cdot \dfrac{LB}{LB + h}} \tag{6}\]
To solve Equation 6, you need to substitute all instances of \(LB\) on the right hand side by the previously found steady-state solution (Equation 5) and you will end up with the somewhat lengthy expression of Equation 7.
\[ Ecoli = \dfrac{y \cdot \left( LB_{in} - \dfrac{h}{\tau \cdot \mu_{max} - 1} \right)}{\tau \cdot \mu_{max} \cdot \left( \dfrac{\dfrac{1}{\tau \cdot \mu_{max} - 1}}{\dfrac{1}{\tau \cdot \mu_{max} - 1} + 1} \right)} \tag{7}\]
Fortunately, the entire puzzling denominator of Equation 7 turns out to vanish (i.e. equals unity) upon further simplification. Consequently, the steady-state solution for \(Ecoli\) is given by Equation 8.
\[ Ecoli = y \cdot \left( LB_{in} - \dfrac{h}{\tau \cdot \mu_{max} - 1} \right) \tag{8}\]
Let’s use Equation 8 to inspect one of the cases from Figure 4 for a continuous, dense array of flow rates. Besides steady-state output loads we also look at the respective concentrations (Figure 5).
<- m$getPars() # get values for parameters as before
pars "LB_in"] <- 15 # set input concentration explicitly
pars[<- pars[names(pars) != "q"] # drop flow rate to avoid confusion
pars <- 1:250 # chose vector of flow rates explicitly
q
<- with(as.list(pars),
concs * (LB_in - h / (v / q * mu_max - 1)) # steady-state concentrations
y
)<- concs * q # corresponding output loads
loads
par(mfrow=c(1,2))
plot(q, concs, type="l", xlab="q (mL/h)", ylab="E. coli (cells/mL)")
plot(q, loads, type="l", xlab="q (mL/h)", ylab="Output load (cells/h)")
par(mfrow=c(1,1))
From Figure 5 we can learn at least two things:
Over a wide range of flow rates, the steady-state concentration is almost constant (left sub-figure). Hence, the increase in the output load (right sub-figure) observed for rising, sub-critical flow rates is almost exclusively due to the flow.
The critical flow rate where biomass production breaks down can now be identified very clearly. Looking at Equation 8, it becomes clear that the crash indicated by the discontinuity in Figure 5 occurs at a flow rate corresponding to \(\tau = 1 / \mu_{max}\) (note the zero division in Equation 8 at this point). Even if resources were available in huge excess, the actual bacterial growth rate would remain restricted at \(\mu_{max}\). Consequently, whenever the loss rate (reflected by \(1/\tau\)) exceeds \(\mu_{max}\), a sustainable bacterial population cannot exist.
4.2 Epidemic model of the SEIR family
4.2.1 Background
The term SEIR model denotes a particular family of epidemiological models. They describe the dynamics of an infectious disease within a population consisting of four sub-groups (Table 3, Figure 6).
Group | Variable symbol | Description |
---|---|---|
Susceptible group | S | Individuals who did not acquire the disease yet and are thus considered susceptible |
Exposed group | E | Individuals who acquired the infection but who are not yet infectious. In those people, the pathogen undergoes incubation. |
Infected group | I | Individuals in an advance state of infection who can transmit the pathogen to others. |
Recovered group | R | Recovered individuals. They are assumed to be (and remain) immune against recurrent infection. |
The basic assumptions of behind SEIR models are described, e.g., in this article but also in many educational texts. Note that there are many variants of epidemiological models in addition to the simple SEIR. They either neglect aspects (e.g. the \(E\) group) or emphasize certain additional features like a gradual loss of immunity in the \(R\) group.
4.2.2 Model specification
Using the notation from Equation 3, the model is fully specified by the Table 4 and Table 5.
type | name | unit | description | default |
---|---|---|---|---|
variable | S | individuals | susceptible | 8e+07 |
variable | E | individuals | carrier, not yet infectious | 8e+05 |
variable | I | individuals | infectious | 0e+00 |
variable | R | individuals | recovered, immune | 0e+00 |
parameter | t | 1/time | transmission rate constant | 1e+00 |
parameter | i | 1/time | incubation rate constant | 2e-01 |
parameter | r | 1/time | recovery rate constant | 4e-01 |
name | unit | description | rate | S | E | I | R |
---|---|---|---|---|---|---|---|
transm | ind./day | transmission | t * I/(S+E+I+R) * S | -1 | 1 | 0 | 0 |
infect | ind./day | infection | i * E | 0 | -1 | 1 | 0 |
recover | ind./day | recovery | r * I | 0 | 0 | -1 | 1 |
4.2.3 Organize the model as a workbook
Create a new workbook using spreadsheet software of your choice.
Add a sheet
declarations
to the workbook and fill in the contents of Table 4 (using copy-and-paste).Add a sheet
equations
to the workbook and fill in the contents of Table 5 (using copy-and-paste).Safe the workbook as
SEIR.xlsx
in a folder of your choice.
Again, you could use the alternative workflow outlined above if you prefer plain text over spreadsheets.
4.2.4 Simulation of system dynamics
The R code below triggers a dynamic simulation of the SEIR model using the default parameters and initial values. Note that visualization (Figure 7) is based on a few lines of custom code to arrange all variables in a single plot.
rm(list=ls())
library("rodeo") # load package
<- "models/SEIR.xlsx" # adjust as necessary
workbook <- buildFromWorkbook(workbook) # assemble model
m
<- m$dynamics(times=0:90) # simulation with defaults
x
<- m$namesVars() # custom plotting
varnames plot(range(x[,"time"]), range(x[,varnames]),
type="n", xlab="Days", ylab="Individuals")
for (vn in varnames) {
lines(x[,"time"], x[,vn], col=match(vn, varnames))
}legend("right", bty="n", lty=1, col=1:length(varnames),
legend=varnames)
As illustrated by Figure 7, the SEIR model as implemented above predicts an end of the epidemic once most of the individuals had been infected and went through recovery. However, we know by experience that immunity acquired in response to an infection does not necessarily last forever.
4.2.5 Excercise
Modify the above system of ODE to account for a gradual loss of immunity in the group of recovered individuals.
Play with the model parameters within reasonable bounds. Does the model run straight into steady state under all circumstances?
4.2.6 A possible solution
The extended SEIR model which accounts for the possibility of recurrent infections is known as SEIRS (note the final ‘S’). Using rodeo, it is fairly simple to implement.
We have to add a process to the model representing a transition of individuals from the pool of recovered (
R
) back to the susceptible (S
). For this, we append an additional row to theequations
sheet.A suitable formulation of the process rate is
s * R
. This is a classical first order term with the newly introduced rate constants
controlling how fast immunity is lost.Since individuals are lost from the
R
pool and join theS
pool, the corresponding stoichiometry factors are-1
and+1
.For completeness, the new parameter
s
must be declared on the sheetdeclarations
.
A sample output generated with the extended model is displayed in Figure 8. Note that I adjusted the values of some of the rate constants to make the epidemic a bit more dynamic and trigger damped oscillations.
<- "models/SEIRS.xlsx" # adjust as necessary
workbook <- buildFromWorkbook(workbook) # assemble model
m
<- m$scenarios(times=0:365, # scenarios illustrating
x scenarios=list(SEIR=c(s=0), SEIRS=c(s=0.01)), # SEIR vs. SEIRS
plot.vars=TRUE, leg="bottomright")