System analysis: Inverse modeling

David Kneis (firstname.lastname @ tu-dresden.de)

Intro

  • So far, we used models to compute future states given assumptions about parameters, initial states, and boundary conditions. This is known as “forward modeling”.

  • Models are also needed to solve “inverse problems”.

  • Inverse modeling is about the estimation of system parameters, forcings, or initial conditions from observed system states.

Bacterial growth as an example

rm(list=ls())
library("deSolve")

dydt <- function(time, vars, pars) {
  with(as.list(c(vars, pars)),
    list(mu * (1 - N/K) * N)                      # logistic growth
  )
}

dynam <- lsoda(y=c(N=1e4), parms=c(mu=2, K=1e9),  # integration
  func=dydt, times=seq(0, 15, .1))

par(mfrow=c(1,2))                                 # plot of dynamics
plot(dynam[,"time"], dynam[,"N"], type="l",
  xlab="Time", ylab="Cells / mL (linear)")
plot(dynam[,"time"], dynam[,"N"], type="l", log="y",
  xlab="Time", ylab="Cells / mL (log scale)")
par(mfrow=c(1,1))

Bacterial growth as an example

Simple case

  • As long as \(N \ll K\), the culture grows exponentially

  • Leads to simplified model:

\(\frac{d}{dt} N = \mu \cdot N\)

  • Remember the solution?

\(N(t_1) = N(t_0) \cdot e^{\mu \cdot (t_1 - t_0)}\)

  • We can directly solve for \(\mu\) (or doubling time)
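
Rearranging the solution yields the estimator directly; the doubling time follows from setting \(N(t_1) = 2 \cdot N(t_0)\):

\[ \mu = \frac{\ln N(t_1) - \ln N(t_0)}{t_1 - t_0} \qquad t_{double} = \frac{\ln 2}{\mu} \]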

Recast as linear regression problem
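
On the log scale the solution becomes \(\ln N(t) = \ln N(t_0) + \mu \cdot t\), i.e. a straight line with slope \(\mu\). A minimal sketch with made-up count data (the numbers are purely illustrative):

t <- 0:6                                          # hours
N <- c(1.1e4, 2.4e4, 5.3e4, 1.0e5,                # hypothetical cell counts
  2.6e5, 5.5e5, 1.2e6)
fit <- lm(log(N) ~ t)                             # regression on log scale
mu <- unname(coef(fit)["t"])                      # growth rate = slope
log(2) / mu                                       # doubling time (hours)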

Advanced case

  • We estimate multiple parameters simultaneously.

  • Observations cover multiple state variables.

  • Observations are contaminated with measurement noise.

  • The model is not an analytical expression but involves numerical integration of a system of coupled ODEs.

Context (1)

Bacterial competition in batch culture with antibiotic

Steps: Inoculation of liquid culture, incubation, plating of diluted sample, incubation of plate, counting of colonies.

Context (2)

\[ \tfrac{d}{dt} S = \mu \cdot \max\left(0, 1 - \frac{conc}{mic} \right) \cdot \left(1 - \frac{S+R}{K} \right) \cdot S \\ \tfrac{d}{dt} R = \mu \cdot \max\left(0, 1 - cost \right) \cdot \left( 1 - \frac{S+R}{K} \right) \cdot R \]

S: Susceptible strain
R: Resistant strain
\(\mu\): Growth rate constant of S in the absence of antibiotic
K: Carrying capacity (substrate limitation)

Context (3)

\[ \tfrac{d}{dt} S = \mu \cdot \max\left(0, 1 - \frac{conc}{mic} \right) \cdot \left(1 - \frac{S+R}{K} \right) \cdot S \\ \tfrac{d}{dt} R = \mu \cdot \max\left(0, 1 - cost \right) \cdot \left( 1 - \frac{S+R}{K} \right) \cdot R \]

cost: Fitness burden of the resistance gene / element
conc: Concentration of the antibiotic
mic: Minimum drug concentration that inhibits the S strain

The \(\max(0, \cdot)\) terms keep the growth rates non-negative even if \(conc > mic\) or \(cost > 1\).

Goal & particular strategy

  • Goal: Given observations of S and R densities over time, we would like to estimate all parameters, e.g. \(\mu\), cost, …

  • Here, we do not use real laboratory data but rather generate virtual observations using the model.

    • Makes it easier to integrate data into this presentation.

    • No need to care about ugly details of real observations.

  • We perform virtual growth experiments over a gradient of drug exposure (a single replicate per concentration).

Generating virtual observations

rm(list=ls())
library("deSolve")

model <- function(time, vars, pars) {            # model equations
  ddt <- with(as.list(c(vars, pars)), { c(
    S = mu * max(0, 1 - conc/mic) * (1 - (S+R)/K) * S,
    R = mu * max(0, 1 - cost) * (1 - (S+R)/K) * R
  )})
  list(ddt[names(vars)])
}
pars <- c(mu=1, cost=0.1, mic=2, K=1e9)         # fixed parameters
vars <- c(S=1e5, R=1e5)                         # fixed initial values

x <- NULL       # simulations for a range of antibiotic concentrations
for (cc in seq(from=0, to=0.9*pars["mic"], length.out=7)) {
  d <- lsoda(y=vars, times=0:36, func=model, parms=c(pars, conc=cc))
  x <- rbind(x, cbind(conc=cc, d))
}

Perfect virtual observations

omar <- par("mar")
par(mfcol=c(1,2), mar=c(4,5,1.2,1))
for (v in names(vars)) {
  plot(x[,"time"], x[,v], log="y", xlab="Hours", ylab=v,
    col=match(x[,"conc"], unique(x[,"conc"])))
}
legend("bottomright", bty="n", pch=1, col=1:length(unique(x[,"conc"])),
  legend=signif(unique(x[,"conc"]),3), title="Drug conc.")
par(mfcol=c(1,1), mar=omar)

Let’s add a bit of noise

set.seed(99)                                      # reproducible noise
for (v in names(vars)) {
  # multiply each log-scale value by a random factor ~ Normal(1, 0.05),
  # i.e. the absolute noise grows with the magnitude of the counts
  x[,v] <- exp(log(x[,v]) * rnorm(nrow(x), mean=1, sd=0.05))
}

Noisy virtual observations

omar <- par("mar")
par(mfcol=c(1,2), mar=c(4,5,1.2,1))
for (v in names(vars)) {
  plot(x[,"time"], x[,v], log="y", xlab="Hours", ylab=v,
    col=match(x[,"conc"], unique(x[,"conc"])))
}
legend("bottomright", bty="n", pch=1, col=1:length(unique(x[,"conc"])),
  ncol=2, legend=signif(unique(x[,"conc"]),3), title="Drug conc.")
par(mfcol=c(1,1), mar=omar)

Noisy virtual observations

print(head(x))
     conc time          S          R
[1,]    0    0   113107.3  118518.53
[2,]    0    1   366840.0   68805.92
[3,]    0    2   783138.1  593067.07
[4,]    0    3  2762070.8 3438968.56
[5,]    0    4  4083191.3 2001467.90
[6,]    0    5 16025947.3 4205718.38
print(tail(x))
       conc time        S          R
[254,]  1.8   31 350637.2  609124599
[255,]  1.8   32 567833.6  237892207
[256,]  1.8   33 464555.2  357502801
[257,]  1.8   34 179082.8 2789321434
[258,]  1.8   35 565696.8 2693030104
[259,]  1.8   36 243759.6  471610036

Recall the goal: We want to reconstruct the parameters used to generate these virtual observations

The objective function (OF)

  • The OF computes the mismatch between given observations and the corresponding data simulated with the model for a specific set of parameters.

  • The OF is the key to any optimization problem: lower values of the OF indicate a better fit and thus more plausible parameter estimates.

  • Our OF returns a vector of residuals. This is what our specific optimizer from package minpack.lm needs.

  • For standard solvers, the OF must return a scalar, e.g. the sum of squared residuals (SSR); see the sketch after the fitting code below.

The objective function

residuals <- function(p, obs) {   # parameters passed as "p"

  # drug levels and initial values for the experiments
  cases <- obs[obs[,"time"]==0,]

  # simulation of all experiments
  x <- NULL
  for (i in 1:nrow(cases)) {
    d <- lsoda(y=cases[i,c("S","R")], times=0:36, func=model,
      parms=c(p, conc=as.numeric(cases[i,"conc"])))
    x <- rbind(x, cbind(conc=cases[i,"conc"], d)) 
  }
  
  # vector of residuals (log scale)
  x <- merge(obs, x, by=c("conc","time"), suffixes=c(".obs",".sim"))
  c(log(x[,"S.sim"]) - log(x[,"S.obs"]),
    log(x[,"R.sim"]) - log(x[,"R.obs"]))
}

Note the log transformation, which accounts for the huge spread of the observed values.

Solving the optimization problem

  • For problems like this, the Levenberg-Marquardt algorithm does a decent job. Standard solvers may struggle.

  • We start with an initial guess for all parameters.

  • Constraints prevent the solver from trying infeasible parameter values (e.g. where the model could crash).

library("minpack.lm")                              # provides nls.lm

library("minpack.lm")
guess <- c(mu=0.5,  cost=0.01, mic=0.5,   K=1e8)   # intial guess
lower <- c(mu=0.01, cost=0.0,  mic=0.001, K=1e7)   # lower constraint
upper <- c(mu=2.3,  cost=0.99, mic=256,   K=1e10)  # upper constraint
fit <- nls.lm(par=guess, lower=lower, upper=upper,
  fn=residuals, obs=x)
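
For comparison, a standard scalar-valued solver can be applied by collapsing the residual vector into the SSR. A minimal sketch using base R's optim() with box constraints (ssr and fit2 are illustrative names, not part of the original workflow):

ssr <- function(p, obs) { sum(residuals(p, obs)^2) }  # scalar OF
fit2 <- optim(par=guess, fn=ssr, obs=x,               # bounds require
  method="L-BFGS-B", lower=lower, upper=upper)        #   method L-BFGS-B

As noted above, such general-purpose solvers may struggle on this problem; the sketch only illustrates the scalar interface.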

Inspect the solution (1)

print(fit$message)                                # check convergence
[1] "Relative error in the sum of squares is at most `ftol'."
if (! fit$info %in% 1:3) {
  stop("parameter estimation failed: do not trust results")
}
plot(x=0:fit$niter, y=fit$rsstrace,               # plot trace of SSR
  type="b", las=2, xlab="Iteration", ylab="SSR")

Inspect the solution (2)

  • In real-world problems, it is challenging to verify the plausibility of estimates (e.g. by comparison with other studies).

  • Since we used virtual observations here, we can compare the estimates with known values.

result <- data.frame(
  truth = pars,                           # known truth
  estimates = fit$par[names(pars)]        # estimated from virtual obs.
)
print(cbind(result, percentageError=      # summary table
  round((result$estimates - result$truth) / result$truth * 100, 2)
))
     truth    estimates percentageError
mu   1e+00 9.703008e-01           -2.97
cost 1e-01 7.481259e-02          -25.19
mic  2e+00 2.114077e+00            5.70
K    1e+09 9.106701e+08           -8.93

Challenges in inverse modeling (1)

  • Algorithms tend to get stuck in local minima. You may need to try different initial guesses.

Challenges in inverse modeling (2)

  • Non-identifiability may also reflect insufficient data. The variability of the observations matters more than the sheer number of records.

  • The formulation of the OF may be tricky: state variables often differ w.r.t. units and the number of observations (see the sketch after this list).

  • Computing times may become an issue.
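
One common remedy for the scaling issue is to weight the residuals of each state variable, e.g. by the inverse spread of its log-transformed observations, so that no variable dominates the SSR merely because of its units. A sketch building on the residuals() function from above (weightedResiduals and the particular weighting scheme are illustrative assumptions):

weightedResiduals <- function(p, obs) {
  r <- residuals(p, obs)                   # stacked residuals: S first, then R
  n <- length(r) / 2
  w <- c(rep(1 / sd(log(obs[,"S"])), n),   # weight = inverse spread of the
    rep(1 / sd(log(obs[,"R"])), n))        #   log-transformed observations
  r * w
}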

Recommendations (1)

  • Start with random parameter sets picked with Latin Hypercube sampling (see the sketch after this list).

    • Is the OF actually sensitive to altered parameters?

    • Try all sets as initial guesses. Are there local minima?
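
A minimal sketch of such a screening, assuming the add-on package 'lhs' and reusing lower, upper, and residuals() from above (the sample size and the linear rescaling are illustrative choices):

library("lhs")                                   # provides randomLHS
set.seed(1)
u <- randomLHS(n=50, k=length(lower))            # 50 points in unit hypercube
colnames(u) <- names(lower)
sets <- t(t(u) * (upper - lower) + lower)        # rescale to parameter bounds
ssrVals <- apply(sets, 1, function(p)            # SSR for each candidate set
  sum(residuals(p, obs=x)^2))
best <- sets[which.min(ssrVals),]                # most promising initial guess

A wide spread of the SSR values indicates that the OF is indeed sensitive to the parameters; starting the fit from several of these candidates reveals local minima.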

Recommendations (2)

  • Try different formulations of the OF, e.g. by transforming variables.

  • Reduce the number of free parameters, e.g. by estimating some of them from independent data.

Recommendations (3)

  • Test the feasibility of the estimation on virtual observations first (see the example above).

  • Use virtual experiments to identify the kind and amount of observations actually needed. Design experiments / sampling campaigns accordingly.