obs <- data.frame(
  hours = seq(from=2, to=23, by=0.5),
  OD600 = c(0.008,0.014,0.024,0.042,0.071,0.112,
    0.175,0.265,0.352,0.415,0.439,0.453,0.48,0.538,0.573,
    0.62,0.673,0.709,0.73,0.776,0.78,0.807,0.798,0.794,0.793,
    0.795,0.819,0.8,0.804,0.808,0.809,0.808,0.801,0.815,0.806,
    0.809,0.811,0.812,0.815,0.817,0.818,0.821,0.824)
)

layout(matrix(1:2, ncol=2))
plot(OD600 ~ hours, data=obs)
plot(OD600 ~ hours, data=obs, log="y")
layout(1)
System analysis: Inverse modelling
1 Motivation
1.1 Forward simulation vs. inverse modeling
Mechanistic models are most commonly used to solve initial value problems. Given an initial state, a system of equations with known parameters is used to estimate a future state (or steady state). This is sometimes denoted “forward simulation”.
The term inverse modeling refers to a different situation in which the system dynamics has been observed (i.e. measured) and elements of the model are unknown and shall be estimated (Figure 1). The nature of the unknowns depends very much on the application:
Most typically, the unknowns are parameters in the narrow sense, i.e. constants appearing in the model equations. Examples include maximum growth rates, half-saturation constants, rates of mortality, etc.
The estimation of unknown external forcings is a typical goal of inverse modeling too. For example, one could reconstruct the inflow rate into a lake based on observed changes in the concentration of a conservative tracer.
Alternatively, one may be interested in reconstructing an earlier system state. In this case, the unknowns are equivalent to the initial values.
A special case arises when the structure of the model itself is to be identified among several alternatives. This is closely connected or even identical to model selection.
Here, the focus will be on basic cases. For simplicity, we will generally use the term parameters to denote the unknowns whose values are to be determined.
1.2 Inverse modeling means optimization
Inverse modeling is largely synonymous with “model fitting”, “model calibration”, or just “parameter estimation”. Whatever you call it, it poses an optimization problem. The goal is to minimize the mismatch between observations and the corresponding predictions of a model by tweaking the values of unknown parameters. (Figure 2).
The fitting problem arises in both mechanistic and statistical modeling. In the context of statistical modeling, parameter fitting is better known as regression analysis. Here, we focus on mechanistic modeling where the unknown parameters have a bio-physico-chemical interpretation and a meaningful unit.
2 Fundamental cases and strategies
The mathematical structure and complexity of the fitting problem differ substantially between models. Figure 3 provides a basic guideline for choosing the appropriate approach to inverse modeling depending on the situation.
To make the guidance given by Figure 3 more practical, consider the following details.
Is there an analytical solution? The answer is yes if the model has a closed-form solution. Conversely, if the model involves numerical computations (e.g. integration of ODE) or adopts methods of agent-based simulation, the answer is no.
Is the model linear? In other words, the question is whether we are seeking to estimate the intercept (\(\alpha\)) and slope(s) (\(\beta\)) of a linear model \(y_i = \alpha + \beta_1 \cdot x_{i,1} + \beta_2 \cdot x_{i,2} + \ldots\) connecting the response (\(y\)) to the predictor(s) denoted \(x\). Linear models are very attractive because unique values for all parameters (\(\alpha\) and \(\beta\)) can be found under all circumstances with minimal effort.
Can the model be linearized? Some models look non-linear at first glance but are in fact intrinsically linear, i.e. they can be turned into a linear model by transformation. Before the computer era, linearization was almost always employed. Nowadays, you should think about it twice: in some cases it may be appropriate, in others you don’t want to linearize. The decision should be guided by the structure of the residuals of the fitted model.
The classic examples of intrinsically linear models are exponential and power models which can be transformed like so:
\(y = a \cdot x^b \;\;\; \rightarrow \;\;\; ln(y) = ln(a) + b \cdot ln(x)\)
\(y = a \cdot e^{\left( b \cdot x \right)} \;\;\; \rightarrow \;\;\; ln(y) = ln(a) + b \cdot x\)
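For illustration, a power model can then be fitted with lm on the transformed scales. The following is only a sketch with synthetic, made-up data (not the growth data used later in this document):

# sketch: fit y = a * x^b via linearization (synthetic, made-up data)
set.seed(42)
x <- 1:20
y <- 2 * x^1.5 * exp(rnorm(length(x), sd=0.1))  # noisy power-law data
fit.lin <- lm(log(y) ~ log(x))
a <- exp(coef(fit.lin)[1])   # back-transform the intercept to recover 'a'
b <- coef(fit.lin)[2]        # the slope estimates 'b' directly

As stated above, whether such a transformed fit is adequate should be judged from the residuals, e.g. via plot(fit.lin, which=1).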
As indicated by Figure 3, R provides convenient built-in methods for models having an analytic solution (see the R methods lm and nls). The use of these methods is typically demonstrated in a separate statistics course.
Consequently, in this document, we will focus on a universal approach to model fitting which is applicable to a very broad range of cases. These include, for example, dynamic ODE-based models that require numerical integration and even agent-based models which represent a collection of rules rather than a set of equations.
3 Universal least-squares approach
3.1 The objective function (OF)
The role of the objective function (OF) is to quantify the mismatch between observations and predictions. Crucially, the value of the OF depends on the values of the parameters to be estimated. Good parameter values result in a low value of the OF. A large value of the OF signals bad parameter estimates. Obviously, the goal is to find parameter values that result in the lowest possible value of the OF (Figure 4).
The conventional measure of the mismatch between \(n\) observations (\(obs\)) and the corresponding model-simulated values (\(sim\)) is the sum of squared residuals (Equation 1). Taking the square of the differences ensures that (1) cases of over- and underestimation do not cancel out and (2) large mismatches are subject to a higher penalty.
\[ SSR = \sum_{i=1}^{n} \left( sim_i(p) - obs_i \right)^2 \tag{1}\]
For Equation 1 to be a useful objective function, the model-simulated values (\(sim\)) must be a function of the unknown parameters \(p\).
3.2 Initial guess (IG)
Minimization of the OF is an iterative search for better solutions. The starting point of the search is known as the initial guess (IG). This can be a single value in a one-dimensional problem or a vector of values if multiple parameters are to be fitted simultaneously.
For well-behaved problems, the initial guess does not matter much. However, this is not necessarily so. Especially when the OF has local minima, a good initial guess is important to increase the chance of finding the global minimum (see Section 5.3 for details). Moreover, a good initial guess will save computation time which can be important when dealing with large numerical models.
3.3 Algorithm to minimize the OF
3.3.1 General outline
Finally, an algorithm is necessary to control the iterative search. In general, every algorithm starts with computing the value of the objective function for the initial guess, i.e. OF(IG). Thereafter, the algorithm attempts to modify the initial guess such that the value of the OF is successively lowered (Figure 5). Algorithms stop if a further decrease in the value of the OF cannot be achieved (i.e. further improvement becomes minimal). Most algorithms also have a kind of emergency exit, i.e. they stop searching after a certain (large) number of iterations. An emergency exit often indicates an infeasible optimization task, most likely due to a malformed OF.
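For intuition only, the following toy random search mimics this general scheme; it is not one of the algorithms discussed below and only illustrates the loop structure (evaluate OF(IG), propose, accept improvements, stop):

# toy random-search minimizer illustrating the general scheme (not for real use)
minimize_naive <- function(of, ig, maxiter=10000, step=0.05) {
  par <- ig
  best <- of(par)                                # evaluate OF(IG)
  fails <- 0
  for (i in 1:maxiter) {                         # emergency exit after maxiter
    cand <- par + rnorm(length(par), sd=step)    # propose a modified guess
    value <- of(cand)
    if (value < best) {                          # accept only if the OF decreases
      par <- cand; best <- value; fails <- 0
    } else {
      fails <- fails + 1
    }
    if (fails > 500) break                       # stop: no further improvement found
  }
  list(par=par, value=best)
}

# usage with a toy OF whose minimum is at c(1, 2)
minimize_naive(function(p) sum((p - c(1, 2))^2), ig=c(0, 0))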
3.3.2 Deterministic algorithms
The classical algorithms are deterministic. If they are repeatedly confronted with the same problem, they will always output an identical estimate. If the dependence of the OF on the parameter values is rather well-behaved as in Figure 4 and a good initial guess is available, deterministic algorithms are the perfect choice.
If only a single parameter needs to be optimized, Brent’s method is what you need. The corresponding function in R is optimize (same as optimise).
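A minimal usage sketch with a toy one-parameter OF (not the growth example):

# toy one-dimensional problem: find the value of p that minimizes (p - 3)^2
of_1d <- function(p) (p - 3)^2
optimize(f=of_1d, interval=c(-10, 10))  # returns a list with $minimum and $objective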
In R, algorithms for multi-dimensional problems (i.e. the simultaneous estimation of more than one parameter) are accessible through the optim function. The default is the Nelder-Mead algorithm (a.k.a. downhill simplex). It is a very robust algorithm as it only compares values of the OF across candidate solutions. It may be a bit slow on well-behaved problems though. Possible alternatives are conjugate gradient methods. The latter evaluate gradients, i.e. derivatives of the OF with respect to the parameters. Although their convergence speed may be outstanding, such methods are often fragile.
A very popular choice is the Levenberg-Marquardt algorithm implemented in the R package minpack.lm. This algorithm allows for so-called box constraints (see details in Section 5.4). It is also accessible through the highly recommended FME package.
3.3.3 Stochastic algorithms
In stochastic optimization algorithms, the proposal of alternative solutions involves random numbers. Consequently, repeated execution on the same fitting problem may lead to different results. In the best case, the differences are, however, below the threshold of relevance. Examples include simulated annealing (built into optim) and particle swarm optimization (R package pso).
Stochastic methods are mainly used for minimizing very tricky OFs (highly non-linear, presence of local minima, possibly discontinuities) when a good initial guess is not available. Stochastic algorithms can also be combined with deterministic ones. For example, one can first run a stochastic algorithm and then use the best solution found as an initial guess for subsequent deterministic minimization of the OF.
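A sketch of such a two-stage strategy with optim’s built-in simulated annealing ("SANN") followed by Nelder-Mead, applied to a toy OF (not the growth example):

# two-stage strategy on a toy OF: stochastic search first, then refinement
of_toy <- function(p) sum((p - c(1, 2))^2)
rough <- optim(par=c(10, 10), fn=of_toy, method="SANN")  # simulated annealing
fine  <- optim(par=rough$par, fn=of_toy)                 # Nelder-Mead refinement
fine$par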
3.3.4 Algorithms for specific problems
Inverse modelling problems in ecology (and beyond) are highly diverse. Consequently, there is no unique algorithm which performs best under all circumstances. I recommend the CRAN Task View: Optimization and Mathematical Programming to guide you through the jungle of available algorithms.
4 Example: Logistic growth
4.1 Basics
4.1.1 Data set
We consider data from a growth experiment with Escherichia coli (Figure 6). The data were produced on a microplate reader which measures the optical density (i.e. extinction) in the individual wells (typically 8 * 12 wells on a plate). We inspect the data of a single well only.
4.1.2 Model and quantities to be estimated
As an exercise, we want to fit the logistic growth model to this data set which accounts for a transition from exponential growth to zero-growth (stationary phase) as the density of the culture (\(N\)) approaches the carrying capacity \(K\) (Equation 2).
\[ \tfrac{d}{dt}N = \mu_{max} \cdot N \cdot \left( 1 - \frac{N}{K} \right) \tag{2}\]
In our case, \(N\) corresponds to the time-resolved measurements of the optical density stored in the column OD600 of the data frame obs in the R code above.
For the particular case of the logistic model, we know that Equation 2 has an analytic solution given by Equation 3. For the sake of computational efficiency, we will adopt the latter for use in this document.
\[ N(t) = \frac{K}{1 + \left(\frac{K}{N_0}-1\right) \cdot e^{(-t \cdot \mu_{max})}} \tag{3}\]
Fitting Equation 3 to the observation data from Figure 6 is equivalent to estimating at least the maximum growth rate \(\mu_{max}\) and the carrying capacity \(K\). These two quantities are actual parameters. However, in this case, it makes sense to also treat the initial value \(N_0\) as an additional unknown whose value shall be estimated along with the actual parameters. This is due to the fact that the plate reader lacks sensitivity at very low culture densities which is why data before hour 2 are omitted (see Figure 6).
An R code implementing Equation 3 is given below. Note that all unknowns to be estimated are passed into this function as a named vector (pars) rather than as individual arguments.
N <- function(pars, time) {
  with(as.list(pars), {
    K / (1 + (K / N_0 - 1) * exp(-time * mu_max))
  })
}
4.1.3 Alternative model implementation
As pointed out, it makes sense to adopt the analytic solution (Equation 3) implemented in the above R code to save computer time. Nevertheless, the following piece of R code shows how one could implement the model in the absence of an analytic solution, using the ODE integrators from deSolve. Both the analytic and the numerical implementation lead to identical results in the parameter fitting process; using the numerical model would only be a lot slower.
### FOR COMPLETENESS ONLY - CODE NOT USED SUBSEQUENTLY!
library("deSolve")
# ODE of logistic growth
dN_dt <- function(time, vars, pars) {
  with(as.list(c(vars, pars)), {
    list(mu_max * N * (1 - N/K))
  })
}

# Numerical solution, same interface as for analytic one
# - initial value N(t=0) is passed as parameter N_0
# - integration must start at t=0, even if min(time) > 0
N <- function(pars, time) {
  t <- if (min(time) > 0) c(0, time) else time
  x <- deSolve::lsoda(y=c(N=pars[["N_0"]]), times=t,
    func=dN_dt, parms=pars[c("mu_max","K")])
  x[x[,"time"] %in% time, "N"]
}
4.2 Implementing the objective function (OF)
Under all circumstances, the objective function requires as arguments …
the vector of parameter values. By convention, this must be the first argument of the OF. The fitting algorithm will iteratively adjust the parameter values while trying to minimize the function’s return value.
the observation data. The observations are needed to quantitatively compare them against the corresponding model predictions.
The code below implements a suitable objective function for the particular case, which computes the sum of squared residuals (Equation 1). Because the data cover a wide range, a log transformation is reasonable. Passing the transformation function as an additional argument allows the type of transformation to be changed later with minimum effort.
SSR <- function(pars, data, trans=log) {
  obs <- trans(data[,"OD600"])
  sim <- trans(N(pars, data[,"hours"]))
  sum((sim - obs)^2)
}
4.3 Making an initial guess (IG)
In the specific case, a reasonable initial guess can easily be obtained from Figure 6. An estimate of N_0 is obtained by extrapolating OD600 to hour zero. As an estimate of the carrying capacity K, we can use the stationary value of OD600 beyond about hour 12.
A reasonable initial value for mu_max can either be inferred from plain experience or from the slope of the linear part of the log-scaled graph of Figure 6. Remember that the linear part represents the exponential growth phase where the reduced model
\(N(t_2) = N(t_1) \cdot exp(\mu_{max} \cdot (t_2 - t_1))\)
applies. The latter can be rearranged to yield
\(\mu_{max} = \dfrac{ln\left(\dfrac{N(t_2)}{N(t_1)}\right)}{(t_2 - t_1)}\)
which allows the parameter to be estimated from observations at two discrete time points.
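As an illustration with the data set from above, choosing e.g. hours 3 and 5 from the exponential phase:

# rough estimate of mu_max from two points within the exponential phase
t1 <- 3; t2 <- 5
N1 <- obs[obs$hours == t1, "OD600"]
N2 <- obs[obs$hours == t2, "OD600"]
log(N2 / N1) / (t2 - t1)   # roughly 1 per hour for this data set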
With that, a reasonable initial guess is as follows.
guess <- c(mu_max=1, N_0=0.001, K=0.7)
4.4 Testing the OF
Before actually calling an algorithm to minimize the OF, it is highly recommended to test the objective function. Initially, one should simply call the OF with the suggested initial guess. This helps to detect major bugs at an early stage.
print(SSR(pars=guess, data=obs))
[1] 1.35476
Next, I recommend testing the sensitivity of the OF to alterations of the parameters. If the value of the OF does not increase for obviously bad choices of the parameters, the OF is apparently malformed. With moderate effort, one can generate random parameter sets (i.e. various guessed solutions) and inspect the corresponding values of the OF. Ideally, these random parameter sets cover the part of the parameter space which is supposed to contain the optimum. That is, you want to sample the vicinity of the initial guess. The approach is illustrated in the subsequent code section producing Figure 7.
set.seed(1)  # make data reproducible
n <- 100     # number of parameter sets

rand.pars <- data.frame(             # make random sets
  mu_max = runif(n, min=0.5, max=1.5),
  K = runif(n, min=0.7, max=0.9),
  N_0 = 10^runif(n, min=-4, max=-2)  # log-scale here
)

# compute the OF for all parameter sets in a single sweep
rand.of <- apply(rand.pars, 1, SSR, data=obs)

# plot the value of OF over the single parameters
layout(matrix(1:3, ncol=3))
for (p in names(rand.pars)) {
  logax <- if(p == "N_0") "x" else ""
  plot(rand.pars[,p], rand.of, xlab=p, ylab="SSR", log=logax)
}
layout(1)
Figure 7 clearly shows that our OF is sensitive to altered values of the parameters. In fact, it already gives us a first hint on good and bad choices of some of the parameters (e.g. very small values of mu_max or N_0 are apparently linked to high values of SSR).
If you already computed the OF for random parameter sets, it is only logical to pick the set associated with the lowest value of OF and adopt it as your new initial guess. This will likely speed up the optimization. It may also help to find the best (global) solution if the OF is subject to local minima (see Section 5.3 for more details).
guess <- rand.pars[which.min(rand.of),]  # best set found so far
print(guess)
mu_max K N_0
3 1.072853 0.754052 0.001080423
Hint: Random sampling with R’s runif function (as demonstrated above) is simple but not necessarily the best choice. You may consider Latin Hypercube Sampling as implemented in the R package lhs to make sure that the desired part of the multi-dimensional parameter space is sampled about evenly.
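A possible sketch, assuming the lhs package is installed; the sampled ranges mirror those of the runif example above:

library("lhs")
set.seed(1)
n <- 100
u <- randomLHS(n, k=3)                  # n x 3 matrix of values in [0, 1]
rand.pars <- data.frame(
  mu_max = 0.5 + u[,1] * (1.5 - 0.5),   # rescale to [0.5, 1.5]
  K      = 0.7 + u[,2] * (0.9 - 0.7),   # rescale to [0.7, 0.9]
  N_0    = 10^(-4 + u[,3] * 2)          # rescale to [1e-4, 1e-2] on a log scale
)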
4.5 Running the optimization
The default optimization interface for multi-parameter problems like this one is optim. As shown in the code section below, we feed optim with three arguments. The two mandatory ones are the initial guess (argument par) and the objective function (argument fn). As a third argument, we pass the observation data to the special ... argument of optim. This makes sure that our particular objective function (SSR) has access to the observations when it is called from within optim.
fit <- optim(par=guess, fn=SSR, data=obs)
print(fit)
$par
mu_max K N_0
1.019207321 0.765841291 0.001202822
$value
[1] 0.4904757
$counts
function gradient
102 NA
$convergence
[1] 0
$message
NULL
optim’s output shows that the optimization was successful (convergence reported as zero) after 102 evaluations of the objective function. At the minimum, the value of the OF was about 0.49.
When you run the above code yourself, you will notice a few warnings about NaN values produced. In the particular case, this was due to zeros returned by the model and the subsequent log-transformation in SSR. Because the default algorithm employed by optim is very robust, the optimization did still succeed.
To understand the impact of data transformation, we can fit the model again without taking logarithms. To do so, we override the default transformation we built into SSR with a function which does not transform its argument at all. As we can see, the estimated best-fit parameters change substantially.
fit.notrans <- optim(par=guess, fn=SSR, data=obs,
  trans=function(x) {x})
print(fit.notrans$par)
mu_max K N_0
0.58757679 0.81221934 0.01251385
Note: Since we adopted the analytic solution of the logistic model (Equation 3), we can actually cross-check our parameter estimates against those produced by nls (recall Figure 3). It turns out that the estimates found by nls are identical to what we got using the universal optim approach (with log transformation enabled).
# using 'nls' for comparison; log-transformed
fit.nls <- nls(
  log(OD600) ~ log(K / (1 + (K / N_0 - 1) * exp(-hours * mu_max))),
  data=obs, start=guess
)
print(coef(fit.nls))
mu_max K N_0
1.019226467 0.765844966 0.001202738
4.6 Checking results
4.6.1 Did the algorithm converge?
The convergence component of optim’s return value should always be checked. Only a value of zero indicates that a (local) minimum of the OF was actually found. Any non-zero return value indicates some kind of problem, and the estimated parameters, if reported at all, should be considered meaningless.
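In scripts, it is convenient to turn this check into an assertion:

stopifnot(fit$convergence == 0)  # abort the script if optim did not converge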
4.6.2 Visual inspection of the fit
You certainly want to plot the fitted models together with the observed data (Figure 8).
layout(matrix(1:2, ncol=2))
for (logax in c("","y")) {
  plot(OD600 ~ hours, data=obs, log=logax)
  lines(obs[,"hours"], N(fit$par, obs[,"hours"]))
  lines(obs[,"hours"], N(fit.notrans$par, obs[,"hours"]), col="red")
}
legend("bottomright", bty="n", lty=1, col=c("black","red"),
  legend=c("Log-transformed", "Not transformed"))
layout(1)
Apparently, neither of the two fitted models matches the observations perfectly. The model fitted on log-transformed data represents the exponential phase very well, but the mismatch increases as the culture approaches the carrying capacity (black graph in Figure 8). Without transformation (red graph), the model captures the stationary phase rather well but hardly fits the exponential phase anymore. This is because, without transformation, the few low values of OD600 representing the exponential phase make only a minor contribution to the sum of squared residuals.
5 Specific aspects of inverse modeling
5.1 Parameter identifiability and uncertainty
5.1.1 Intro
The fact that minimization of the OF was technically successful does not mean that the obtained parameter estimates are of good enough quality for a specific purpose. Even a perfect match between model and observations does not mean that the parameter values are necessarily reliable, because identifiability may be weak and thus uncertainty about the exact value is high. Figure 9 illustrates this in an intuitive manner.
5.1.2 Reasons for non-identifiability of parameters
To understand cases like those illustrated in Figure 9, we must distinguish between structural and practical (non-)identifiability. The two terms are well explained in this article.
Structural non-identifiability often arises from a sub-optimal formulation of the model where parameters partly or fully compensate each other. Consider, for example, an exponential growth model extended with a mortality term involving a rate constant \(r\) (Equation 4).
\[ \frac{d}{dt} Y = \mu \cdot Y - r \cdot Y \tag{4}\]
It is obvious from the model’s structure already that neither \(r\) nor the growth rate constant \(\mu\) can be identified from observed values of \(Y\). Upon simple rearrangement, the two parameters collapse into a single one (\(\mu \cdot Y - r \cdot Y = (\mu-r) \cdot Y = k \cdot Y\)), revealing that a large estimate of \(\mu\) could perfectly be compensated by a low value of \(r\). Cases of structural non-identifiability can often be cured by reformulating the model.
Even if structural non-identifiability does not occur, parameters may still be practically non-identifiable, leading to situations as in Figure 9 B or C. Then, non-identifiability reflects an imbalance between the number of unknowns to be estimated and the information content of the observed data (Figure 10). For example, it will be impossible to estimate an optimum temperature of phytoplankton growth from observations made under almost constant temperature conditions.
The information content of the data is determined by multiple factors. The rule “more is better” applies to each of them:
the number of different variables observed,
the number of independent observations per variable,
temporal or spatial variability in the observations.
5.1.3 Practical approaches
Fit the model to independent data sets: A good and intuitive approach is to fit the model to further independent data sets and to check whether the parameter estimates are comparable to those found earlier. This should always be the primary strategy.
Splitting the available observations into a calibration subset (used for fitting) and a validation subset is another simple and useful means to check the reliability of parameter estimates. The general approach is known as cross-validation, of which numerous sophisticated variants exist. Calibrating and testing a model on split samples can also detect overfitting, indicated in the lower right corner of Figure 10. An overfitted model matches the observations almost perfectly, yet the fitted parameter values are likely meaningless.
Estimate confidence intervals: Confidence intervals (CI) connect parameter values to probabilities. For non-linear models, CIs are nowadays estimated using so-called profile likelihoods, further explained here. The basic idea is to inspect the dependence of the OF on the value of a parameter close to its optimum value (see Section 7.1). If the OF is rather flat, solutions in the vicinity of the best-fit estimate are almost equally good (i.e. equally likely), and the best-fit estimate is therefore not reliable (cf. Figure 9).
In R, the nls method allows confidence intervals to be estimated using profile likelihoods. Since we fitted our example data set using nls as a benchmark, we can do
print(confint(fit.nls))
Waiting for profiling to be done...
2.5% 97.5%
mu_max 0.9521065294 1.090418434
K 0.7344856930 0.798746157
N_0 0.0009142285 0.001569614
to get estimated confidence intervals at the 95% level. As can be seen in the command’s output, the values are based on likelihood profiling.
So far, I haven’t seen code based on likelihood profiling that estimates confidence intervals for arbitrary non-linear models fitted with the conventional least-squares approach. Hence, you may want to consider the traditional method that infers a 95% confidence interval from the standard error (\(se\)) of a parameter estimate as \(\pm 1.96 \cdot se\). The required standard errors are output, for example, by the modFit optimizer from the FME R package. The latter package and the corresponding tutorials are a great resource for those interested in inverse modeling anyway.
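As a generic sketch of that traditional approach (the numbers below are made-up placeholders standing in for estimates and standard errors taken from a fit summary):

# made-up placeholder values standing in for a fit summary's output
est <- c(mu_max=1.02, K=0.77, N_0=0.0012)   # parameter estimates
se  <- c(mu_max=0.03, K=0.02, N_0=0.0002)   # corresponding standard errors
cbind(lower = est - 1.96 * se,
      upper = est + 1.96 * se)              # approximate 95% confidence intervals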
Finally, bootstrapping is another popular means to obtain confidence intervals of parameter estimates (or any other statistic). Like cross-validation, bootstrapping works by resampling the available observations (see here for a comparison of the two approaches). In R, bootstrapping methods are implemented in the boot package and boot.ci is the default method to obtain confidence intervals. A big advantage of bootstrap confidence intervals is the fact that they are non-parametric and thus allow for asymmetry (unlike traditional methods based on standard errors).
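A possible sketch for the logistic example, resampling rows of obs and re-fitting with optim on each bootstrap sample (this re-uses SSR and the best-fit estimates fit$par from Section 4; it can be computationally expensive):

library("boot")

# statistic: re-fit the logistic model on a resampled data set, return estimates
boot_fit <- function(data, indices) {
  optim(par=fit$par, fn=SSR, data=data[indices, ])$par
}

set.seed(1)
bt <- boot(data=obs, statistic=boot_fit, R=200)  # 200 bootstrap replicates
boot.ci(bt, type="perc", index=1)                # percentile CI for the 1st parameter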
Bayesian inference: Bayesian inference treats the problem of parameter estimation from a quite different perspective. The idea is to consider the parameters as random variables. Hence, the goal is no longer to identify a single, definite value for each parameter but rather to estimate a distribution. This distribution explicitly accounts for uncertainty. Importantly, the Bayesian concept naturally deals with prior information on parameter values (e.g. obtained in previous experiments) in a sensible way.
To learn more about the approach, just scan the web using a combination of, e.g., the search terms “bayesian” and “ecology”. For related R packages, have a look at the CRAN Task View: Bayesian Inference.
5.2 Critically check the observed data
The wisdom “garbage in, garbage out” perfectly applies to inverse modeling. Hence, before estimating parameters, you should be sure that your observation data make sense. That is, they should be representative and not contaminated with extreme outliers or bias. Noise in general is not a problem but the signal-to-noise ratio must be reasonable.
Trying to fit parameters on a data set containing implausible values is just a waste of time. The optimization problem will become unnecessarily complicated and the obtained estimates, if any, are meaningless. Consequently, my recommendation is: Be critical regarding both the model and the observed data. Never take observed values for granted! Consider that any sensor system has limitations. To get a feeling of how “wrong” data produced by conventional sensors can be, look at this article on chlorophyll measurements in phytoplankton.
5.3 Presence of local minima
As models and data sets become more complex, there is an increased chance that the objective function has multiple minima (Figure 11). Minimization algorithms generally find local minima only. How can you increase the chance of actually finding the global optimum?
You could first run a stochastic algorithm or a simple random search (see Section 4.4) and use the best result found as an initial guess for a subsequent deterministic algorithm.
An even better strategy is to pick multiple random points in parameter space as alternative initial guesses and to start an independent deterministic search from each of these points. If all independent searches converge toward an identical best-fit solution, there is a reasonable chance that the global optimum was found. Otherwise, if different initial guesses result in different solutions, you know at least that local minima are present and the specific inverse problem requires exceptional care.
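A sketch of such a multi-start strategy for the logistic example, re-using rand.pars and SSR from Section 4.4:

# multi-start: independent Nelder-Mead searches from several random guesses
set.seed(1)
starts <- rand.pars[sample(nrow(rand.pars), 5), ]    # 5 random initial guesses
fits <- lapply(1:nrow(starts), function(i)
  optim(par=unlist(starts[i, ]), fn=SSR, data=obs))
sapply(fits, function(f) c(f$par, SSR=f$value))      # compare the solutions found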
5.4 Infeasible solutions
In many models, certain parameter values (or combinations thereof) are infeasible. This happens, for example, when a parameter appears in the denominator of an expression (must not be zero) or controls the argument of functions like sqrt (must not be negative) or ln (must be strictly positive). In the logistic growth model (Equation 3), for example, calling the objective function with \(N_0 = 0\) obviously leads to problematic results.
As long as you use a robust algorithm like Nelder-Mead, you might succeed. In other cases, you may want to prevent the algorithm from trying such infeasible solutions. You can do so by specifying allowed ranges (a.k.a. box constraints) for each parameter. Algorithms capable of handling such constraints are, e.g., the “L-BFGS-B” method built into optim or the nls.lm optimizer from the package minpack.lm.
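A sketch for the logistic example using the L-BFGS-B method of optim; the bounds shown are ad-hoc choices for illustration only:

# box-constrained fit of the logistic example with L-BFGS-B
fit.box <- optim(par=c(mu_max=1, N_0=0.001, K=0.7), fn=SSR, data=obs,
                 method="L-BFGS-B",
                 lower=c(mu_max=0.1, N_0=1e-6, K=0.1),
                 upper=c(mu_max=5,   N_0=0.1,  K=2))
fit.box$par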
Note: Never use box-constraints to force a parameter estimate to fall into a desired range! In other words: If you find that one of the best-fit parameter estimates is identical to the lower or upper limit of the allowed search range, you cannot trust the solution.
5.5 Design the OF carefully
If the observation data are good, the complexity of the model seems adequate, but parameter estimation still fails, this can often be cured by designing a better objective function. Some general hints follow:
Try different data transformations. This is particularly important if variable values range over several orders of magnitude. This is common especially for microbiological data.
Carefully aggregate the mismatch over multiple variables. You often want to fit observations of multiple variables at the same time. A lake model, for example, might compute dissolved phosphorus, pH, and algae biomass. If you have corresponding observations, you want your objective function to quantify the mismatch with regard to all of these variables. In such a case, you want …
to weight the individual variables such that their ranges become similar. This prevents variables with naturally larger values (e.g. electric conductivity in \(\mu\)S/cm) from dominating over ones having smaller values (e.g. phosphorus expressed in mg/L).
to scale the residuals computed for each variable by the number of observations. Without scaling, variables measured more frequently will make a larger contribution to the objective function than those with scarce measurements.
Look at section 2.3 of this article for details about weighting and scaling; a schematic sketch is given below.
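The following sketch shows only the structure of such an aggregated OF for two hypothetical variables; the numbers are made up, and in practice the simulated values would come from the model:

# schematic: aggregate the mismatch over two variables (made-up numbers)
obs_multi <- data.frame(P=c(0.02, 0.05, 0.04), chl=c(12, 30, 25))  # observations
sim_multi <- data.frame(P=c(0.03, 0.04, 0.05), chl=c(10, 28, 27))  # model output
weights   <- c(P=1, chl=0.001)        # bring both variables to similar ranges

of_total <- 0
for (v in names(weights)) {
  res <- (sim_multi[[v]] - obs_multi[[v]]) * weights[v]  # weighted residuals
  of_total <- of_total + sum(res^2) / length(res)        # scaled by no. of observations
}
of_total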
Patterns vs. classical residuals.
A conventional objective function rooted in the idea of minimizing residuals (or maximizing likelihoods) is not always an adequate choice. Sometimes, the most plausible parameter values are those that reproduce an observed pattern, even though the residuals are not at a minimum (Figure 12). Situations like this require a custom OF that operates on features rather than ordinary numerical data.
5.6 Don’t expect it to be easy
Parameter estimation is a surprisingly tricky thing. This is especially true if the model prediction (and thus the value of the OF) exhibits a strongly non-linear dependence on the parameter values to be estimated. Simply speaking, non-linearity makes it more difficult for the algorithm to choose a good next candidate solution. It is similar to stepping down stairs with irregular curves and unequal step heights while having your eyes closed.
Ecological models are often non-linear. If local minima, discontinuities, and long computation times come on top, one can easily run into desperate situations. Advice from somebody experienced may be very helpful.
6 Interpreting bad fits
6.1 Just bad? Or actually useful?
Our goal is, of course, to achieve a model fit where simulated and observed data match almost perfectly. A very good model fit means that the mechanisms upon which the model is built represent a plausible hypothesis about the functioning of the real system. Keep in mind that plausible is not the same as correct.
Conversely, if a severe mismatch between simulated data and the corresponding observations remains, we know for sure that our mechanistic understanding (represented by the model) lacks important features of the real system. This is useful knowledge! In fact, comparing model solutions to observations is the only way to make gaps in system understanding explicit. Such gaps are always the seed of scientific progress.
Even better, models can actually help to close these gaps: In a model, you can easily test alternative hypotheses and see whether they provide a better representation of reality. Only thereafter would you invest time and money in experiments or measurements to verify a new hypothesis.
6.2 A better hypothesis for the growth data set
For the studied bacterial growth data, we found that the logistic model is apparently insufficient to explain all details of the observations (Figure 8). In fact, a closer look at the left hand side plot of Figure 8 suggests that we are likely facing a phenomenon known as diauxic growth. That is, we might deal with a succession of distinct growth phases reflecting a sequential consumption of resource components. The latter differ with respect to degradability and thus allow for different growth rates. As a switch between resources requires an adaptation of the bacterial metabolism, the distinct growth phases are typically separated by a lag phase.
The R code below implements a simple ODE model of diauxic bacterial growth assuming a mixture of two resources A and B. The bacteria are assumed to preferably consume A. They only start using B when A becomes depleted. The simplest way of implementing this is through a term that inhibits the growth on resource B as long as A is still available in substantial amounts.
Although the model is fairly simple, it comes with at least 7 parameters! To make parameter estimation feasible at all, we take the approximate initial value of OD600 as given. Similarly, we assume a fixed initial value of 1 for the two resources A and B.
To further increase the chance of success, we employ the Levenberg-Marquardt algorithm from the minpack.lm package. A peculiarity of this algorithm is the fact that it expects the objective function to return the full vector of residuals (and not just a single number like SSR). The implementation below takes this into account.
library("deSolve")
library("minpack.lm")
# System of simultaneous ODE to simulate diauxic growth
derivs <- function(time, vars, pars) {
  with(as.list(c(vars, pars)), {
    growthA <- muA * N * A / (A + hA)
    growthB <- muB * N * B / (B + hB) * (iA / (A + iA))
    ddt <- c(
      N = growthA + growthB,
      A = -1/yA * growthA,
      B = -1/yB * growthB
    )
    list(ddt[names(vars)],
      c(`growth on A`=growthA, `growth on B`=growthB)
    )
  })
}
# Numerical solution of the ODE for known initial values (ini)
dynam <- function(pars, ini, time) {
  t <- if (min(time) > 0) c(0, time) else time
  x <- deSolve::lsoda(y=ini, times=t, func=derivs, parms=pars)
  x[x[,"time"] %in% time,]
}
# Function returning the residuals
resi <- function(pars, ini, data, trans=log) {
  obs <- trans(data[,"OD600"])
  sim <- trans(dynam(pars, ini, data[,"hours"])[,"N"])
  sim - obs
}
# Initial values taken as given
ini <- c(N=0.001, A=1, B=1)

# Initial guess of parameters with permissible limits
guess <- c(muA=1, muB=0.5, hA=0.1, hB=0.1, iA=0.1, yA=0.5, yB=0.5)
lower <- guess * 0.01
upper <- guess * 10

# Optimization by L.-M. algorithm
fit <- nls.lm(par=guess, lower=lower, upper=upper, fn=resi,
  ini=ini, data=obs, control=list(maxiter=100))

# Visualization of results
sim.ini <- dynam(pars=guess, ini=ini, time=seq(0, 24, 0.1))
sim.opt <- dynam(pars=fit$par, ini=ini, time=seq(0, 24, 0.1))

plot(OD600 ~ hours, data=obs, log="y")  # plot with data
lines(N ~ time, data=as.data.frame(sim.ini), lty=2)
lines(N ~ time, data=as.data.frame(sim.opt))
legend("right", bty="n", lty=2:1, title="Predictions:",
  legend=c("Using the initial guess","Using fitted parameters"))
legend("bottomright", bty="n", title="Diagnostics:",
  legend=c(
    paste("Algorithm converged?",
      if (fit$info %in% 1:3) "Yes" else "No"),
    paste("Parameters in range?",
      if (all(fit$par > lower & fit$par < upper)) "Yes" else "No")
  ))
7 Appendix
7.1 Inspection of the OF close to the best-fit solution
Analyzing the objective function in the vicinity of the best-fit parameter estimate (Figure 9) provides valuable information on uncertainty. It is also the key concept of estimating confidence intervals by likelihood profiling.
The code section below demonstrates how the profile of the OF close to the optimum solution can be inspected. While this is trivial for the one-dimensional case (Figure 9), it requires a bit more effort when multiple parameters are to be estimated simultaneously. A possible solution is as follows:
1. Perform a conventional fit. This gives you best-fit estimates for all \(n\) parameters.
2. Pick a particular parameter \(i\) and create a vector of \(m\) perturbed estimates to cover the vicinity of the actual best-fit estimate. For example, one could multiply the best-fit value by 0.9, 0.95, 1, 1.05, 1.1 to induce relative perturbations at the 5 and 10% level, respectively.
3. For each of the \(m\) perturbed values: Hold the \(i\)-th parameter constant at the respective perturbed value, re-fit the remaining \(n-1\) parameters of the model, and record the value of the OF.
4. Plot the values of the re-fitted OF over the \(m\) perturbed parameter values.
5. Continue with parameter \(i+1\), i.e. go to step 2.
The R code below implements this algorithm for the logistic growth example. The main difference compared to the original implementation is the distinction between two kinds of parameters: the ones that need to be fitted (argument pars) and the single one which is kept constant at a value close to the previously found optimum (argument fixpar). The output of the code (Figure 14) suggests a clear minimum of the OF for all three parameters, indicating identifiability.
# reimplemented model and objective function
N_fix <- function(pars, fixpar, time) {
  with(as.list(c(pars, fixpar)), {
    K / (1 + (K / N_0 - 1) * exp(-time * mu_max))
  })
}
SSR_fix <- function(pars, fixpar, data, trans=log) {
  obs <- trans(data[,"OD600"])
  sim <- trans(N_fix(pars, fixpar, data[,"hours"]))
  sum((sim - obs)^2)
}

options(warn=-1) # suppress warnings for now

# fit all parameters simultaneously (best-fit estimates)
guess <- c(mu_max=1, N_0=0.001, K=0.7)
opt <- optim(par=guess, fn=SSR_fix, fixpar=c(), data=obs)
stopifnot(opt$convergence == 0)
opt <- opt$par

# multipliers to induce perturbations
mult <- c(0.9, 0.95, 0.99, 1, 1.01, 1.05, 1.1)

# refit all parameters but the one parameter kept
# constant at perturbed values
x <- NULL
for (p in names(opt)) {
  for (m in mult) {
    fit <- optim(par=opt[names(opt) != p], fn=SSR_fix,
      fixpar=opt[p]*m, data=obs)
    if (fit$convergence == 0) {
      for (n in names(fit$par)) {
        x <- rbind(x, data.frame(fixpar.name=p, mult=m,
          fixpar.value=opt[p]*m, fitpar.name=n,
          fitpar.value=fit$par[n], objfun=fit$value))
      }
    }
  }
}

options(warn=0) # re-enable warnings

# plot the OF over the disturbed parameter values
yrng <- range(x[,"objfun"])
par(mfcol=c(1,length(opt)))
for (fixpar in unique(x$fixpar.name)) {
  plot(objfun ~ fixpar.value, data=x[(x$fixpar.name == fixpar),],
    type="b", ylim=yrng, xlab=fixpar, ylab="OF")
  abline(v=opt[fixpar], lty=2, col="red")
}
par(mfcol=c(1,1))