
Box 2 An example of integrated modelling in migratory connectivity studies

From: A synthesis of recent tools and perspectives in migratory connectivity studies

The model of Korner-Nievergelt et al. [57] has served as a basis for multiple integrated Bayesian analyses in the past few years because it combines two classic types of data on migratory movements in a flexible framework. It is therefore a good example of how various data can be integrated in migratory connectivity studies. This model combines reencounter data and tracking data of individuals banded/tagged in breeding areas \(g \in \left[ {1;G} \right]\) and resighted/tracked in non-breeding destinations \(d \in \left[ {1;D} \right]\) to estimate the probabilities of movement (the transition probabilities) between breeding and non-breeding sites. Here, these transition probabilities are by definition the migratory connectivity parameters of interest.

1. Writing the likelihoods

The first step in Bayesian statistics is to formulate the likelihood, which is the probability of observing the data given a set of parameter values. It boils down to expressing the data as a function of the chosen parameters using statistical models:

  Submodel #1: The probability of reencountering a banded individual from breeding area \(g\) in destination \(d\), \(P_{g,d}^{reenc}\), can be expressed as the conjunction of two events:

      “The individual moved from area \(g\) to destination \(d\) (transition probability \(m_{g,d}\)) AND could be observed in destination \(d\) (reencounter probability \(r_{d}\))”.

  Probabilistically, this translates into \(P_{g,d}^{reenc} = m_{g,d} \times r_{d}\), where we assume that the reencounter probability in destination \(d\) is independent of the origin \(g\) of the individual. The probability of not reencountering a banded individual in any destination is then \(P_{g,D + 1}^{reenc} = 1 - \mathop \sum \limits_{d = 1}^{D} \left( {m_{g,d} \times r_{d} } \right)\), so that the \(D + 1\) cell probabilities sum to 1. The total number of banded individuals in breeding area \(g\), \(N_{g}^{reenc}\), can thus be related to the number of banded individuals from breeding area \(g\) that were reencountered in each destination area \(d\), \(R_{g,d}\), or not reencountered at all, \(Q_{g}\), via a multinomial model:

\(\left( {R_{g,1:D} ,Q_{g} } \right) \sim Multinom\left( {P_{{g,1:\left( {D + 1} \right)}}^{reenc} ,N_{g}^{reenc} } \right)\).

  The final likelihood of the live-reencounter submodel is the product of these multinomial models for \(g \in \left[ {1;G} \right]\).
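
  The reencounter submodel can be sketched in a few lines of Python. This is only an illustration of the formulas above, not the code used in [57]: the function names and all numbers are made up for the example, and the data are stored as nested lists indexed by breeding area and destination.

```python
import math


def reencounter_cell_probs(m_g, r):
    """Cell probabilities for one breeding area g.

    m_g: transition probabilities m[g][d] for d = 1..D (should sum to 1)
    r:   reencounter probabilities r[d] for d = 1..D
    Returns D + 1 values; the last is the 'never reencountered' cell.
    """
    seen = [m_gd * r_d for m_gd, r_d in zip(m_g, r)]
    return seen + [1.0 - sum(seen)]


def multinomial_logpmf(counts, probs):
    """Log-probability of the counts under a multinomial with these cell probabilities."""
    n = sum(counts)
    logp = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    for c, p in zip(counts, probs):
        if c > 0:
            if p <= 0.0:
                return float("-inf")  # observed an outcome with zero probability
            logp += c * math.log(p)
    return logp


def reencounter_loglik(R, Q, m, r):
    """Sum of the multinomial log-likelihoods over breeding areas g = 1..G."""
    return sum(
        multinomial_logpmf(list(R_g) + [Q_g], reencounter_cell_probs(m_g, r))
        for R_g, Q_g, m_g in zip(R, Q, m)
    )


# Made-up example with G = 2 breeding areas and D = 2 destinations.
m = [[0.7, 0.3], [0.2, 0.8]]  # hypothetical transition probabilities
r = [0.10, 0.05]              # hypothetical reencounter probabilities
R = [[8, 1], [2, 5]]          # hypothetical reencounter counts R[g][d]
Q = [91, 93]                  # hypothetical never-reencountered counts Q[g]

probs_g1 = reencounter_cell_probs(m[0], r)
loglik = reencounter_loglik(R, Q, m, r)
print(probs_g1, loglik)
```

  Working on the log scale turns the product over breeding areas into a sum, which avoids numerical underflow when counts are large.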

  Submodel #2: Tracking devices give direct information about which destination \(d\) an individual tagged in area \(g\) moved to, provided that the recovery bias can be ignored for archival tags such as geolocators. In this case, the probability of tracking an individual from area \(g\) to destination \(d\) can be simply expressed as:

      “The individual moved from area \(g\) to destination \(d\) (transition probability \(m_{g,d}\))”.

  Probabilistically, this can be translated into: \(P_{g,d}^{track} = m_{g,d}\). Similar to the live-reencounter submodel, the total number of tracked individuals in breeding area \(g\) for which the data could be retrieved, \(N_{g}^{track} = \mathop \sum \limits_{d = 1}^{D} U_{g,d}\), can thus be related to the number of tracked individuals from breeding area \(g\) that moved to each of the destination areas \(d\), \(U_{g,d}\), via a multinomial model:

\(U_{g,1:D} \sim Multinom\left( {P_{g,1:D}^{track} ,N_{g}^{track} } \right)\).

  The final likelihood of the tracking submodel is the product of these multinomial models for \(g \in \left[ {1;G} \right]\).
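
  Because the tracking submodel involves only \(m_{g,d}\), the tracking data taken on their own already point to a simple estimate: under a multinomial model, the maximum-likelihood estimate of each cell probability is the observed proportion. The sketch below illustrates this with made-up counts; the function name is hypothetical, and this frequentist shortcut is shown only for intuition and is not part of the Bayesian model of [57].

```python
def tracking_mle(U):
    """Observed proportions U[g][d] / N_g, i.e. the maximum-likelihood estimate
    of m[g][d] under the multinomial tracking submodel taken on its own."""
    estimates = []
    for U_g in U:
        n_g = sum(U_g)  # N_g^track: tracked individuals with retrieved data
        if n_g == 0:
            raise ValueError("no tracked individuals for this breeding area")
        estimates.append([u / n_g for u in U_g])
    return estimates


# Made-up tracking counts U[g][d] for G = 2 breeding areas and D = 2 destinations.
U = [[6, 2], [1, 9]]
m_hat = tracking_mle(U)
print(m_hat)  # [[0.75, 0.25], [0.1, 0.9]]
```

  With only a handful of tracked birds per area, such proportions are very uncertain, which is one motivation for combining them with the much larger banding datasets.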

Since these two submodels share the same connectivity parameter \(m_{g,d}\), they can be integrated by formulating a joint likelihood. If the two datasets are independent, the joint likelihood is equal to the product of the likelihoods of all submodels.
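
On the log scale, the product of the two submodel likelihoods becomes a sum. The following minimal sketch, assuming independent datasets and probabilities strictly between 0 and 1, computes this combined log-likelihood up to an additive constant: the multinomial coefficients are dropped because they do not depend on \(m_{g,d}\) or \(r_{d}\). The function name and the data are hypothetical.

```python
import math


def joint_loglik_kernel(R, Q, U, m, r):
    """Joint log-likelihood of both submodels, up to an additive constant.

    The multinomial coefficients are omitted because they do not depend on m or r.
    Assumes every probability involved lies strictly between 0 and 1.
    """
    total = 0.0
    for R_g, Q_g, U_g, m_g in zip(R, Q, U, m):
        p_seen = [m_gd * r_d for m_gd, r_d in zip(m_g, r)]
        # Reencounter submodel: R[g][d] * log(m[g][d] * r[d]) ...
        total += sum(c * math.log(p) for c, p in zip(R_g, p_seen))
        # ... plus Q[g] * log(probability of never being reencountered)
        total += Q_g * math.log(1.0 - sum(p_seen))
        # Tracking submodel: U[g][d] * log(m[g][d])
        total += sum(u * math.log(m_gd) for u, m_gd in zip(U_g, m_g))
    return total


# Made-up data for G = 2 breeding areas and D = 2 destinations.
R = [[8, 1], [2, 5]]  # hypothetical reencounter counts
Q = [91, 93]          # hypothetical never-reencountered counts
U = [[6, 2], [1, 9]]  # hypothetical tracking counts
r = [0.10, 0.05]      # hypothetical reencounter probabilities

m_a = [[0.7, 0.3], [0.2, 0.8]]  # candidate connectivity pattern A
m_b = [[0.5, 0.5], [0.5, 0.5]]  # candidate connectivity pattern B

score_a = joint_loglik_kernel(R, Q, U, m_a, r)
score_b = joint_loglik_kernel(R, Q, U, m_b, r)
print(score_a, score_b)
```

With these made-up counts, candidate A receives the higher score, because both datasets suggest that most birds from the first area go to the first destination and most birds from the second area go to the second. Both submodels contribute to the score through the shared parameter \(m_{g,d}\).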

2. Specifying the prior distributions

Bayesian inference uses the likelihood to update the prior distributions of the parameters into posterior distributions, which are usually approximated by sampling with Markov chain Monte Carlo (MCMC) methods. Obtaining a distribution of values for each parameter, instead of a single value, is characteristic of the Bayesian approach. The second step in building a Bayesian model is thus to specify prior distributions for the parameters to be estimated.
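
Written out for the two submodels above, and assuming independent datasets and independent priors on the transition and reencounter probabilities, this update follows Bayes' rule. In the expression below, \(L^{reenc}\) and \(L^{track}\) are shorthand introduced here for the likelihoods of the reencounter and tracking submodels, and \(p\left( m \right)\) and \(p\left( r \right)\) denote the prior distributions:

\(p\left( {m,r \mid R,Q,U} \right) \propto L^{reenc} \left( {R,Q \mid m,r} \right) \times L^{track} \left( {U \mid m} \right) \times p\left( m \right) \times p\left( r \right)\).

The proportionality sign hides a normalising constant that does not depend on the parameters, which is why sampling methods can work with the right-hand side alone.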

In their model, Korner-Nievergelt et al. [57] chose non-informative flat priors: a \(Beta\left( {1,1} \right)\) distribution, i.e. uniform between 0 and 1, for each \(r_{d}\), and its multivariate equivalent \(Dirichlet\left( {1, \ldots ,1} \right)\) for each vector \(m_{g,1:D}\), which is uniform over all sets of transition probabilities that sum to 1. This means that, a priori, the transition probabilities and the reencounter probabilities could take any value between 0 and 1.
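
To make these flat priors concrete, the sketch below draws one set of parameter values from them using only the Python standard library. It is an illustration under stated assumptions rather than the implementation of [57]: the function name is hypothetical, and the Dirichlet draw relies on the standard result that independent Gamma(1, 1) variables divided by their sum follow a \(Dirichlet\left( {1, \ldots ,1} \right)\) distribution.

```python
import random


def sample_flat_priors(G, D, rng):
    """Draw one set of parameters (m, r) from the flat priors.

    r[d] ~ Beta(1, 1), i.e. uniform on [0, 1].
    m[g] ~ Dirichlet(1, ..., 1), obtained by normalising Gamma(1, 1) draws.
    """
    r = [rng.betavariate(1.0, 1.0) for _ in range(D)]
    m = []
    for _ in range(G):
        gammas = [rng.gammavariate(1.0, 1.0) for _ in range(D)]
        total = sum(gammas)
        m.append([x / total for x in gammas])  # each row sums to 1
    return m, r


rng = random.Random(42)  # fixed seed, only to make the example reproducible
m, r = sample_flat_priors(G=2, D=3, rng=rng)
print(m)
print(r)
```

Each row of `m` is a valid set of transition probabilities for one breeding area, and repeating the draw many times would show that no region of the parameter space is favoured before the data are taken into account.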

However, prior distributions can also be concentrated on certain values and thus act as a soft constraint on the posterior distribution of the parameters. In other models, this has been a second entry point for combining data, which has for instance been used to refine spatial assignments with abundance data or migratory directions inferred from banding data (e.g. [49, 94]).

Following the same reasoning, new submodels have been added to this structure to integrate isotope data or parasite data, or to account for banding data with unknown numbers of banded birds or for recovery biases in geolocator data [12, 39, 40]. This flexibility is the main strength of Bayesian frameworks for data integration.