Understanding animal migration – the cyclical movements of individuals between areas used across the annual cycle – is challenging, yet is often a prerequisite for effective conservation of mobile species. Our capacity to measure migratory movements has improved greatly in recent years through direct methods such as mark-recapture [1] and remote-tracking technology [2], as well as indirect methods such as genetic [3] and biogeochemical approaches [4]. With an improved understanding of individual migratory movements, researchers are increasingly focussing on quantifying resultant population-level spatial patterns. Understanding migratory connectivity (henceforth referred to as ‘connectivity’), which describes the extent to which spatial distributions of individuals are maintained between two phases of the migratory cycle (most often between breeding and non-breeding grounds), has become a top priority [5]. High levels of connectivity indicate that individuals residing close together in a particular season of the annual cycle are also close together in a subsequent season, whilst low connectivity indicates cross-seasonal mixing of individuals from different areas. The strength of connectivity can have important conservation implications, including playing a key role in a migratory species’ propensity to adapt to a changing environment [6, 7].
Multiple statistical approaches to estimate migratory connectivity have been utilised in recent years [5, 8, 9]. To quantify the strength of connectivity (i.e. giving connectivity a numerical value) one of the most commonly used approaches is the Mantel test [10], which evaluates the correlation between two distance matrices: the pairwise distances between locations of sampled individuals in one season, and their equivalent pairwise distances in another [11]. Numerous studies have examined sources of bias in connectivity estimates derived using Mantel correlations, including issues of imbalanced sampling with respect to local abundance, incomplete spatial coverage, and location uncertainty [11, 12]. An extension to the Mantel approach [12] utilises the transition rates of individuals between pre-defined breeding and non-breeding zones to control for these biases, but this method is only recommended in situations where spatial subpopulation structure is well understood, and relative abundances within origin regions can be estimated. Cohen et al. [12] recommend using Mantel correlations when these conditions are not met, and the Mantel approach remains widely used in recent literature (e.g. [13,14,15,16], but see [9, 17]).
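As an illustration of the Mantel approach described above, the following Python sketch correlates the pairwise distances among the same individuals in two seasons and obtains a permutation p-value by shuffling which non-breeding location belongs to which individual. It is not the implementation used in this study (which relied on mantel.rtest in R); the function names, the Euclidean distance metric, and the one-sided test are assumptions made for illustration.

```python
import math
import random


def pairwise_distances(points):
    """Upper-triangle Euclidean distances between (x, y) points."""
    return [
        math.dist(points[i], points[j])
        for i in range(len(points))
        for j in range(i + 1, len(points))
    ]


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def mantel(breeding, nonbreeding, permutations=199, seed=0):
    """Mantel correlation and one-sided permutation p-value.

    breeding[i] and nonbreeding[i] must belong to the same individual.
    Shuffling individuals in one season is equivalent to permuting the
    rows and columns of that season's distance matrix together.
    """
    rng = random.Random(seed)
    d_breed = pairwise_distances(breeding)
    observed = pearson(d_breed, pairwise_distances(nonbreeding))
    exceed = 0
    for _ in range(permutations):
        shuffled = nonbreeding[:]
        rng.shuffle(shuffled)
        if pearson(d_breed, pairwise_distances(shuffled)) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (permutations + 1)


# Toy example: every individual moves 500 units due south, so the spatial
# arrangement of individuals is preserved exactly between seasons.
rng = random.Random(1)
breeding = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(20)]
nonbreeding = [(x, y - 500.0) for x, y in breeding]
r, p = mantel(breeding, nonbreeding)
print(round(r, 3), p)
```

Because the toy migration is a rigid translation, the two sets of pairwise distances are identical and the correlation is at its maximum; adding random scatter to the non-breeding locations lowers it.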
One issue that has received little attention in the migratory connectivity literature is the extent to which Mantel correlations can be used to draw inferences about connectivity patterns at the population scale, given that these correlations show scale-dependence [18, 19]. A key aim of migratory connectivity research is to understand migratory patterns at large spatial scales (e.g. whole species ranges), requiring an implicit assumption that metrics quantified for sampled individuals accurately describe behaviour of wider populations. However, these broad-scale inferences have the potential to be strongly biased in some cases, as a product of fundamental sensitivity of connectivity metrics to the spatial extent and configuration of sampling. Estimates also frequently suffer from low precision due to sample size constraints, as the number of individuals tracked within a population is often limited by available funding, difficulty in retrieving tracking devices, fieldwork limitations related to catching individuals, site fidelity, and recapture rates [12]. These limitations reduce the proportion of the population that is actually studied, and mean relatively small sample sizes are commonplace in remote-tracking studies [20]. Whilst lower precision can be partially accounted for through bootstrapped confidence intervals, the extent to which precision varies with sample size has not been explored in detail [12].
Here, we use simulations to elucidate the direct mechanisms underpinning bias and imprecision in migratory connectivity estimates that use Mantel statistics. We examine the efficacy of multiple sampling scenarios across a range of connectivity levels, considering both homogeneous and spatially-clumped populations. We test how the number of individuals sampled impacts the precision of measurements, and examine how the magnitude of bias depends on the extent to which estimates from sampled individuals are used to draw inferences about the wider populations from which they are drawn. Alongside simple generalised simulations that allow us to explore underlying mechanisms of bias, we also use more realistic simulated migratory populations to provide recommendations on study design that can maximise the accuracy of Mantel-based connectivity measures, within realistic limits of sampling.
Mechanisms of bias
To illustrate the fundamental issue arising from spatial sampling bias, we first consider two hypothetical sampling scenarios for a contiguous breeding population with low connectivity (Fig. 1): one where individuals are marked randomly within a single study region of varying size (Fig. 1a-h) and another where individuals are marked within discrete sampling sites that are spread across the range (Fig. 1i-p). In both cases, the plausible range of observable distances between marked individuals is constrained by sampling extent in the season that marking takes place, which is the breeding range in our hypothetical scenario (see Fig. 1). Importantly, however, the maximum measurable distance between these sampled individuals in the non-breeding range is not constrained by sampling design, only by the destinations of the animals themselves. This could introduce a skew in the sample of pairwise distances on the sampled range (breeding grounds in this case), but not on the destination range (non-breeding grounds). As Mantel correlations explicitly compare these pairwise distance distributions between seasons, resulting Mantel statistics calculated for spatially-constrained samples may be very different from the ‘true’ values calculated for the whole population, despite the underlying migratory ecology being constant across the population (as in Fig. 1).
Biases resulting from spatially-constrained sampling regimes could take various forms, depending on how sampling effort is distributed across the species’ range. If sampling is limited to a subset of the breeding distribution (e.g. Fig. 1c), the observed distribution of breeding pairwise distances will be left-skewed relative to the true distribution across the population (Fig. 1d), leading to negative bias in Mantel correlations with respect to the true statistic for the wider population. If sampling occurs in discrete areas that are widely separated across the range of a species, however, resulting pairwise distances may be right-skewed relative to the population as a whole (e.g. Fig. 1p) because site spacing introduces abrupt artificial gaps into what may be a more uniform underlying distribution of individuals across space. Given inevitable logistical constraints, migration studies do indeed typically focus on marking individuals at discrete sites within spatially-constrained study areas [18, 21,22,23], with considerable variation in the extent to which these are spread across full ranges. This suggests there may be constraints on the extent to which such studies can draw inferences about the connectivity of wider populations using Mantel statistics calculated from spatially-constrained samples. In the next sections, we use simulations to estimate the severity of these biases under a range of common sampling scenarios.
Simulation methods
To examine fundamental sources of bias, we first simulated simple migratory populations that vary in their degree of migratory connectivity (Fig. A1), and applied a range of sampling regimes to examine how Mantel statistics resulting from realistic simulated ‘studies’ compared to ‘true’ values calculated for the simulated population as a whole.
Simulating breeding and non-breeding locations
First, we created a breeding range filled with N individuals, placed at random by sampling x and y coordinates from a bounded uniform distribution ([24], Fig. A1A, N = 10,000), ensuring that variation in our results reflected sampling effects alone, rather than stochasticity arising from heterogeneous spacing of individuals. We simulated migratory movements by 1) shifting each individual a fixed distance due south from its breeding location (Fig. A1A), and then 2) further shifting the individual in a random direction (sampled from a uniform distribution between 0° and 360°; Fig. A1B-C). The distance of this second shift was sampled from a lognormal distribution with SD = 1, and a mean that we varied across scenarios, allowing us to simulate different strengths of connectivity (values of 3, 5 and 7 were used).
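The two-step migration described above can be sketched in Python as follows. This is a minimal illustration rather than the study's own script: the range dimensions, the size of the southward shift, and the function name are hypothetical, and we assume that the lognormal mean and SD are specified on the log scale (as in R's rlnorm), which the text does not state explicitly.

```python
import math
import random


def simulate_population(n, mean_log_scatter, south_shift=1000.0,
                        width=1000.0, height=1000.0, seed=0):
    """Return matched lists of breeding and non-breeding (x, y) locations.

    Assumed values (not from the study): range size, south_shift, and the
    log-scale interpretation of the lognormal mean and SD.
    """
    rng = random.Random(seed)
    breeding, nonbreeding = [], []
    for _ in range(n):
        # Breeding location: uniform within a bounded rectangular range.
        x, y = rng.uniform(0.0, width), rng.uniform(0.0, height)
        # Step 1: fixed shift due south.
        x_nb, y_nb = x, y - south_shift
        # Step 2: further shift in a random direction (0 to 360 degrees)
        # by a lognormally distributed distance with log-scale SD of 1.
        bearing = math.radians(rng.uniform(0.0, 360.0))
        distance = rng.lognormvariate(mean_log_scatter, 1.0)
        x_nb += distance * math.sin(bearing)
        y_nb += distance * math.cos(bearing)
        breeding.append((x, y))
        nonbreeding.append((x_nb, y_nb))
    return breeding, nonbreeding


# Three connectivity scenarios: larger scatter means weaker connectivity.
for mean_log in (3, 5, 7):
    breeding, nonbreeding = simulate_population(10_000, mean_log)
    print(mean_log, len(breeding), len(nonbreeding))
```

Under this reading, a larger lognormal mean produces larger random displacements after the southward shift, so individuals that bred close together end up further apart and connectivity is weaker.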
Simulating study designs
Two basic sampling designs were applied to simulated populations (Fig. 1):
Area scenarios
To test for the effect of sampling area size, a single rectangular sampling area was used, centred within the breeding area, within which 200 individuals were sampled at random for tracking. The size of this rectangular area was varied to generate three scenarios of increasing total study area with sample size held constant (sampling areas are illustrated in Fig. 1c, e, & g).
Spread scenarios
To test for effects of sampling ‘spread’ under a fixed study area design, 200 individuals were randomly sampled from nine rectangular sampling areas (sites) distributed in a 3 × 3 grid formation centred within the breeding area. Spacing between these sites was then varied to generate three scenarios of differing sampling spread, holding the size of sampled area and sample size constant in each case (sampling sites are illustrated in Fig. 1k, m, & o).
We generated 100 replicate datasets for each scenario (area and spread), and repeated this for each of the three levels of connectivity strength tested.
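The two sampling designs can be sketched in Python as below. The geometry is an assumption made for illustration: sampling areas are taken to be squares, the range is a 1000 × 1000 square, spacing is measured between site centres, and the 200 individuals in the spread design are drawn from the pooled set of individuals inside any of the nine sites (the text does not say whether sampling was stratified by site). All function names are hypothetical.

```python
import random


def in_rect(point, rect):
    """True if (x, y) lies inside rect = (xmin, ymin, xmax, ymax)."""
    x, y = point
    xmin, ymin, xmax, ymax = rect
    return xmin <= x <= xmax and ymin <= y <= ymax


def centred_rect(centre, width, height):
    cx, cy = centre
    return (cx - width / 2, cy - height / 2, cx + width / 2, cy + height / 2)


def area_design(range_size, area_side):
    """Area scenario: one square sampling area centred in a square range."""
    mid = range_size / 2
    return [centred_rect((mid, mid), area_side, area_side)]


def spread_design(range_size, site_side, spacing):
    """Spread scenario: nine square sites in a 3 x 3 grid centred in the range."""
    mid = range_size / 2
    offsets = (-spacing, 0.0, spacing)
    return [centred_rect((mid + dx, mid + dy), site_side, site_side)
            for dx in offsets for dy in offsets]


def sample_design(breeding, rects, n, rng):
    """Indices of n individuals drawn at random from inside any rectangle."""
    eligible = [i for i, p in enumerate(breeding)
                if any(in_rect(p, r) for r in rects)]
    return rng.sample(eligible, n)


rng = random.Random(0)
breeding = [(rng.uniform(0, 1000), rng.uniform(0, 1000)) for _ in range(10_000)]

# Vary the area size with sample size held at 200.
for side in (200, 500, 800):
    print("area", side, len(sample_design(breeding, area_design(1000, side), 200, rng)))

# Vary the spacing between sites with site size and sample size held constant.
for spacing in (150, 300, 400):
    sites = spread_design(1000, 100, spacing)
    print("spread", spacing, len(sample_design(breeding, sites, 200, rng)))
```

Returning indices rather than coordinates keeps each sampled individual linked to its non-breeding location, which is what the Mantel correlation requires.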
Estimating migratory connectivity
Using sampled individuals from each scenario, we calculated Mantel correlations using the mantel.rtest function within the ade4 package in R [25]. In each case, we assumed that all individuals sampled in the breeding range were tracked successfully to their winter locations and there was no location uncertainty. Scores were then assessed with respect to: 1) the difference between the observed Mantel score and the ‘true’ value calculated for the entire global population of 10,000 individuals, and 2) the difference between the observed Mantel score and an equivalent ‘true’ value calculated using all individuals inhabiting the strict spatial extent of sampling (henceforth ‘zone’).
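The two reference values used to assess each simulated study can be sketched in Python as follows. The sketch is illustrative only: it uses a toy population of 500 individuals with Gaussian scatter (not the 10,000-individual lognormal simulation of the study) so that it runs quickly in pure Python, takes bias as observed minus true, and uses hypothetical function names.

```python
import math
import random


def mantel_r(breeding, nonbreeding):
    """Pearson correlation between matched pairwise distances in two seasons."""
    n = len(breeding)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    a = [math.dist(breeding[i], breeding[j]) for i, j in pairs]
    b = [math.dist(nonbreeding[i], nonbreeding[j]) for i, j in pairs]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a)
                           * sum((y - mb) ** 2 for y in b))


def subset(points, idx):
    return [points[i] for i in idx]


def bias_against_references(breeding, nonbreeding, zone, n_sample, rng):
    """Return (bias vs whole population, bias vs sampling zone) for one study.

    zone = (xmin, ymin, xmax, ymax) is the strict spatial extent of sampling.
    """
    xmin, ymin, xmax, ymax = zone
    in_zone = [i for i, (x, y) in enumerate(breeding)
               if xmin <= x <= xmax and ymin <= y <= ymax]
    sampled = rng.sample(in_zone, n_sample)
    observed = mantel_r(subset(breeding, sampled), subset(nonbreeding, sampled))
    true_global = mantel_r(breeding, nonbreeding)
    true_zone = mantel_r(subset(breeding, in_zone), subset(nonbreeding, in_zone))
    return observed - true_global, observed - true_zone


# Toy population (assumed values): 100 x 100 range, 200-unit shift south,
# Gaussian scatter with SD 30 on each axis.
rng = random.Random(42)
breeding = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(500)]
nonbreeding = [(x + rng.gauss(0, 30), y - 200 + rng.gauss(0, 30))
               for x, y in breeding]
zone = (25.0, 25.0, 75.0, 75.0)
bias_global, bias_zone = bias_against_references(breeding, nonbreeding, zone, 60, rng)
print(round(bias_global, 3), round(bias_zone, 3))
```

Separating the two differences distinguishes error caused by extrapolating beyond the sampled zone from error caused by subsampling within it.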
Sample size scenarios
We tested a range of sample sizes to examine how precision varies in relation to the proportion of a population being sampled. For each level of connectivity, we randomly sampled individuals from the entire breeding range (global population N = 10,000), applying sample sizes of 10 (0.1%), 50 (0.5%), 100 (1%), 1000 (10%), 2500 (25%), and 5000 (50%) individuals. For each sample size and connectivity scenario, 100 replicates were generated with Mantel scores calculated following the previously described method. Bias was determined as the difference between the observed score and the value for the entire simulated population of 10,000 individuals.
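A scaled-down Python sketch of the sample size experiment is given below. To keep the pure-Python run time short it uses a toy population of 2,000 individuals, sample sizes of 10, 50 and 100, and 30 replicates, none of which match the study's settings; the Gaussian scatter and function names are likewise assumptions. Precision is summarised here as the standard deviation of Mantel scores across replicates.

```python
import math
import random
import statistics


def mantel_r(breeding, nonbreeding):
    """Pearson correlation between matched pairwise distances in two seasons."""
    n = len(breeding)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    a = [math.dist(breeding[i], breeding[j]) for i, j in pairs]
    b = [math.dist(nonbreeding[i], nonbreeding[j]) for i, j in pairs]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a)
                           * sum((y - mb) ** 2 for y in b))


def replicate_scores(breeding, nonbreeding, n_sample, replicates, rng):
    """Mantel scores for repeated random samples from the whole range."""
    scores = []
    for _ in range(replicates):
        idx = rng.sample(range(len(breeding)), n_sample)
        scores.append(mantel_r([breeding[i] for i in idx],
                               [nonbreeding[i] for i in idx]))
    return scores


# Toy population (assumed values, not the study's simulation).
rng = random.Random(7)
breeding = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(2000)]
nonbreeding = [(x + rng.gauss(0, 30), y - 200 + rng.gauss(0, 30))
               for x, y in breeding]

for n in (10, 50, 100):
    scores = replicate_scores(breeding, nonbreeding, n, 30, rng)
    fraction = n / len(breeding)
    print(n, f"{fraction:.1%}", round(statistics.stdev(scores), 3))
```

The number of pairwise distances grows with the square of the sample size, which is why the larger sample sizes used in the study are better handled with vectorised code than with this sketch.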
Patchy population scenarios
Populations in the real world seldom conform to contiguous blocks, and often show a patchy distribution. To examine how this patchiness influences the effect of spatial sampling design on Mantel statistics, we simulated populations inhabiting four equal-sized sub-populations situated at the corners of the breeding range, within which individuals were distributed at random (Fig. A2). Migrations were then simulated using the same process described above (see Fig. A1), but populations were then further restricted to include only individuals that reach four equal-sized regions in the non-breeding area. This was to ensure clearly delimited sub-populations during both the breeding and non-breeding period. We then applied a rectangular sampling area centred within each breeding sub-population, across which 200 individuals were sampled at random for tracking. The size of the rectangular areas was then varied to generate three scenarios of increasing total study area (with sample size held constant).
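The patchy-population procedure can be sketched in Python as follows. Several details are assumptions made for illustration: patches and sampling areas are squares, the four non-breeding regions are taken to be the breeding patches shifted due south by the fixed migration distance, the lognormal parameters are on the log scale, and the 200 tracked individuals are pooled across the four sampling areas. All names and numeric values are hypothetical.

```python
import math
import random


def corner_patches(origin, range_size, patch_side):
    """Four equal squares at the corners of a square range, as (x0, y0, x1, y1)."""
    ox, oy = origin
    far = range_size - patch_side
    return [(ox + dx, oy + dy, ox + dx + patch_side, oy + dy + patch_side)
            for dx in (0.0, far) for dy in (0.0, far)]


def inside_any(point, rects):
    x, y = point
    return any(x0 <= x <= x1 and y0 <= y <= y1 for x0, y0, x1, y1 in rects)


def simulate_patchy(n_per_patch, mean_log_scatter, range_size=1000.0,
                    patch_side=300.0, south_shift=2000.0, seed=0):
    """Simulate migration from four breeding patches, keeping only
    individuals that arrive inside one of four non-breeding regions."""
    rng = random.Random(seed)
    patches = corner_patches((0.0, 0.0), range_size, patch_side)
    # Assumption: non-breeding regions mirror the breeding patches, shifted south.
    regions = corner_patches((0.0, -south_shift), range_size, patch_side)
    kept = []
    for x0, y0, x1, y1 in patches:
        for _ in range(n_per_patch):
            x, y = rng.uniform(x0, x1), rng.uniform(y0, y1)
            bearing = math.radians(rng.uniform(0.0, 360.0))
            dist = rng.lognormvariate(mean_log_scatter, 1.0)
            dest = (x + dist * math.sin(bearing),
                    y - south_shift + dist * math.cos(bearing))
            if inside_any(dest, regions):
                kept.append(((x, y), dest))
    return kept, patches, regions


def sample_centred_areas(kept, patches, area_side, n, rng):
    """Draw n individuals from square areas centred within each breeding patch."""
    half = area_side / 2
    areas = [((x0 + x1) / 2 - half, (y0 + y1) / 2 - half,
              (x0 + x1) / 2 + half, (y0 + y1) / 2 + half)
             for x0, y0, x1, y1 in patches]
    eligible = [pair for pair in kept if inside_any(pair[0], areas)]
    return rng.sample(eligible, n)


kept, patches, regions = simulate_patchy(1000, 3.0)
tracked = sample_centred_areas(kept, patches, 150.0, 200, random.Random(1))
print(len(kept), len(tracked))
```

Varying the `area_side` argument while holding the sample at 200 individuals corresponds to the three scenarios of increasing study area.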
Simulating realistic species ranges
To examine how spatial sampling designs influence connectivity estimates when applied to more realistic migratory populations, we generated further simulated populations that were constrained within real-world breeding and non-breeding BirdLife range maps [26] for three bird species selected to represent diverse range structures (Henslow’s Sparrow Passerculus henslowii, Aquatic Warbler Acrocephalus paludicola, and Falcated Duck Mareca falcata; note that subsequent simulated populations are not intended to be accurate replications of these species). To simulate realistic distributions of individuals within each range, we generated spatially-autocorrelated occurrence patterns (i.e. spatial clustering of individuals rather than a uniform distribution). We used the nlm_gaussianfield function from the NLMR package [27] to produce a Gaussian random field of spatially-clustered values (scaled to vary between 0 and 1), applying an autocorrelation range of 10 and a magnitude of variation of 100 (Fig. A3A). We then distributed 50,000 individuals across each range in proportion to the resulting random field values (Fig. A3B), with spatial autocorrelation ensuring that individuals were clustered in space, with areas of high and low abundance.
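The idea of distributing individuals in proportion to a spatially-autocorrelated field can be sketched in Python as below. Note that this is a stand-in, not a Gaussian random field: white noise is simply averaged over a moving window to induce spatial autocorrelation, in place of the nlm_gaussianfield function used in the study. The sketch also uses a plain rectangular grid rather than a species range map, and the grid size, window radius, and function names are hypothetical.

```python
import random


def smoothed_field(ncol, nrow, radius, rng):
    """Autocorrelated field scaled to [0, 1]: white noise averaged over a
    square moving window (a simple stand-in for a Gaussian random field)."""
    noise = [[rng.random() for _ in range(ncol)] for _ in range(nrow)]
    field = []
    for r in range(nrow):
        row = []
        for c in range(ncol):
            window = [noise[rr][cc]
                      for rr in range(max(0, r - radius), min(nrow, r + radius + 1))
                      for cc in range(max(0, c - radius), min(ncol, c + radius + 1))]
            row.append(sum(window) / len(window))
        field.append(row)
    lo = min(min(row) for row in field)
    hi = max(max(row) for row in field)
    return [[(v - lo) / (hi - lo) for v in row] for row in field]


def place_individuals(field, n, rng):
    """Place n individuals in cells with probability proportional to the
    field value, at a random position within the chosen cell."""
    cells = [(r, c) for r in range(len(field)) for c in range(len(field[0]))]
    weights = [field[r][c] for r, c in cells]
    chosen = rng.choices(cells, weights=weights, k=n)
    return [(c + rng.random(), r + rng.random()) for r, c in chosen]


rng = random.Random(0)
field = smoothed_field(ncol=60, nrow=40, radius=3, rng=rng)
individuals = place_individuals(field, 50_000, rng)
print(len(individuals))
```

In an application to real ranges, cells outside the range polygon would be given zero weight before individuals are placed.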
To generate a range of differing levels of migratory connectivity for each simulated species range, we used an algorithm that matched breeding and non-breeding locations for individuals according to their longitudinal ranks (Fig. A3C). For each individual in the breeding range, we randomly selected a non-breeding location from all available points within a given bandwidth of longitudinal rank, and controlled connectivity levels by varying this bandwidth. For example, with a bandwidth of 1000, an individual of longitudinal rank 5000 on the breeding zone would be randomly assigned a point from all those between longitudinal ranks 4000 and 6000 on the non-breeding zone. Larger bandwidths of longitudinal ranking on the non-breeding zone therefore result in lower migratory connectivity. Bandwidth sizes of 1000, 13,000, and 25,000 were used to produce three levels of migratory connectivity for each species. To sample the resulting populations, we assigned discrete study areas of fixed size to the 20 highest-density cells within a coarse grid overlain across the breeding range (Fig. A3D–E). This reflects the common logistical constraints (difficulty in catching individuals for tagging, access restrictions, and financial limitations) that may force researchers to restrict their sampling to areas where their chosen species are known to be more abundant. We then selected 200 individuals at random from these sampling sites. To explore the impact of variation in spatial sampling extent on resulting connectivity estimates, we varied the number of sampling sites from which these individuals were drawn, ranging from 3 to 20 sites selected at random from the pool of 20. We repeated this 100 times for each possible sampling scenario and level of connectivity and calculated resulting Mantel scores as well as the mean distance between centroids of sampling sites.
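The rank-bandwidth matching step can be sketched in Python as follows. The sketch reads "all available points" as matching without replacement, so each non-breeding location is used once; it also processes breeding individuals in random order and, if no unassigned location remains inside the window, falls back to the nearest unassigned rank. These three choices, and the function names, are assumptions that the text does not confirm.

```python
import bisect
import random


def match_by_longitude_rank(n, bandwidth, rng):
    """Return match, where match[r] is the non-breeding longitudinal rank
    assigned to the breeding individual of longitudinal rank r."""
    available = list(range(n))   # unassigned non-breeding ranks, kept sorted
    match = [None] * n
    order = list(range(n))
    rng.shuffle(order)           # assumption: random processing order
    for rank in order:
        lo = bisect.bisect_left(available, rank - bandwidth)
        hi = bisect.bisect_right(available, rank + bandwidth)
        if lo < hi:
            # Pick at random among unassigned ranks within rank +/- bandwidth.
            k = rng.randrange(lo, hi)
        else:
            # Assumed fallback: nearest unassigned rank outside the window.
            neighbours = [c for c in (lo - 1, lo) if 0 <= c < len(available)]
            k = min(neighbours, key=lambda c: abs(available[c] - rank))
        match[rank] = available.pop(k)
    return match


def pair_locations(breeding, nonbreeding, bandwidth, rng):
    """Pair equal-length lists of (lon, lat) points by longitudinal rank."""
    b_sorted = sorted(breeding)       # tuples sort by longitude first
    nb_sorted = sorted(nonbreeding)
    match = match_by_longitude_rank(len(b_sorted), bandwidth, rng)
    return [(b_sorted[r], nb_sorted[m]) for r, m in enumerate(match)]


rng = random.Random(0)
breeding = [(rng.uniform(-100, -80), rng.uniform(35, 45)) for _ in range(2000)]
nonbreeding = [(rng.uniform(-95, -85), rng.uniform(28, 32)) for _ in range(2000)]
# Narrow, intermediate and wide bandwidths, scaled down from the study's
# values to suit this smaller toy population.
for bandwidth in (40, 520, 1000):
    pairs = pair_locations(breeding, nonbreeding, bandwidth, rng)
    print(bandwidth, len(pairs))
```

With a bandwidth of zero every individual keeps its own longitudinal rank, giving the strongest east-west structure this scheme allows; widening the bandwidth progressively mixes individuals across longitudes.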
All simulations and statistical analyses were performed with R 4.3.0 [28]. Scripts for the completed analysis, including all simulations, are available as electronic supplementary material.