Periodograms come from the field of signal processing. The approach works by decomposing the signal (here, animal locations) into a sum of sinusoids of fixed frequenciesFootnote 1. Periodograms facilitate visual identification of the frequency or frequencies that contribute most to the signal [21], and are the basis for nonparametric statistical tests of periodicity.
The discrete Fourier transform
The basic issue at hand is to identify periodicity in a signal X = {X
j
; j = 1, …, N} where X
j
is an animal’s recorded location at time t
j
. In animal movement applications, X
j
is typically two-dimensional (latitude, longitude) but might also have an altitude component. Assuming stationarity, i.e., that the animal does not change its periodic behavior during the course of the study, the discrete Fourier transform of X, denoted DFT{X}, yields an estimate of the contribution of any given frequency f to the signal [21]:
$$ \boldsymbol{D}\boldsymbol{F}\boldsymbol{T}\left\{\boldsymbol{X}\right\}(f)={\displaystyle \sum_{j=1}^N}{\boldsymbol{X}}_j\cdot {e}^{-ijf} $$
(1)
with i the imaginary unit such that \( i=\sqrt{-1} \). Importantly, the DFT requires that the interval Δt between subsequent samples is constant, i.e., t
j
= t
1 + (j − 1)Δt.
The DFT-periodogram of a multidimensional signal is P
DFT
(f) = 1/K ⋅ ∑
k
(1/N ⋅ |DFT{X}(f)|
k
|2) where ⋅ |
k
denotes the k
th dimension (k = 1 … K). The DFT-periodogram reaches a local maximum if frequency f is in phase with the frequency of X. The smallest detectible period in the signal depends on the sampling interval Δt. Frequency F = 1/2Δt is known as the Nyquist frequency and is the largest detectible frequency in a signal sampled at interval Δt [21]. There is also a direct relationship between the sample size N and the default frequency resolution of the periodogram: Δf = 1/(N ⋅ Δt). It is possible to refine the frequency resolution (increase the number of points in the periodogram) up to Δf = F/N, after which there is no new information in the data to inform new points in the periodogram, leading to autocorrelation in subsequent error terms and an overly smooth periodogram (section A.2 in Additional file 1). The DFT is now a standard tool for signal processing especially since it can be efficiently computed via the fast Fourier transform (FFT) [28], which is available in most statistical software.
The Lomb-Scargle periodogram: an extension of the DFT important for movement ecology
The condition that the sampling interval is constant, which is necessary to apply the DFT, is rarely met in animal tracking datasets. Tracking datasets are often scheduled to be collected on an even time grid but feature many missing records, or the sampling interval varies in duration due to delays in GPS signal acquisition, or the tracking devices may be purposely “duty cycled” to alternate between coarse and fine sampling rates. This renders the DFT inapplicable without first manipulating the data, which usually involves either thinning them until the sampling intervals are constant, or interpolating missing observations. The Lomb-Scargle periodogram (LSP) [20] extends the DFT to situations with missing data or variable sampling intervals, without ad hoc data manipulations. For this reason, the method has been progressively adopted in many fields where identifying periodicities in imperfect datasets is of interest, including astronomy, climate science, and biology [15–18, 22–24]. This article introduces the LSP as an effective non-parametric method to explore an animal tracking dataset for periodic patterns of space use, something that to our knowledge has not been done previously. Movement ecologists can also use the LSP to look for periodicity in activity level [15–18] but the present study does not emphasize this aspect (see however the section “African buffaloes” below).
A new algorithm to compute the Lomb-Scargle periodogram
First, we introduce and describe a new algorithm to compute the LSP. The motivation for this new algorithm is to be able to analyze long time series (>106 records like [29]), or large number of moderately long times series (>50 individuals with >104 records per individual like [30]), while maintaining computation times short enough that the protocol can be used at the data exploration stage. Importantly, the choice of an algorithm (our new approach or any of the preexisting ones, see below) does not influence the inference; all LSP algorithms compute the same quantity, but their computation times vary. The full derivation of the new algorithm is described in Additional file 1 and briefly outlined below. First, we define the sampling schedule function w on a regular time grid {t
j
; j = 1 … N}:
$$ w\left({t}_j\right)=\left\{\begin{array}{cc}\hfill 1\hfill & \hfill if\ \boldsymbol{X}\left({t}_j\right)\ is\ recorded\hfill \\ {}\hfill 0\hfill & \hfill if\ \boldsymbol{X}\left({t}_j\right)\ is\ missing\hfill \end{array}\right.\kern2em with\kern1.5em {t}_{j+1}-{t}_j=\varDelta t\kern0.5em \left(\forall j\right) $$
(2)
If the interval between two records was intended to be constant but some records are missing, {t
j
} simply codes for the intended sampling schedule. In the case of a duty cycle with varying sampling rate, {t
j
} codes for the finest sampling resolution. If the sampling interval was intended to be constant but eventually varied around that intended value (e.g., due to delays in GPS signal acquisition), our R package ctmm [31, 32] defaults to the median sampling interval for constructing w (see further details in section A.7 in Additional file 1). The user can however optionally refine the temporal resolution of {t
j
} through the res.time option. In this case, we use the Lagrange interpolation of the sinusoids [25] (see more detail below), causing an increase in computing time. We find res.time = 2 to be generally sufficient for most datasets affected by this issue of random variation in sampling interval, while res.time = 4 is almost exact within numerical precision.
Then we exploit the equivalence of the periodogram at frequency f to a least-square fit of the data to sinusoids of frequency f (Scargle 1982) [20].
$$ \boldsymbol{X}\left({t}_j\right)\approx \boldsymbol{A}(f){e}^{2\pi if{t}_j}+\boldsymbol{A}{(f)}^{*}{e}^{-2\pi if{t}_j} $$
(3)
where the amplitude A(f) is the parameter of interest and the * symbol denotes the complex conjugate. The squared error of this fit is, for each dimension k:
$$ {L}^{(k)}(f)={\displaystyle \sum_j}w\left({t}_j\right)\cdot {\left|{\left.\boldsymbol{X}\left({t}_j\right)\right|}_{\boldsymbol{k}}-\left({\left.\boldsymbol{A}(f)\right|}_{\boldsymbol{k}}{e}^{2\pi if{t}_j}+{{\left.\boldsymbol{A}(f)\right|}_{\boldsymbol{k}}}^{*}{e}^{-2\pi if{t}_j}\right)\right|}^2 $$
(4)
To estimate the amplitude, we minimize the cost function L(f) = ∑
k
L
(k)(f). It is possible to perform this minimization analytically (Additional file 1; Eqs. A.9–A.11). Following Scargle (1982) [20], the LSP of the k
th dimension of X is then expressed as a function of the amplitude:
$$ LS{P}^{(k)}(f)=\frac{1}{2}\left({\displaystyle \sum_j}w\left({t}_j\right){\left|{\left.\boldsymbol{X}\left({t}_j\right)\right|}_{\boldsymbol{k}}\right|}^2-\underset{\boldsymbol{A}(f)}{min}\ {L}^{(k)}(f)\right) $$
(5)
yielding, after manipulations that are detailed in Additional file 1: Eqs. A13–A15, the following, new formula for the empirical Lomb-Scargle periodogram of a multidimensional time series:
$$ \begin{array}{l}L\widehat{S}{P}^{(k)}(f)\Big|=\frac{DFT\left\{w\right\}(0)\cdot \Big|\operatorname{}\boldsymbol{D}\boldsymbol{F}\boldsymbol{T}\left\{w\boldsymbol{X}\right\}(f)\Big|{}_{\boldsymbol{k}}\Big|{}^2-\mathrm{R}\mathrm{e}\left(DFT\left\{w\right\}(2f)\cdot \operatorname{}\boldsymbol{D}\boldsymbol{F}\boldsymbol{T}\left\{w\boldsymbol{X}\right\}(f)\Big|{}_{\boldsymbol{k}}^2\right)}{FT\left\{w\right\}{(0)}^2-\Big|\operatorname{}DFT\left\{w\right\}(2f)\Big|{}_{\boldsymbol{k}}\Big|{}^2}\operatorname{}\kern1em \\ {}\kern1.5em L\widehat{S}P(f)\Big|=\frac{1}{K}{\displaystyle \sum_{k=1}^K}L\widehat{S}{P}^{(k)}(f)\operatorname{}\kern1em \end{array} $$
(6)
where Re(⋅) denotes the real part of a complex number and ⋅ |
k
denotes the k
th dimension of a variable. In Eq. 6, we can see that if there are no missing data, the LSP reduces to the DFT-periodogram of X because then DFT{w}(0) = N and DFT{w}(f) = 0 (∀ f > 0).
What makes this new formula for the LSP fast is that, contrary to the original implementation by Scargle (1982) [20], it is entirely based on the DFT, and can therefore be computed using the fast Fourier transform (FFT) [28]Footnote 2. Exploiting the proven speed of a known algorithm like the FFT is a standard method for developing more efficient computing techniques (e.g., [33]). For the case that interests us here, we assessed the gain in computing time using both algorithmic complexity considerations, and by directly measuring computing time. The computational complexity of an algorithm is conveyed with notation O, meaning “proportional to”. The complexity of the FFT algorithm is known to be O(N log N) [28], that is, the computing time of the FFT is proportional to N log N, where N is the number of records in the time series. Being entirely based on the FFT, our new algorithm for the LSP is therefore also O(N log N). By contrast, Scargle’s original computation of the LSP requires running twice through the time series, yielding a O(N
2) complexity, that is a computing speed proportional to N
2. The difference between O(N log N) and O(N
2) can be enormous in practice: our algorithm with O(N log N) complexity runs within seconds in situations where algorithms with O(N
2) complexity need more than a day (Fig. 1). Previously, only Press & Rybicki managed to derive an O(N log N) routine for the LSP, also by using the FFT [25]. However, the Press & Rybicki algorithm is much more complex than ours; as a consequence, all R packages and freeware for biologists that have a LSP functionality, and of which we are aware [22, 34–37], are based on Scargle’s original algorithm, not on Press & Rybicki’s fast algorithm. Furthermore, although the Press & Rybicki algorithm is based on the FFT like our new algorithm, it requires additional calculations including finer gridding, Lagrange interpolation, and O(N) square-root and trigonometric function evaluations [25], that our approach does not require as long as the time grid {t
j
} is even or can be approximated as such (see beginning of section for definition of {t
j
}). These additional steps also render the Press & Rybicki algorithm only an approximation to the LSP, whereas our approach is exact as long as the time grid {t
j
} is even. When the interval durations are variable, i.e., {t
j
} is not even, both methods are approximate. In the ctmm implementation of our fast algorithm, users can further control the amount of error through the frequency resolution option (res.freq).
In summary, because it is based on the FFT and is therefore of O(N log N) complexity, our new algorithm substantially increases the speed of the computation of the LSP compared to other R implementations, which are based on Scargle’s original O(N
2) algorithm (Fig. 1). This makes LSP computation tractable even for long time series (like [29]), or for numerous time series (like [30]). Importantly, whatever the algorithm used (ours, Press & Rybicki’s, or Scargle’s), the biological inference and statistical power are not affected: the same quantity is estimated.
Lastly, our implementation in the R package ctmm [31, 32] also accommodates two important features that are commonplace in movement data but infrequent in other fields. The first of these features is multi-dimensionality; all other LSP implementations in R are 1-dimensional and therefore require post-hoc manipulations to combine results from latitude and longitude time series. In the ctmm implementation, periodicity in the latitude, longitude, and altitude can be investigated jointly or separately. The second feature is multiple individuals. Movement ecology datasets often comprise multiple individuals which are expected to behave in similar ways, i.e., exhibit the same periodic patterns. In such a situation, ctmm allows fitting multi-individual periodograms, in order to augment sample size and better separate the periodic patterns from the background noise. The principle is described in section A.3 of Additional file 1 and an example presented in the “Maned wolves” section below.
Interpreting periodograms
First, the overall shape of the periodogram of an animal’s track record is influenced by the temporal autocorrelation structure of the aperiodic background noise. This issue was previously largely ignored because in astronomy, which is where the LSP was developed and where most extensions of the LSP have been published, the background noise can be assumed to be independently and identically distributed (“white”) [27]. However, in movement ecology, the stochastic aperiodic component of the animals’ paths is expected to exhibit temporal autocorrelation in the location process, the velocity process, or both (“colored”) [26]. As a consequence, previously developed statistical tests of periodicity [22, 27] cannot be applied to movement data. Instead, the peaks in the periodograms must be compared to the autocorrelated background noise (see the section entitled “Artefactual periodicities and a new statistical test for periodic patterns of space use” below). A graphic representation of the LSP of the four most commonly used aperiodic continuous-time stochastic movement models is provided in Fig. 2. We do not recommend using the LSP as a diagnostic tool to distinguish different aperiodic stochastic models (see instead [26, 32, 38]). However, knowing what the periodogram of an aperiodic animal path should look like can be very useful to interpret empirical periodograms, especially given the rough aspect of most empirical periodograms.
The important features to look for in periodograms are peaks (Additional file 2, section C1). Peaks correspond to frequencies that resonate with the signal. These peaks vary in width and height (Figs. 3, 4 and 5; Additional file 2). Regarding the width of the peaks, as mentioned earlier, the default resolution of the periodogram is, in the frequency domain, Δf = 1/(N ⋅ Δt) where N is the number of recorded locations and Δt is the (median) sampling interval between two records. In the time domain, this yields a resolution that increases with the period: ΔT = T
2/(N ⋅ Δt) ≈ T, meaning that, if the sampling schedule stays the same, the longer the period, the wider the peak. Peak width, therefore, is typically an artifact of the periodogram’s natural resolution, so that longer periods will often have wider peaks (Additional file 2, section C.4). Peak width thereby typically carries little biological information about periodicity in the mean of the movement process. Some of this artefactual variation in peak width can be removed using the max = TRUE option of the plot method for periodogram objects in ctmm, and by increasing the frequency resolution (the number of points) with the res.freq option when computing the periodogram. With the max=TRUE option, only local maxima of the periodogram are plotted. This post-hoc coarsening of the frequency resolution removes in particular the artefactual oscillations caused by the autocorrelation in the error term of the periodogram with a period of 1/D on the frequency scale (∼ T
2/D on the period scale), where D is the overall study duration. These oscillations, when not discarded using the max option, are typically visible for large periods (Figs. 3, 4 and 5).
By contrast, the height of the peaks, i.e., the difference between the periodogram value at the peak and around the peak, conveys information about the signal-to-noise ratio [22, 27]. The statistical significance of the periodic pattern can be inferred by comparing the peak height to the baseline periodogram of the background noise [27] (see more details under “Artefactual periodicities and a new statistical test for periodic patterns of space use” below).
Third, periodogram interpretation needs to take harmonic series into account (section C.2 in Additional file 2). A harmonic is defined as a component frequency of the signal that is an integer multiple of the fundamental frequency (the frequency with the largest signal-to-noise ratio). Harmonic series occur if the periodic signal is not perfectly sinusoidal, i.e., if the waveform is different from a sinusoid, or if the repeated pattern is different from an ellipse. In movement ecology, periodic patterns of space use are therefore expected to always generate a series of harmonics. We performed a simulation study to illustrate the occurrence of harmonics in a few simplified situations (section C.2 in Additional file 2). These simulations illustrate that users should not over-interpret peaks in the periodogram that occur for periods that are a fraction of the fundamental period (the highest peak in the periodogram). Even in very simplified cases, predicting which harmonics would be activated was far from straightforward, and depended on the sampling schedule, the velocity, and the shape of the repeated pattern. In real-life applications, harmonic series are unlikely to bring interpretable information about the shape of the repeated pattern. The difficulty further lies in identifying whether the signal is multi-periodic (e.g., both daily and hourly periods) or mono-periodic with harmonics. If peaks occur for periods that are not integer multiples of each other, the signal is multi-periodic. If the peak for a short period (e.g., one hour) is higher than the peak for a long period (e.g., one day), the signal is also multi-periodic. In most other situations, the signal is likely monoperiodic with harmonics.
Artefactual periodicities and a new statistical test for periodic patterns of space use
The issue at hand in this section is the identification of peaks in the periodogram that are not related to an underlying periodic behavior, but are instead caused by the sampling schedule. This can for example be the case if the tracking device is duty cycled to alternate between fine and coarse sampling rates, in effect creating a periodic sampling schedule, or if there is autocorrelation in the way missing data occur. In such situations, periodicity in the sampling schedule can be transferred into periodicity in the animal tracking record, even if the true, underlying path is not periodic (section C.3 in Additional file 2). In most previous applications of the LSP, this issue was largely ignored: researchers assumed missing observations or sampling interval variations occurred without autocorrelation [22]. In movement ecology, the issue cannot be ignored anymore (e.g., [26]), requiring innovative treatments.
In order to jointly deal with the issues of 1) temporally autocorrelated background noise; 2) possible confusion of random peaks with periodicity-induced peaks; and 3) artefactual periodicities created by the sampling schedule, we devised the following two-pronged approached based on a null model simulation routine and a visual diagnostic.
Visual diagnostic
The objective here is to visually compare the periodogram of the location time series, X(t), and the periodogram of the sampling schedule, w(t). If the periodicity observed in the data is artefactual, then the same peak should occur in the periodogram of the sampling schedule and in the periodogram of the locations. After rescaling the two periodograms so that they both have a maximum value of zero (automatically performed by ctmm), the periodogram of the recorded locations is plotted alongside the periodogram of w. If a peak occurs in both periodograms for the same period, then it is likely that the detected periodicity is artefactual. However, a true periodicity could plausibly be superimposed on an artefactual one, so that the occurrence of a peak of same periodicity in the two periodograms does not prove the absence of a periodic pattern, but only indicates the risk of false positive due to issues in the sampling schedule. This visual diagnostic approach has the advantage of being fully non-parametric. It can thus be employed on any time series irrespective of which type of underlying movement process generated the data or how irregular the sampling schedule is. It can therefore be used without restriction. The visual diagnostic approach is available in package ctmm through the diagnostic = TRUE option of the plot method for periodogram objects.
Null model approach
Here, we first use an information-theoretic approach to select a preferred aperiodic continuous-time stochastic model and fit it to the data. The theory for these models is developed in e.g., [38] and recommendations for model fitting and model selection with ctmm are detailed in [32]. The preferred aperiodic continuous-time stochastic model acts as the null model, representing the hypothesis “there are no periodic patterns of space use in the data”. It is the reference frame against which peaks in the periodogram are going to be compared. We use this model to generate simulated datasets with the same sampling schedule as the real data. Any irregularity in the sampling schedule or pattern in the way missing data occur is thus carried over in the simulations. Finally, we compute the proportion of the simulated periodograms in which the value at the period of interest is larger than the value from the real data. This proportion is the P-value of the periodicity test (the sensitivity of the test depends on the number of simulations). An analogous approach was proposed for the case where the background noise is independently and identically distributed (“white”) by [27] and is implemented by default in the lomb package [22]. Here, we thus generalize this original statistical test of periodicity to the case where the background noise is temporally autocorrelated (“colored”), which will typically be the case with movement data. An R script implementing this approach (assuming a preferred null model has already been selected) is provided in Additional file 3.
African buffaloes
This case study is aimed at 1) illustrating the issue of artefactual periodicities created by the sampling schedule (see also section C.3 in Additional file 2 for a simulated example), and 2) highlighting that periodicities in activity levels need not be translated into periodic patterns of space use. Data came from two African buffalo cows named Cilla and Pepper in the Kruger National Park of South Africa, with hourly GPS position records [39]. Cilla had no missing observations and displayed aperiodic space use, whereas Pepper had many missing observations and a periodogram that exhibited a clear daily periodicity. Based on the visual diagnostic, Calabrese et al. [32] suspected that the periodicity in Pepper’s periodogram was artefactual and caused by a collar malfunction. To prove that point, we discarded data points from Cilla in order to mimic Pepper’s sampling schedule. To do so, we first shifted the sampling schedules so that the time series for the two individuals started at the same time, and then discarded all fixes from Cilla’s tracking record that were collected more than 10 min from a fix from Pepper. This yielded a subsample of Cilla’s locations affected by the same missing data problem as Pepper. We predicted that the periodogram from Cilla’s resampled tracking record would exhibit the same (artefactual) peak as Pepper’s. Sample sizes were 3528 fixes for Cilla, of which 2153 were discarded in the manipulation, and 1726 fixes for Pepper. Lastly, we tested Pepper’s periodicity using the above-described null-model approach. If artefactual, the periodicity should not be significant in this test (P-value >0.05).
We also used Cilla’s (complete) data to illustrate the difference between periodic patterns of space use and periodicity in activity level. The former corresponds to the existence of a characteristic revisitation time between two passages of an animal in a given place, and is our focus in this paper. The latter corresponds to rest/activity cycles, and has been the focus of most other studies applying periodograms to movement data [4, 14–18]. In this case, African buffaloes characteristically show reduced activity during the mid-day heat, so we predicted a daily periodicity in Cilla’s activity level whether or not her space use was periodic. Activity was measured by movement speed, which we quantified as the length of each recorded step divided by the duration of the corresponding time interval.
Waved albatrosses
We analyzed 90-min resolution GPS tracking data from waved albatrosses during their 2008 breeding season [40, 41]. We selected data from four adult birds breeding on Española island in the Galapagos archipelago, that were monitored for more than 45 days (range 47–97 days) and undertook more than one fishing trip during that time. During incubation, the male and female of a pair alternate nest duties and fishing trips, so that the nest is always attended [40]. Most fishing trips are towards nutrient-rich up-welling zones off the coast of Peru about 1200 km from the breeding island. Waved albatross’ diet include a large proportion of squids [42] which are mainly available to them at night [6]. Although waved albatrosses exhibit some morphological adaptations such as larger eyes and larger distance between them than related species, we suppose that they still have poor night vision like most other albatrosses [43, 44]. Hence, we formulate the hypothesis that foraging behavior should be constrained by moon light, and therefore exhibit a periodicity of c. 29.5 days.
Maned wolves
Maned wolf, the largest South American canid, is exceptional among similarly-sized canids for its solitary foraging behavior and omnivorous diet, which raises questions regarding its daily energy budget [45] and social system [46]. The occurrence of a periodic pattern of space use corresponding to “routine behaviors” is potentially of foremost importance for energy budget and sociality, but remains poorly understood. While there is anecdotal evidence that maned wolves repeatedly use the same paths when patrolling their home range in search of food in Brazil [47], routes are diversified in Bolivia [48]. Interestingly, the behavior is documented in other, more intensively-studied canid species, but even in these species the extent to which it is expressed is variable and poorly known [49, 50]. We used tracking data from eight wolves collected in or near the Serra da Canastra National Park, Brazil, a grassland ecosystem, and used the LSP methodology to test for a daily periodicity in those eight individuals. Maned wolves were captured using live-traps baited with cooked chicken and sardines, sedated (with direct injection of tiletamine-zolazepan), and equipped with VHF/GPS-Collars (Lotek Wireless Inc. GPS 3300S and Iridium Track 1D, and Sirtrack Limited Pinnacle Lite G5C 275D). The devices were set to record one location every 1 to 4 h (depending on the animal).