Correlated velocity models as a fundamental unit of animal movement: synthesis and applications

Background Continuous time movement models resolve many of the problems with scaling, sampling, and interpretation that affect discrete movement models. They can, however, be challenging to estimate, have been presented in inconsistent ways, and are not widely used. Methods We review the literature on integrated Ornstein-Uhlenbeck velocity models and propose four fundamental correlated velocity movement models (CVM’s): random, advective, rotational, and rotational-advective. The models are defined in terms of biologically meaningful speeds and time scales of autocorrelation. We summarize several approaches to estimating the models, and apply these tools for the higher order task of behavioral partitioning via change point analysis. Results An array of simulation illustrate the precision and accuracy of the estimation tools. An analysis of a swimming track of a bowhead whale (Balaena mysticetus) illustrates their robustness to irregular and sparse sampling and identifies switches between slower and faster, and directed vs. random movements. An analysis of a short flight of a lesser kestrel (Falco naumanni) identifies exact moments when switches occur between loopy, thermal soaring and directed flapping or gliding flights. Conclusions We provide tools to estimate parameters and perform change point analyses in continuous time movement models as an R package (smoove). These resources, together with the synthesis, should facilitate the wider application and development of correlated velocity models among movement ecologists. Electronic supplementary material The online version of this article (doi:10.1186/s40462-017-0103-3) contains supplementary material, which is available to authorized users.

functions that simulate, diagnose and estimate these movement models, as well as functions for facilitating a change point analysis.
1 Correlated velocity movement models 1/τ non-zero ACVM(τ, η, µ x , µ y ) Rotational CVM 1/τ + iω 0 RCVM(τ, η, ω) Rotational-Advective CVM 1/τ + iω non-zero RACVM(τ, η, ω, µ x , µ y ) Briefly, CVM models are movement models in which the velocity is assumed to be an autocorrelated stochastic process taking the form of an Ornstein-Uhlenbeck equation: where the key parameters are τ -the characteristic time scale of autocorrelation, η -the root mean squared speed of the movement process. Different valeus of the additional parameters α and µ are related to variations of a CVM process, as per table 1. We present the models together with functions that estimate them sequentially below.

Unbiased CVM
The unbiased CVM is the continuous time equivalent of an unbiased correlated random walk, i.e. a movement that has local persistence in direction. This is the most "fundamental" CVM model and its mean speed is given by a simple expression ν = η √ π/2, the a convenient parmeterization is as: UCVM(τ , ν). Also, it is for this model that the complete set of estimation methods discussed in the paper is available. For these reasons, it is treated slightly differently than the (R/A)CVM models in the smoove package.
The function for simulating the UCVM is simulateUCVM. There are two methods of simulation, direct and exact.

Direct method
The direct method works by discretizing the differential equation 1, i.e.: where V is a vector of complex numbers and dWi is drawn from a bivariate normal distribution with variance √ dt. The direct method provides good simulations relatively quickly at high resolution (∆t τ ).
Loading the package:

Exact method
The exact method is more flexible, as it samples the position and velocity process simultaneously from the complete mean and variance-covariance structure of the integreated OU process. The main advantage is that the process can be simulated for irregular (and totally arbitrary) times of observation: (100) The scan.track function is another function that is convenient for seeing a process in time (right plots). Note the irregularity of the sampling.

Rotational-Advective CVM
The (R/A)CVM models are parameterized in terms of τ , the random rms speed η, and then some combination of the advection vector µ and/or angular speed ω.

Estimation of CVM models
The main estimation functions are estimateUCVM and estimateRACVM. The respective help files have abundant examples of implementation.

UCVM
Following the structure of the Gurarie at al. (in review ), manuscript, the UCVM, which focusses on estimates of time scale τ and speed ν, can be estimated using one of five different methods.

VAF fitting
This method relies on fitting the empirical velocity autocorrelation function (above) to the theoretical velocity autocorrelation function: The diagnostic plot (only plotted with diagnose=TRUE) illustrates the quality of the fit. The estimates should be fairly good, when compared to the true parameters: A lower-resolution track will have wider confidence intervals: ucvm.lores <-simulateUCVM(nu=10, tau = 4, dt = 1, T.max = 1000, method = "exact") with(ucvm.lores, estimateUCVM(z = Z, t = T, CI=TRUE, method="vaf"))

##
Estimate C.I.low C.I.high ## tau 5.515561 3.573607 12.08006 ## nu 10.324379 9.769988 10.87877 NB: This method works only for regularly sampled data, and the confidence intervals tend to be unreliable.

CRW matching
The Correlated Random Walk (CRW) matching method computes the CRW parameters (shape and scale of length steps and wrapped Cauchy clustering coefficient) and converts those to time-scales and speed estimates. Confidence intervals are obtained by Monte Carlo draws from the confidence intervals around likelihood estimates of the CRW parameters. tau <-2; nu <-8 ucvm3 <-simulateUCVM(T=1:1000, nu = nu, tau = tau, method = "exact") plot(ucvm3$Z, asp=1, type="o", cex=0.5, pch=19) The diagnosis plots (crudely) illustrates the standard autocorrelation function for the step lengths and turning angles. Under the normal assumptions of the CRW, these should both be around 0 at lags greater than 0. The more autocorrelated either of these time series is, the more biased the estimates are likely to be. NB: This method is illustrative, generally biased, and recommended except for data that are relatively coarsely sampled. It is, however, very fast to compute.

Velocity likelihood
The velocity likelihood method is based on using the known distributional properties of the integrated OU obtain a "one-step" likelihood of each velocity based on the previous velocity. This method is robust to irregularly sampled data and is fairly fast. We will estimate the original track estimateUCVM(z = ucvm1$Z, t = ucvm1$T, method = "vLike", CI=TRUE) This method gives very reliable confidence intervals, but may underestimate velocities somewhat. We can also run it on the irregularly sampled track from above: The confidence intervals are more wide this time (due to a much smaller data set: 100 versus 10001 data points), but the estimates should be fairly accurate.

Position likelihood
The position likelihood (zLike) takes the complete correlation structure between the velocities and the positions to estimate the parameters. It is prohibitively slow on a data set the size of ucvm1, but quite fast on the second (irregular, short) sampling: estimateUCVM(z = ucvm2$Z, t = ucvm2$T, method = "zLike", CI=TRUE) Note, improvements in the parameter estimates and confidence intervals, and the inclusion of estimates for the initial speed (usually a parameter of less interest).

Position likelihood: Kalman filter
We (OO and EG) had independently worked out many of the details of the estimation of full position likelihood before we discovered that the same likelihood was estimated in a much more efficient way by Johnson et al. (2008), and encoded in an excellent package called "crawl" (Correlated RAndom Walk Library). We have incorporated a wrapper for this package in smoove for the estimation of UCVM parameters -a fairly narrow application of the capabilities of crawl. First, we run it on the irregular data set: with(ucvm2, estimateUCVM(z = Z, t = T, method = "crawl")) The estimates and C.I.'s for ν and τ are very similar to the full position likelihood above. The additional output is specific to the parameterization Johnson et al. use, the lower output (nutau) is consistent with the output for the other methods. The crawl method can handle a very long (n = 10000) dataset as well.
with(ucvm1, estimateUCVM(z = Z, t = T, method = "crawl"))$nutau with correspondingly smaller confidence intervals. There are many additional options that can be passed to the crawl solver, detailed in the help file for crwMLE.

RACVM
Estimating the RACVM processes is different mainly in that there are now four different models to compare (with and without each of rotation and advection), and a model selection method is intrinsically built in. For the time being, the method implemented for fitting these models is the velocity likelihood.

Kestrel data
In the paper, we analyze a phenomenal data set of a single lesser kestrel ( The model selection confirms this (note that we specify the time unit of analysis -in this case, seconds).
(fit1 <-with(K1, estimateRACVM(XY = cbind(X,Y), T = timestamp, spline=TRUE, model = "RACVM", time.units = "sec"))) The track is very regularly but very slightly coarsely sampled. We therefore use the spline correction (though its impact is probably minimal). The model selection strongly prefers an RACVM model. The rotational speed is about 0.5 radians per second, the advective speed is aruond 4 m/sec. We feed these estimates back into the simulation algorithm to simulate a trajectory:  Visually, the simulation looks rather similar with respect to radii and frequence of rotations, randomness and advective tendency to the data snippet.

Behavioral change point analyses
Change point analysis (like most of ecological analysis) is not an exact science. However, the existence of likelihoods and model selection criteria makes it possible to develop an heuristic that can propose and assess candidate change points, select "significant" ones, and estimate the movement model and parameters between those change points.

Step I -the Window Sweep
TO perform a window sweep, we need to set two variables: the windowsize of analysis and the windowstep. Determining the appropriate windowsize is black magic, i.e. should be driven by "biological" considerations. As a rule of thumb, less than 30 is too short.
simSweep <-sweepRACVM(Z=Z, T=T, windowsize = 100, windowstep = 10, model = "UCVM", progress=FALSE) Toggling the progress to TRUE will output a series of progress bars in the R console. In the image above, each color represents the relative log-likelihood profile for a single window. There are two distinct peaks, the remainder of the algorithm boils these down to "significant" change points. The warning lets us know that some of the candiate change points are rather close to each other, and it might be a good idea to cluster them, i.e. assign multiple points within a given time interval to a single cluster. If we take a rather generous cluster width below: We now have a more limited set of change points to assess.

Step III -Selecting Significant Change Points
We use the set of candidate change points to separate the track into phases and for each change point (i.e. pair of neighboring phases) we assess the "significance" of the change point by comparing the BIC (or AIC) of a two-model versus one-model fit. The function that performs this is getCPtable getCPtable(CPs = CP.clustered, Z = Z, T = T, modelset = "UCVM", tidy = FALSE, iterate = FALSE) In the output of selectRACVM, the extremes column indicates which parameter showed significant changes in parameter values defined by no overlap in the 95% confidence interval, and the dAIC and dBIC compare the change point model to the no change point model. We can ask this functin to tidy the selection table according to any combination of several criteria: by the extreme, by negative dAIC or dBIC, or by a mismatch in the models selected (see the help file).

Kestrel data BCPA
We perform this analysis now on a portion of the kestrel data analyzed in the Gurarie et al. (in review) manuscript. The basic steps are the same as above, but the selection occurs across all four models. The portion of data we analyze is: Note that the selected cluster width (2) was the minimum one that did not produce a warning for clusters that were "too close". We now iteratively test all of the change points for parameter values and model changes. Specifying modelset = 'all' is a shortcut for modelset = c('UCVM', 'ACVM', 'RCVM', 'RACVM'): XY <-cbind(k$X, k$Y) k.CPtable <-getCPtable(CPs = k.CPs, XY = XY, T = k$T, modelset = "all", tidy = c("extremes"), iterate = FALSE, spline=TRUE) Note that in this implementation, we chose the option spline=TRUE, as the regularity and apparent precision of these data and the curvature of the track suggest that the spline approximation will give improved estimates of the parameters.