 Commentary
 Open Access
Commentary to: a cross-validation-based approach for delimiting reliable home range estimates
Movement Ecology volume 6, Article number: 10 (2018)
The original article was published in Movement Ecology 2017 5:19
Abstract
Background
Continued exploration of the performance of the recently proposed cross-validation-based approach for delimiting home ranges using the Time Local Convex Hull (T-LoCoH) method has revealed a number of issues with the original formulation.
Main text
Here we replace the ad hoc cross-validation score with a new formulation based on the total log probability of out-of-sample predictions. To obtain these probabilities, we interpret the normalized LoCoH hulls as a probability density. The application of the approach described here results in optimal parameter sets that differ dramatically from those selected using the original formulation. The derived metrics of home range size, mean revisitation rate, and mean duration of visit are also altered using the corrected formulation.
Conclusion
Despite these differences, we encourage the use of the cross-validation-based approach, as it provides a unifying framework governed by the statistical properties of the home ranges rather than subjective selections by the user.
Background
Continued exploration of the cross-validation-based approach proposed in [1] has revealed a number of issues with the original formulation of the optimization equation. This original formulation was ad hoc in its combination of two statistical approaches (cross-validation and information criteria), and the result was a metric without a clear basis in statistical theory. As such, we strongly recommend that users rely upon the method described here as opposed to the one set forth in the original publication. In particular, the shortcomings can be summarized as follows:

1. Both cross-validation and information criterion approaches aim to avoid overfitting. In the case of cross-validation, one attempts to estimate out-of-sample prediction error, so the score used should be a measure of prediction errors of the held-out points. If the model uses k too small or s too large, it is likely to overfit the training data and will predict the testing data poorly. On the other hand, if the model uses k too large or s too small, it will underfit the training data by missing the real variations in space use. Thus, cross-validation naturally penalizes model complexity because excessive complexity (small k) results in poor predictions. Information criteria approaches include a penalty term that increases with model complexity as measured by larger numbers of parameters. Using such an information criterion as a cross-validation score is not necessary since cross-validation should naturally penalize excessive model complexity.

2. The formulation of the information criterion score did not follow the rules of probability because probabilities of out-of-sample predictions were not properly normalized, and multiple probabilities were combined by summation. In this sense, it lacked a firm connection to the statistical theory underlying information criteria approaches.
Here we propose an alternative formulation in which we interpret a normalized version of LoCoH hulls as an estimated probability surface and recast the cross-validation score as the total log probability of out-of-sample predictions, a common choice in cross-validation schemes. The approach, explained in detail below, results in more appropriate behavior, but also has the effect of significantly altering the optimal parameter values selected by the algorithm. Thus, in addition to presenting the new cross-validation equation, we include tables and figures with the newly selected parameter values and newly calculated derived metric values (home range area, mean duration, and mean visitation rates). Finally, we offer an alternative R script that searches a much broader parameter space in a more efficient manner (Additional file 1).
Updated Cross-Validation Approach
Using the training/testing split as described in the original presentation of the algorithm, a grid-based exploration of parameter space was conducted (Fig. 1), whereby each of the training/testing datasets (i = {1,...,n}) was analyzed at every combination of k and s values on the grid. This analysis entailed the creation of local convex hulls with k nearest neighbors and a scaling factor of s. In all subsequent analyses, we assume that the scaling of time follows a linear formulation; however, when movement patterns more closely exemplify diffusion dynamics, an alternative equation for the time-scaled distance (TSD) may be more appropriate [2]. The test points (j = {1,...,m}) were then overlaid on the resulting hulls.
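To make the role of s concrete, the following is a minimal Python sketch of neighbor selection under a linear time-scaled distance. It is an illustration only, not the T-LoCoH package or the authors' R script. The distance used, sqrt(dx^2 + dy^2 + (s * v_max * dt)^2), is our reading of the linear formulation in [2]; the function names, the `v_max` argument, and the choice to exclude the parent point from its own neighbor list are assumptions made for this example.

```python
import math

def time_scaled_distance(p, q, s, v_max):
    """Linear TSD between two fixes, each given as an (x, y, t) tuple.

    Assumed form: sqrt(dx^2 + dy^2 + (s * v_max * dt)^2).
    """
    dx, dy, dt = q[0] - p[0], q[1] - p[1], q[2] - p[2]
    return math.sqrt(dx * dx + dy * dy + (s * v_max * dt) ** 2)

def k_nearest_by_tsd(fixes, parent, k, s, v_max):
    """Indices of the k fixes closest to fixes[parent] in TSD (parent excluded)."""
    others = [i for i in range(len(fixes)) if i != parent]
    others.sort(key=lambda i: time_scaled_distance(fixes[parent], fixes[i], s, v_max))
    return others[:k]

# Hypothetical fixes: (x, y, t). Fix 1 is close in space but far in time.
fixes = [(0, 0, 0), (1, 0, 100), (3, 0, 1), (0, 2, 2)]
space_only = k_nearest_by_tsd(fixes, parent=0, k=2, s=0.0, v_max=1.0)
time_aware = k_nearest_by_tsd(fixes, parent=0, k=2, s=0.05, v_max=1.0)
```

With s = 0 the neighbors are chosen purely by spatial distance, so the temporally distant fix is selected; with s = 0.05 it is pushed out of the neighbor set, which is how larger s values produce hulls that are more local in time.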
We formulate the probabilities for out-of-sample points by normalizing the LoCoH surface so that the probability of an observation occurring at a particular location can be calculated. This value is obtained by dividing the number of training hulls that contain the test point location (g_{i,j}) by the summed area of all training hulls (A_{i}). Then, the log probability was calculated for each point per training hullset. To avoid log probability values of −∞, test points that were not contained within any hulls were assigned a probability value equal to the inverse of \(A_{i}^{2}\), resulting in a substantially lower log probability than that of a test point contained in a single hull. Finally, a single value (P_{k,s}) was assigned to each combination of k and s value by summing across all of the test points in all of the training/testing datasets:
\[ P_{k,s} = \sum_{i=1}^{n} \sum_{j=1}^{m} \log\left(\frac{g_{i,j}}{A_{i}}\right) \]
where the ratio \(g_{i,j}/A_{i}\) is replaced by \(1/A_{i}^{2}\) whenever \(g_{i,j} = 0\).
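The scoring step described above can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions, not the authors' implementation: hulls are represented as lists of (x, y) vertices of convex polygons, points on a hull boundary are counted as contained, and all function names are hypothetical.

```python
import math

def polygon_area(vertices):
    """Shoelace area of a simple polygon given as [(x, y), ...]."""
    total = 0.0
    n = len(vertices)
    for a in range(n):
        x1, y1 = vertices[a]
        x2, y2 = vertices[(a + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def contains(vertices, point):
    """True if point lies inside or on the boundary of a convex polygon."""
    px, py = point
    sign = 0
    n = len(vertices)
    for a in range(n):
        x1, y1 = vertices[a]
        x2, y2 = vertices[(a + 1) % n]
        cross = (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1)
        if cross != 0:
            if sign == 0:
                sign = 1 if cross > 0 else -1
            elif (cross > 0) != (sign > 0):
                return False  # point is on different sides of two edges
    return True

def fold_log_probability(hulls, test_points):
    """Sum of log probabilities for the test points of one training/testing split."""
    total_area = sum(polygon_area(h) for h in hulls)  # A_i
    score = 0.0
    for point in test_points:
        g = sum(1 for h in hulls if contains(h, point))  # g_{i,j}
        # Points outside every hull get 1 / A_i^2 instead of zero probability.
        p = g / total_area if g > 0 else 1.0 / total_area ** 2
        score += math.log(p)
    return score

def cross_validation_score(folds):
    """P_{k,s}: folds is a list of (hulls, test_points), one per split."""
    return sum(fold_log_probability(hulls, pts) for hulls, pts in folds)

# Two overlapping 2 x 2 squares (summed area 8) and three test points:
# one in both hulls, one in a single hull, one outside all hulls.
hulls = [[(0, 0), (2, 0), (2, 2), (0, 2)], [(1, 1), (3, 1), (3, 3), (1, 3)]]
points = [(1.5, 1.5), (0.5, 0.5), (5, 5)]
score = cross_validation_score([(hulls, points)])  # log(2/8) + log(1/8) + log(1/64)
```

Note that the 1/A_{i}^{2} fallback is lower than the single-hull probability 1/A_{i} only when the summed area exceeds one in the chosen units, so the behavior of this sketch depends on the area units used.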
Because the probability of each test point is normalized based on the total area contained within all of the training hulls, there exists a natural penalty for high k values. For example, a k value equal to the number of training points (k_{max}; regardless of the s value) will result in all hulls being identical and each test point overlapping all of the hulls. However, the large total area of the hullset when k = k_{max} will result in relatively small probability values for each test point (i.e., independent probability values equal to the inverse of the area of one of the hulls), effectively penalizing the parameter set containing k_{max}. The underlying cross-validation procedure could very easily be extended for the optimization of the adaptive parameter in the a-method (as opposed to the k-method) because of its scaling with the total area of the hullset.
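A short numerical check illustrates this penalty. The numbers below are hypothetical and chosen only for illustration: 50 training points, a single full-extent hull of area 400 when k = k_{max}, and a smaller-k hullset of 50 hulls of area 20 each in which a test point falls inside 5 hulls.

```python
import math

def point_probability(n_containing, hull_areas):
    """Probability of one test point: containing hulls over summed hull area."""
    return n_containing / sum(hull_areas)

# k = k_max: all 50 hulls are the same polygon and contain every test point.
p_kmax = point_probability(50, [400.0] * 50)

# Smaller k (hypothetical): 50 compact hulls, test point inside 5 of them.
p_small_k = point_probability(5, [20.0] * 50)

# The k_max probability collapses to the inverse of a single hull's area,
# and the compact hullset assigns this well-covered point a higher probability.
assert math.isclose(p_kmax, 1.0 / 400.0)
assert p_small_k > p_kmax
```

The comparison only favors the smaller k for test points that actually fall inside the compact hulls; points missed by every hull receive the much lower fallback probability, which is what penalizes k values that are too small.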
Results
The optimal parameter values selected using the corrected cross-validation method are substantially different from those selected in the original publication (Table 1). However, because the original formulation was not supported by cohesive statistical theory, we will discuss these new results only in reference to the guideline-based parameter values rather than comparing them to the results emerging from the published algorithm. The mean s value selected using the algorithm for springbok was 0.02 (SE = 0.008) and for zebra was 0.0012 (SE = 0.0005). The mean s value selected using the guidelines for springbok was 0.005 (SE = 0.002) and 0.017 (SE = 0.002) for zebra. Thus, the s values selected by the algorithm and the guidelines were not significantly different for springbok (p=0.10), but were for zebra (p<0.001). In the case of the k values, the optimal values selected using the algorithm were significantly higher than those resulting from the guidelines. The mean k value selected using the algorithm for springbok was 225.5 (SE = 66.83) whereas the mean using the guidelines was 22.5 (SE = 1.71; p=0.003). The same trend was observed in zebra where the mean k value based on the algorithm was 347.2 (SE = 54.36), whereas the mean from the guidelines was 20 (SE = 1.58; p=0.004).
The significantly higher k values emerging from the algorithm gave rise to significantly larger home ranges in both species (Table 2). In springbok, the mean home range size was 265.41 km^{2} (SE = 76.23 km^{2}) using the high end of the guideline-based range, and 401.64 km^{2} (SE = 127.56 km^{2}) using the algorithm (p=0.05). In zebra, the mean home range was 694.43 km^{2} (SE = 80.81 km^{2}) using the guidelines and 1081.29 km^{2} (SE = 162.17 km^{2}) when the algorithm was applied (p=0.01). When the derived metrics were considered, however, the substantial differences in k values did not always result in significantly different duration (Table 3) and visitation rates (Table 4). Though the duration rates in zebra derived from the algorithm were, indeed, significantly higher than those derived using the high value from the range based on the guidelines (p=0.05), this was not the case for springbok (p=0.08). Similarly, the visitation rates emerging from the parameter sets selected by the algorithm were not significantly different from those derived based on the guidelines in either species (p=0.33 in springbok and p=0.15 in zebra).
Conclusion
The results presented here indicate that the effect of selecting parameters using the algorithm rather than the guidelines will be highly contingent upon the focus of the research question. Where home range delineation is the goal, the results are likely to differ significantly (Fig. 2). In the case of epidemiological questions, however, the effects will be somewhat less predictable, and in certain cases, similar conclusions might be drawn irrespective of the approach used for selecting optimal parameters. If an element of the analysis involves comparisons across individuals or species, however, the cross-validation-based approach provides a unifying framework governed by statistical properties of the home ranges rather than subjective selections by the user.
References
1. Dougherty ER, Carlson CJ, Blackburn JK, Getz WM. A cross-validation-based approach for delimiting reliable home range estimates. Mov Ecol. 2017; 5(1):19.
2. Lyons AJ, Turner WC, Getz WM. Home range plus: a space-time characterization of movement over real landscapes. Mov Ecol. 2013; 1(1):2.
Acknowledgements
The authors would like to acknowledge Andy Lyons for creating, maintaining, and improving the T-LoCoH package.
Funding
The case study presented here used GPS movement data from zebra and springbok from Etosha National Park, Namibia, which were collected under a grant obtained by WMG (NIH GM083863). In addition, partial funding for this study was provided by NIH 1R01GM11761701 to JKB and WMG. The funders had no role in study design, data collection and analysis, or manuscript writing.
Availability of data and materials
Please contact Wayne M. Getz (wgetz@berkeley.edu) for data requests.
Author information
Contributions
PDV and ERD developed the cross-validation approach. ERD ran analyses on empirical movement paths. All authors contributed to writing and editing the manuscript.
Corresponding author
Correspondence to Eric R. Dougherty.
Ethics declarations
Ethics approval and consent to participate
All movement data were collected according to the animal handling protocol AUP R2170509B (University of California, Berkeley).
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional file
Additional file 1
A new R script for a more efficient grid-based search (Fig. 3) can be found at: https://github.com/doughertyeric/Updated_TLoCoH_Algorithm. As currently parameterized, the grid-based search algorithm covers s values from 0 to 0.05 and k values between 4 and 800. The algorithm searches across the broadest set of k values in intervals of 20 and s values in intervals of 0.01. Upon identifying a peak in the probability surface, the algorithm selects a range of 40 k values around the peak and refines the search there in k value increments of 5. Finally, another range of 10 possible k values is selected and the finest scale grid search is conducted in intervals of 1 and s value intervals of 0.001 before selecting the optimal parameter set. (R 11 kb)
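The three-stage refinement described above might be sketched as follows. This is a Python illustration of the general coarse-to-fine idea, not a translation of the R script. Several details are assumptions made here because the description leaves them open: the refinement windows are centered on the current peak (20 k values either side in stage two, 5 either side in stage three), the s grid stays at intervals of 0.01 until the final stage, and the final s window spans 0.005 either side of the current best s. The `score` argument stands in for any function returning the cross-validation score for a given (k, s).

```python
def frange(start, stop, step):
    """Inclusive list of floats from start to stop, rounded to limit drift."""
    n = int(round((stop - start) / step))
    return [round(start + i * step, 6) for i in range(n + 1)]

def best_on_grid(score, k_values, s_values):
    """Return the (k, s) pair with the highest score on the given grid."""
    return max(((k, s) for k in k_values for s in s_values),
               key=lambda ks: score(*ks))

def coarse_to_fine_search(score, k_min=4, k_max=800, s_min=0.0, s_max=0.05):
    # Stage 1: coarse grid, k in steps of 20 and s in steps of 0.01.
    k, s = best_on_grid(score, range(k_min, k_max + 1, 20),
                        frange(s_min, s_max, 0.01))
    # Stage 2: 40-wide k window around the peak, k in steps of 5.
    lo, hi = max(k_min, k - 20), min(k_max, k + 20)
    k, s = best_on_grid(score, range(lo, hi + 1, 5),
                        frange(s_min, s_max, 0.01))
    # Stage 3: 10-wide k window in steps of 1, s in steps of 0.001.
    lo, hi = max(k_min, k - 5), min(k_max, k + 5)
    s_lo, s_hi = max(s_min, s - 0.005), min(s_max, s + 0.005)
    return best_on_grid(score, range(lo, hi + 1), frange(s_lo, s_hi, 0.001))

# Toy, single-peaked score surface used only to exercise the search.
def toy_score(k, s):
    return -((k - 237) ** 2) - 1e6 * (s - 0.013) ** 2

best_k, best_s = coarse_to_fine_search(toy_score)
```

A staged search of this kind evaluates far fewer parameter combinations than a full fine-resolution grid, at the cost of assuming that the score surface has a single dominant peak that the coarse grid can locate.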
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Keywords
 Time local convex hulls
 T-LoCoH
 Home range
 Visitation
 Duration
 Cross-validation
 Etosha National Park