Inferring an animal's environment through biologging: quantifying the environmental contribution to animal movement

Background: Animals respond to environmental variation by changing their movement in a multifaceted way. Recent advancements in biologging increasingly allow for detailed measurements of the multifaceted nature of movement, from descriptors of animal movement trajectories (e.g., using GPS) to descriptors of body part movements (e.g., using tri-axial accelerometers). Because this multivariate richness of movement data complicates inference on the environmental contribution to animal movement, studies generally use simplied movement descriptors in statistical analyses. However, doing so limits the inference on the environmental contribution to movement, as this requires that the multivariate richness of movement data can be fully considered in an analysis. Methods: We propose a data-driven analytic framework to quantify the environmental contribution to animal movement that can accommodate the multifaceted nature of animal movement. Instead of tting a simplied movement descriptor to a suite of environmental variables, our proposed framework centres on predicting an environmental variable from the full set of multivariate movement data, i.e., the reverse of the route of causal inference. The measure of t of this prediction is taken to be the metric that quanties how much of the environmental variation relates to the multivariate variation in animal movement. We demonstrate the usefulness of this framework through a case study about the contribution of grass availability and time since milking to cow movements using machine learning algorithms. Results: We show that on a one-hour timescale 37% of the variation in grass availability and 33% of time since milking contributed to cow movements. Grass availability contributed mostly to the cows’ neck movement during grazing, while time since milking contributed mostly to the movement through the landscape and the shared variation of accelerometer and GPS data (e.g., activity patterns). Furthermore, this framework proved to be insensitive to spurious correlations between environmental variables in quantifying the contribution to animal movement. Conclusions: Not only is our proposed framework well-suited to study the environmental contribution to animal movement; we argue that it can also be applied in any eld that uses multivariate biologging data, e.g., animal physiology, to study the relationships between animals and their environment. Here we propose a machine learning-based analytic framework to quantify the overall inuence of an environmental variable on multivariate animal movement. Instead of quantifying the environmental dependency of simplied movement descriptors (i.e., the variation in animal movement that is dependent on an environmental variable), as is generally done in movement ecology [2,3], our proposed framework centres on quantifying the environmental contribution to animal movement such that it quanties the degree of variation in an environmental variable that is traceable in animal movement data. After introducing the general framework, we demonstrate the usefulness of this framework with a case study about the contribution of grass availability, time since milking, and wind speed to cow movements. Apart from quantifying the degree of coupling between the environment and cow movements, this case study shows that applying this framework can yield ecological insights. Finally, we discuss possible usages and constraints of this analytic framework. We contend that this framework contributes to the toolbox of ecologists studying the relationship between the environment and animal movement, behaviour, and physiology.

a simpli ed movement descriptor to a suite of environmental variables, our proposed framework centres on predicting an environmental variable from the full set of multivariate movement data, i.e., the reverse of the route of causal inference. The measure of t of this prediction is taken to be the metric that quanti es how much of the environmental variation relates to the multivariate variation in animal movement. We demonstrate the usefulness of this framework through a case study about the contribution of grass availability and time since milking to cow movements using machine learning algorithms.
Results: We show that on a one-hour timescale 37% of the variation in grass availability and 33% of time since milking contributed to cow movements. Grass availability contributed mostly to the cows' neck movement during grazing, while time since milking contributed mostly to the movement through the landscape and the shared variation of accelerometer and GPS data (e.g., activity patterns). Furthermore, this framework proved to be insensitive to spurious correlations between environmental variables in quantifying the contribution to animal movement.
Conclusions: Not only is our proposed framework well-suited to study the environmental contribution to animal movement; we argue that it can also be applied in any eld that uses multivariate biologging data, e.g., animal physiology, to study the relationships between animals and their environment.

Background
Analysing animal movement is fundamental to ecology, because movement is arguably the most important way for animals to respond to their environment [1]. Quantifying the environmental in uence on animal movement is therefore an important practice in ecology [2,3]. As animal movement is inherently multifaceted, with aspects related to the movement of the animal through the landscape and aspects related to the movement of body parts, the movement process cannot be described with simpli ed descriptors without loss of information. On the contrary, a plethora of emergent patterns can be identi ed through these multifaceted movement descriptors, e.g., activity types (such as walking, foraging or resting) and collective movement properties [4,5]. Technological advancements in the eld of biologging currently allow for data on animal movement to be acquired at ner temporal and spatial scales and in increasing volumes, e.g., data on animal movement speed, movement path tortuosity, triaxial acceleration of body parts, and heart rate patterns can now relatively easily be acquired [6][7][8].
These technological advancements provide opportunities to increase ecological understanding by analysing the full multivariate complexity of animal movement [7,8]. This multivariate complexity of movement is not fully used in recent studies to infer the environmental in uence on animal movement.
Instead, quantifying the environmental in uence on animal movement is currently often done through relating simpli ed movement descriptors, e.g., animal distributions, net displacements, diffusion rates, or distributions of step lengths and turning angles, to a suite of environmental variables, e.g., through canonical analyses, linear mixed models, semivariance approaches, diffusion approximations, stepselection functions, hidden Markov models, or state-space models [2,3,[9][10][11][12][13]. Many of these approaches were not designed speci cally for animal movement data, but are approaches that function generally well in quantifying the relationship of independent variables with one or several dependent variables. Although the simpli cation of movement descriptors prior to analyses is a useful practice to acquire ecological understanding, it almost necessarily leads to a reduction in the quanti ed environmental in uence on multivariate animal movement. Even more so considering that there are often multiple behavioural phenotypes that individuals of the same species can produce in a given set of environments [14], which can also be in uenced by different internal states between individuals [1]. This consequently challenges the way that the analysis should be approached, as a multivariate analytic framework is required to quantify the overall in uence of environmental variables on ne-scale multivariate animal movement data.
Data-driven machine learning methods provide a toolset to be able to model multivariate animal movement data [15]. Machine learning methods have been adopted by many animal ecologists over the past years and have been mentioned as one of the most revolutionary emerging technologies to study animal movement [15,16]. These machine learning methods have been used to automatically detect and classify animal species in images [17], to track moving animals in videos [18], to follow animal body postures and track body parts in videos [15], to ag when animals become sick using animal-mounted biologging sensors and videos [19], and to classify animal activities from biologging sensors [16].
Although machine learning has proven to be useful for movement ecology, it is often only used as a tool to transform raw data (e.g., images, videos, accelerometer readings) into informative data (e.g., species labels, animal locations, animal activity labels) [15,16]. After these informative data have been generated, ecologists often use more traditional statistical methods to extract ecological understanding from these data [2,3,7]. However, we posit that the possible applications of machine learning methods extend beyond the ability to transform raw into informative data, and that machine learning can also be used in analyses to acquire ecological understanding.
Here we propose a machine learning-based analytic framework to quantify the overall in uence of an environmental variable on multivariate animal movement. Instead of quantifying the environmental dependency of simpli ed movement descriptors (i.e., the variation in animal movement that is dependent on an environmental variable), as is generally done in movement ecology [2,3], our proposed framework centres on quantifying the environmental contribution to animal movement such that it quanti es the degree of variation in an environmental variable that is traceable in animal movement data. After introducing the general framework, we demonstrate the usefulness of this framework with a case study about the contribution of grass availability, time since milking, and wind speed to cow movements. Apart from quantifying the degree of coupling between the environment and cow movements, this case study shows that applying this framework can yield ecological insights. Finally, we discuss possible usages and constraints of this analytic framework. We contend that this framework contributes to the toolbox of ecologists studying the relationship between the environment and animal movement, behaviour, and physiology.

Methods
Our analytic framework quantifies the contribution of an environmental variable to animal movement by utilizing the multivariate richness of movement data. Instead of building a model to predict a simplified animal movement descriptor from a set of environmental predictors, i.e., the route of causal inference, we turn this around and build a model to predict an environmental variable from a large number of animal movement variables. By using animal movement variables, the model of this framework predicts a perceived environmental variable by the animal [6,20]. Although predicting an environmental variable from movement data is the goal of the model, it is an intermediate step of the framework in order to quantify environmental contribution to animal movement. In this framework it is key to use as many informative movement variables as possible, which could be meaningful human-constructed ecological (e.g., variables related to multiple classified animal activities), mathematical and/or physical variables, or abstract variables from an automated (deep learning) feature extraction algorithm. When effort is made to extract as many informative variables as possible from the animal movement data, chances are maximized that most of the variation of the environmental variable under scrutiny that is present in the data is captured. Furthermore, instead of creating the model as the end product during the analysis, the environmental variable should be predicted on a separate test dataset as the final step of the analysis. This follows from a data-driven and machine learning philosophy, in which complex multivariate models can be built that are not overfitted and therefore generalize better to new datasets. When distinguishing the train and test dataset, the test set used in the prediction of the environmental variable needs to be from a different temporal range than the train set that is used in the model building phase, due to autocorrelation in animal movement data that can otherwise cause the model to overfit [21]. The range of values in the test set of environmental variables (whether or not these are under scrutiny) should be comparable to the range of values in the train set, to prevent incorrect extrapolation. After generating model predictions on the test set, the coefficient of determination (R 2 ) quantifies the fit of this predicted environmental variable from animal movement data to the measured environmental variable on a known scale and can thus be considered a metric on how much of the variation in the environmental variable contributed to animal movement in a multivariate fashion (see Additional file 1) [22]. This is in contrast to approaches where the correlation between a set of environmental variables and one or several simplified animal movement variables are studied [2,3,9], because here the environmental dependency of (instead of the contribution to) movement is often quantified, i.e., the variation in animal movement that is dependent on environmental variables. The measure of fit of the null model (i.e., no environmental contribution) should be chosen depending on the algorithm that is used, which is R 2 = 0 for algorithms that are able to always predict the mean of the response variable (e.g., Support Vector Regression and Random Forest Regression), even when the input variables are white noise. The measure of fit of this null model will then form the baseline value for which there is a 0% environmental contribution and an R 2 of 1 can always be interpreted as 100% environmental contribution. Of course R 2 should only be used as the measure of fit when modelling a continuous environmental response variable. With a discrete environmental variable, a classification approach should be undertaken, which is outside the scope of this study. However, to compare the contribution of different environmental variables with each other fairly, the same measure of fit should be used.
In order to demonstrate the usefulness of the proposed analytic framework, we applied this framework to a case study about the contribution of resource availability (here grass biomass), time since milking, and wind speed to the movement of eight dairy cows in a pasture (Fig. 1). When animals are facing resource depletion, movement characteristics (through the landscape and of body parts), and emergent patterns like group (herd) characteristics, and time allocated to specific activities (e.g., foraging) often change, because animals need to invest more time and/or energy in searching for and acquiring resources [23,24]. Cows in a pasture are a good model for such a case study, because this provides a relatively homogenous foraging arena. Time since milking is another variable that could substantially influence the movement of dairy cows, because it has been shown that the lactation stage of cows (a variable that is intuitively linked to time since milking regarding its effect on cow behaviour) influences the relative distribution of their activity patterns [25,26]. Wind speed provided a good test case for our framework, because it was moderately correlated (r = 0.37) with grass biomass. We expected this correlation to be spurious and the effect of wind speed on cow movement to be negligible, because conditions were mild during the experiment (0-9 m s -1 ).
The exact methodological approach that we describe for this case study is one possible implementation of our proposed analytical framework (Fig. 1). However, there are numerous possible implementations of this framework for other studies, which may be influenced by the problem statement, experimental setup, animal movement sensors, environmental data types, data quantity, etc. However, the property that all implementations should have in common is that environmental contribution to animal movement is quantified by predicting environmental variables from movement descriptors in a data-driven (viz., machine learning) approach, which uses the coefficient of determination as a measure to quantify this contribution.

Data collection
For this case study eight adult female Holstein-Friesian dairy cows were kept in controlled pastures that were small enough so that foraging lead to resource depletion over the course of several days. The experiment ran from 25 April until 11 May 2017. During the experiment, the cows' movements were recorded continuously with e-Track neck collars (Noldus InnovationWorks, Wageningen, Netherlands), containing an EGNOS-augmented GPS receiver and a tri-axial accelerometer sensor. The cows were continuously kept on pasture at Carus animal facility in Wageningen, Netherlands (51°59'8'' N, 5°39'11'' E), and could move freely around as a single group during the experiment. Over the course of this period, we relocated the cows between three 0.32 ha pasture plots (sequentially five, six and six consecutive days in each plot). At every pasture switch the cows were housed inside the Carus facility for one night where they were offered fodder, so that they were not hungry at the start of a new pasture plot session. Furthermore, the cows were taken inside for milking and feeding every morning between 7:30 and 8:30 CEST and solely for milking every afternoon between 16:30 and 17:00 CEST. The time the cows spent on pasture was short enough to assume that the pasture did not increase in grass quality because of regrowth after grazing and only decreased in grass availability [27]. The short duration of the pasture sessions (approximately one day longer than when a commercial farmer would have moved the cows, as judged by the farm manager) ensured that the cows were not hungry, but only had to put more effort into foraging when time progressed. Furthermore, the collaring process did not put the cows under noticeable stress, more so because they were accustomed to continuously wearing a neck collar.
The sensors in the cows' neck collars recorded GPS and accelerometer data during the experiment. The data were stored with a millisecond-accurate timestamp on a local SD memory card, which was replaced every one to five days together with the battery. GPS data were stored on the SD card with a one second interval. The accelerometer data were sampled with a variable frequency of 25-500 Hz, which were later down-sampled and linearly interpolated to a constant 32 Hz signal. Both the GPS and the accelerometer did not record data during some hardcoded multi-hour periods of inactivity, which were variable in duration and time of day, to save battery power. However, the time between GPS fixes was exactly 1 second in more than 99% of the cases. The precision of the GPS fixes was high, with 98% of the fixes having a Horizontal Dilution of Precision (HDOP) of less than two (a dimensionless unit; two is considered "excellent" precision). All GPS fixes with an HDOP of more than five, which were 0.5% of all fixes, were considered to be untrustworthy and filtered out of the final dataset. We also tested the accelerometer data for precision by placing the sensor on a stable, non-moving surface while it recorded for several minutes. The fluctuations in the recorded signal of all three accelerometer axes were small, 0.06 m s -2 between the lowest and highest value, and were considered negligible and thus ignored.
Activity (or behaviour) observations were conducted on work days from 25 April to 9 May 2017. A single person visually classified the activities using focal-animal sampling with a pre-defined ethogram (Table 1). All activity types in the ethogram (grazing, walking, standing, standing while ruminating, lying, lying while ruminating) were mutually exclusive.
Each individual cow was observed continuously for ten minutes in the morning (10:00-13:00 CEST) and ten minutes in the afternoon (13:00-17:00 CEST), in random order, resulting in a total observation time of 1760 minutes. During the observations, the start and end times of each displayed activity type from the ethogram were recorded. We conducted these observations to acquire annotations for an activity classification model. Representative acceleration plots of the three axes for the different activity types are provided (see Additional file 2). We measured resource availability as dry matter grass biomass in kilograms per hectare, excluding stubble biomass. We determined time-varying biomass levels using a combination of field-measured biomass levels at specific time points, satellite-based biomass estimates derived from the Normalized Difference Vegetation Index (NDVI), and modelling of grass dynamics (see Additional file 3). Wind speed (m s -1 , mean speed 10 m above ground) were recorded at 10 minute resolution during the experiments with a weather station on a grass pasture at the Veenkampen, Wageningen, Netherlands. This weather station is located one kilometre west of the pasture plots used for the experiments.

Data processing
We used the pre-processed 32 Hz, tri-axial accelerometer signal as input for the accelerometer feature extraction. First, we converted all the records in the threedimensional accelerometer dataset to 21 dimensions using multiple geometric transformations, i.e., resultant vectors, angles, solid angles, volumes and areas (Table 2).
These dimensions constitute all geometric transformations of angles and distances in one, two and three dimensions. Considering that tri-axial accelerometer readings describe the movement forces in three dimensions, geometric transformations make sense from a physics perspective. More transformations could be considered, but these may lack to provide additional information to the feature set. Second, we divided the resulting dataset into non-overlapping time windows. We tried all window sizes in the range of 1 until 30 seconds and optimized this window size as a hyperparameter regarding the activity classification performance, where 3 s turned out to be the optimal window size (see Additional file 5). For every time window we computed multiple statistics per accelerometer dimension per cow, e.g., mean, standard deviation, quantiles and Fast Discrete Fourier Transform (FFT) parameters (Fig. 2). These statistics were chosen to provide summary statistics about both the time-invariant and sequential aspects of the data, given that accelerometer data also includes patterns in the frequency domain regarding animal activity (e.g., head movement of cows during grazing has a strong cyclic behaviour). We computed the FFT with the base R stats package [28], of which we used the maximum FFT value as the dominant amplitude, the corresponding period of the dominant amplitude as the dominant period, and finally the sum of all squared FFT values as the spectral energy.
Our list of computed statistics is not all-encompassing and more statistics can be thought of to describe patterns in the data. However, as these statistics were mainly used in the activity classification part of the analysis, we deemed the computed statistics sufficient when it resulted in a high performance during activity classification. Overall, computing all statistics for each dimension resulted in 210 accelerometer features per time window per cow.
[Please see the supplementary les section to view Table 2

.]
We used the filtered 1 Hz GPS data as input for the GPS feature extraction. First, we transformed all the latitude, longitude coordinates to Cartesian coordinates by projecting them to zone 31N of the UTM system (EPSG 32631). Second, we extracted a number of individual GPS features from the projected GPS coordinates per time window per cow, related to speed, turning angle, tangential velocity, mean squared displacement, and first passage time ( Table 3). The time windows were exactly the same as the time windows used in the extraction of the accelerometer features. Third, we extracted a number of group GPS features from the projected GPS data per time window per cow, related to group shape, group area, and distances and directions to other cows (Table 4). We determined which individual and group GPS features to compute by drawing fake GPS trajectories and animal clusters, after which we discussed which geometrical properties (e.g., tangential velocity:    (Fig. 2, Table 3), which we first converted to principal components. We linked the time-matched activity observations to these input data and used the activity type as output variable for the classification models. We trained a multi-class classification model for the activity types: grazing, walking, standing and lying down. As a second step after the main activity classification we also trained a binary classification model for ruminating, with an extra input variable that indicated standing versus lying down. We tried for both classification models a Support Vector Machine (SVM) with a Radial Basis Function (RBF) kernel and a one-against-one approach, implemented in the e1071 package for R 3.6.2 [28,29], and a Random Forest (RF) with 500 trees, implemented in the randomForest package [30]. To prevent overfitting due to autocorrelation in the data we randomly assigned each hour of the dataset into a train (80%) or test set (20%) and performed 5-fold cross-validation on the train set, which was also split per hour at each of the 5 crossvalidation iterations [21]. To find the optimal hyperparameters for the models (number of principal components and time window size for both SVM and RF; cost, gamma and class weights for SVM; and mtry, sample size and node size for RF), we used an extensive grid search on a High Performance Cluster of Wageningen University, Netherlands (see Additional file 5). We started the grid search with a coarse resolution search that covered a large range of all hyperparameters, to make sure that the global optimum was covered and to get a feel for the performance landscape. We zoomed in with a finer resolution during a second grid search and finished with an even more zoomed in and finer resolution during a final grid search. We determined the optimal classification model and hyperparameters by selecting for the highest mean balanced accuracy during cross-validation (Equation 2). The classification models with the highest performance during cross-validation were then evaluated for performance on the test dataset. Finally, we used the models to predict the displayed activity type (grazing, walking, standing or lying down) and whether or not the cows were ruminating, for all the time windows and cows with available sensor data.
We computed the dataset for the environmental variable predictions per cow over one-hour time windows. The window size that is chosen has of course an influence on the results, as the effect of an environmental variable on animal movement data varies with temporal scales [31]. In short, the window size that is chosen represents the scale at which the animals' behavioural decisions are made [31].  (Table 5). We did not consider variables conditional on other activity types than foraging, because the cows sometimes did not display one of the other activity types during a one hour time window.
This resulted in a total of 548 variables per cow per one-hour time window. We standardized these variables (to zero mean and unit variance) per combination of day/night and cow ID to account for differences in nocturnal and diurnal activities of cows and individual differences in movement characteristics, group characteristics, and activities.
These standardized variables were used as input for a principal component analysis, but were first one by one visually checked for symmetric unimodality by inspecting the histograms and normal Q-Q plots. Two of the 548 variables displayed signs of bimodality and eight variables appeared to be somewhat heavy-tailed. Due to the low number of variables that showed these deviations and due to the small severity of these deviations, we decided not to correct these ten We used the second pasture plot session as our test set, because its range of biomass values fell within the range of biomass values of the first and third pasture plot session.

Results
The general cline of grass biomass is predicted by both models, but the steepness is not entirely captured (Fig. 3). The time since milking cline is quite accurately matched from 0.5 to 6.5 hours, but after 6.5 hours it levels off for both models (Fig. 3). For wind speed both models were not able to make accurate predictions (Fig. 3). Overall the SVR models outperformed RFR in predicting the environmental variables from cow movement data (see Additional file 4). When analysing the explained variation of the models with only accelerometer or GPS datasets, the qualitative differences between the explained variation of the different response variables for both algorithms are comparable (Fig. 4). However, SVRs are apparently better capable of using the interaction between variables in the mixedsensor dataset to increase the explained variation, while RFRs are hardly able to do so with our data (Fig. 4 and 5). Both models indicate that grass biomass contributes substantially more to accelerometer data than to GPS data, while the reverse is true for time since milking ( Fig. 4 and 5). Furthermore, for time since milking the explained variation by accelerometer data is largely shared with GPS data (Fig. 5). Finally, the optimization of the hyperparameters was also done on datasets of each cow separately, which resulted into approximately the same hyperparameters and performance when compared to the model for all cows combined. Therefore, we concluded that cows responded to changes in resource availability and time since milking in approximately the same manner and we thus decided to use the models for all cows combined.
Some of the variables used in our model were based on the automated activity classifications of the cows' sensor data and visual observations. SVMs outperformed RFs for all these activity classification tasks with our data, so we only used the predictions of the SVMs. The best performing SVM classification model of the main activity types achieved 91.7% mean balanced accuracy on the test set and the best performing SVM model of rumination 90.9% (see Additional file 5). While we maximized the mean balanced accuracy during cross-validation, also kappa, Matthews Correlation Coefficient, mean F 1 and mean True Skill Statistic were maximized at the same time (Table 6). Moreover, the confusion matrices of both models show that, in addition to a high accuracy, the relative frequency of misclassification of each activity type was approximately equal (see Additional file 6). This means that the models were not overclassifying a specific activity type over another. Furthermore, we have found no substantial inter-or intra-cow activity classification performance differences. We thus considered the SVM activity classification models good enough to reliable predict the activity types based on the movement sensor data, even more so because the classification performance was higher or comparable to other cow activity classification studies [4,32,33].

Discussion
In the case study we quanti ed that on a one-hour resolution resource availability contributed for 37% to cow movements (consisting of movement through the landscape, body part movement, and emergent patterns like group characteristics, and displayed activities) and time since milking contributed for 33%, while wind speed did not contribute noticeably (Fig. 3-5). These results support our expectations that both resource availability and time since milking are important in shaping the movement of cows, but that wind speed (during relatively mild conditions) is not. Furthermore, it seems that the moderate correlation between resource availability and wind speed was indeed spurious. This framework proved to be insensitive to this spurious correlation, as it did quantify the contribution of wind speed to cow movement to be 0%. Furthermore, the Support Vector Regression (SVR) models performed overall better than the Random Forest Regression (RFR), especially when confronted with a dataset with both GPS and accelerometer variables, but the qualitative patterns when comparing the three different environmental contributions to single-sensor movement datasets were the same for both algorithms. Due to the SVRs higher performance, we do consider it to be the better alternative over RFR for this analytical framework when dealing with hyperdimensional datasets, especially when variables from multiple sensors are mixed. Moreover, we found that resource availability contributed more to accelerometer variables (29%) than to GPS variables (8%), but this contribution to GPS variables still was largely independent from accelerometer variables (less than 1% of the total variation was shared). This indicates that, at this temporal scale and with these computed movement variables, the individual movement of cows through the landscape and the spatial group characteristics hardly contained any signature of resource availability and that almost all of the contribution of resource availability to cow movements became apparent from the accelerometer variables of the cows' neck during grazing. The accelerometer variables of the cows' neck during grazing, being descriptive for bite frequency and bite force ( Table 2, Fig. 2 and Table 5), probably link more explicitly to grazing behaviour than GPS variables do. These accelerometer variables are probably in uenced more by resource availability than GPS variables, because grazing behaviour in cows is closely linked to resource availability [35]. The opposite was found for time since milking, which contributed more to GPS variables (29%) than to accelerometer variables (21%), with a lot of their explained variation being shared (17% of the total variation). This links well to our previous argument about that the accelerometer variables are shaped for a large part by the cows' neck movement during grazing, which is intuitively more heavily in uenced by grass biomass than by time since milking. Previous studies also found that the lactation stage, a variable that we expected to be linked to time since milking regarding its effect on cow behaviour, in uences the relative distribution of cow activity patterns and cow movement through the landscape [25,26]. This supports our nding about a higher contribution to GPS variables with a large shared contribution with accelerometer variables, because the movement through the landscape is measured by GPS variables and the activity patterns are measured by both GPS and accelerometer variables. Finally, the estimated model parameters were similar for all cows, indicating that the cows responded to changes in resource availability and time since milking in the same way.
However, it should be noted that all the results that are presented above are of course context dependent. With a different experimental setup, e.g., indoor instead of pasture housing or different ranges of environmental variable values, the quanti ed contributions can change. As is the case with nearly all modelling efforts, this framework is also only able to provide sensible results about the system for which data is available.
Our case study illustrates how the proposed analytic framework can quantify the contribution of an ecological variable to animal movement. Having this quanti cation as the goal of the analytic framework, human interpretation and understanding of the correlative relationships within the model is initially of lesser importance. The goal of the framework is to build a model that can explain as much of the variation of the measured environmental variable as possible, without doing concessions to the model's complexity to facilitate human interpretation. Only then the overall contribution of the environmental variable to animal movement is under scrutiny to be quanti ed. This analysis could be followed by a stage where the researcher is selective in the choice of movement variables, to study which movement variables are mainly in uenced by the environmental variable. Due to the way the framework is set up, the environmental contribution to multivariate animal movement will by de nition always be higher or equal to the environmental contribution to a subset of the animal movement variables. Thus, using this framework to rst determine the environmental contribution to multivariate animal movement and afterwards determine the contribution to speci c subsets of movement variables, allows for an analysis that shows in which movement variables the environmental contribution is most or least visible. This is demonstrated in our case study, where resource availability mainly contributed to accelerometer variables and much less to GPS variables, indicating that resource availability was more tightly linked to the cows' movement of body parts than to their movement through the landscape. The opposite was true for time since milking, where also the explained variation by the accelerometer data was largely shared with GPS data. Furthermore, this framework allows for a comparison between the contributions of multiple environmental variables to animal movement whilst being insensitive to moderate spurious correlations between environmental variables, which is also shown in our case study with regards to the contribution of wind speed. Therefore, this framework could be well suited for exploratory analyses of the link between environment and animal movement. However, it should be noted that the environmental contribution to animal movement (i.e., the variation in an environmental variable that is traceable in animal movement data) is not the same as the environmental dependency of animal movement (i.e., the variation in animal movement that is dependent on an environmental variable), where potentially the environmental contribution can be large but the dependency small or vice versa. To accommodate for a multivariate analysis of animal movement we determine environmental contribution instead of using the route of causal inference. In movement ecology usually the environmental dependency of animal movement is the focus of analyses, as this allows for the determination of the direction and strength of the environmental in uence on an animal movement variable. Therefore, post hoc analyses that link environmental variables to a simpli ed animal movement descriptor can supplement our proposed multivariate analytic framework in order to study the route of causal inference [2,3].
Various factors in the relationship between the environment and animal movement in uence the quanti cation of the environmental contribution to animal movement (Fig. 6). First, many environmental variables are correlated and interact with each other in their in uence on the animal's decision making and, thus, movement [1]. When the contribution of a single environmental variable to animal movement is under scrutiny, these correlations and interactions with other environmental variables need to be taken into consideration. In the proposed analytic framework we do not distinguish between the independent, shared, and interaction contributions environmental variables to animal movement [9], which is different from the independent and shared contribution to multiple subsets of the movement variables as described in our case study. As a consequence, both the direct and indirect contributions of an environmental variable to animal movement are combined into a single metric. Future research could potentially be aimed at the distinction between these contribution types of multiple environmental variables on multivariate animal movement, e.g. by using multi-target (Support Vector) regression and variation partitioning procedures [36,37]. Furthermore, when the contribution of an environmental variable to animal movement is quanti ed, it is important that the movement itself does not in uence the environmental variable directly at that point in space and time as well. Social proximity is for example an important variable in the shaping of individual animal movement, but individual movement parameters also directly shape collective movement patterns [38]. The t of a model with social proximity as response variable and individual movement variables as input data would then not be solely the contribution of an environmental variable anymore. This could consequently yield unrealistically large values of the explained variance, which should be prevented.
In the relationship between the environment and animal movement, the animal's internal state ("why move?"), motion capacity ("how to move?"), and navigation capacity ("where to move?") are also involved [1]. The animal's internal state is composed of many different factors, e.g., physiological "need" (hunger, fear, etc.), physical characteristics (age, sex, body condition, etc.), and personality differences (laziness, level of sociality, etc.), that combined result in a certain response by the animal when confronted with a set of environmental variables at certain moment in time [1]. We translate this combined net effect of the internal state factors into the willingness of the animal to respond to the environment (Fig. 6). The motion and navigation capacity can be translated into the ability of the animal to respond. Another factor that is involved, even before the animal can decide whether it is willing and able to respond, is the animal's perception of the environment [20]. Only when an animal can observe changes or differences in an environmental variable can it decide to respond in a certain way. Because of the aforementioned latent variables -perception, willingness, and ability -the movement of the animal is not purely a deterministic function of a xed set of environmental variables [1]. These latent variables can thus cause a partial environmental contribution to animal movement. Furthermore, these latent variables are in part individual-speci c [1], which is why differences between individuals should be taken into consideration by standardizing the movement variables per individual and/or adding individual identi ers as variables to the model.
Other factors, which are more data-related, also in uence the quanti cation of the environmental contribution to animal movement (Fig. 6). First, environment and animal movement are linked through sensor measurements, which in uence the outcome of the analysis through varying sensor types, resolution, extent, and precision. Second, the movement variables that are computed from the animal movement data to describe the movement process determine how much of the environmental contribution to animal movement is traceable in the data. Therefore it is key to extract as many informative movement variables from the animal movement data as possible in this proposed framework (or optimize the architecture of a neural network in a deep learning approach), because ideally all inherent variation needs to be extracted from the movement data to quantify the total environmental contribution and to compare the contribution of different environmental variables fairly. In our case study, the best performing models had a selected number of principal components with a relatively low cumulative proportion of variance, especially for the GPS variables (see Additional le 4), which suggests that enough variation had been extracted from the raw data to make a good prediction about the environmental contribution to animal movement. Although the best performing model does not necessarily equate a good model, so it could theoretically also be that we missed to extract some extra informative variables from the raw data, which could otherwise have resulted in an even better performing model. Third, the temporal scale at which these variables are computed determine the temporal scale for which the contribution of the environmental variable to animal movement is quanti ed. As the effect of an environmental variable on animal movement data varies with temporal scales, the choice of the temporal scale of the variables is relevant [31]. Finally, the algorithm that is used to predict an environmental variable from animal movement data in uences the level of t that can be attained, which is demonstrated in our case study with SVR outperforming RFR on all occasions.
Algorithms that can model complex interactions between variables are often able to make better predictions of the response variable, e.g. RFR, SVR, and Neural Network Regression, likewise are algorithms that take into account the sequence of time series data, e.g. Recurrent Neural Network.
Quantitative comparisons between the contributions of different environmental variables to animal movement can thus only be done reliably when the same algorithm is used on the same underlying animal movement dataset.
Apart from only using the R 2 of the model predictions to acquire ecological insights, the patterns of the observed vs. predicted plots can also potentially generate insight. For an environmental variable to in uence animal movement, the animal's perception, willingness, and ability are conditionalities (Fig. 6).
Therefore, certain parts of the environmental variable's range might be better predicted by the model than other parts. It could be argued that this could be an explanation for the better SVR predictions during intermediate grass biomass compared to low and high biomass levels, thereby creating a lower overall slope of the predictions compared to the observations (Fig. 3). However, apart from animal perception, willingness, and ability, other factors might also in uence patterns of the observed vs. predicted plot (Fig.   6). In this case the algorithm might be the underlying cause for the lower overall slope of the SVR biomass predictions, due to a "regression toward the mean" characteristic (see Additional le 7).
Furthermore, the overall gradient of the time since milking predictions follows the measurements quite accurately for both models from 0.5 to 6.5 hours, but after 6.5 hours it levels off (Fig. 3). This suggests that until 6.5 hours cows continue to change their movement in response to the time since they were last milked, but after 6.5 hours there is no noticeable change in movement anymore. Besides a potential behavioural ecological cause for this pattern, it could also be (partially) caused by correlations with other time variables due to our experimental setup where the cows were milked two times a day around the same time of day. Follow-up studies could focus on these predicted time since milking patterns, where the experimental setup should contain multiple groups of cows that are milked at different times of the day. Finally, apart from concluding that wind speed probably has no noticeable effect on cow movement in this study (Fig. 3), it becomes clear that the model performance suffered from some higher wind speed values in the test set compared to the train set (thereby generating an R 2 lower than 0).

Conclusions
We developed an analytical framework that can quantify the environmental contribution to animal movement while preserving the multifaceted nature of the movement process. The most important aspect of this framework is that it quanti es the degree of variation in an environmental variable that contributes to multivariate animal movement, compared to prevailing methods that quantify how much of the variation of a simpli ed animal movement descriptor is dependent on environmental variables [2,3]. Apart from providing a measure of the tightness of coupling between an environmental variable and animal movement, the prediction of an environmental variable from animal movement data can be a useful application in itself as the unique property of this predicted variable is that it represents the perceived environmental variable by the animals. This framework demonstrates that the possible applications of machine learning methods extend beyond the ability to transform raw into informative data, and that machine learning can also be used in analyses to acquire ecological understanding.
The applicability of our multivariate analytic framework extends beyond animal movement. With the recent increase in biologging practices, more and more variables of animal data are acquired [6][7][8].
These data do not only encompass animal movement, but for example also animal physiology, which can be related to environmental variables as well using the same framework as presented in this study [6][7][8], e.g., by relating heart rate patterns to terrain characteristics or physical tness metrics to climate conditions. Apart from using this analytic framework to quantify environmental contribution to animal biologging data, the computation of perceived environmental variables can allow researchers and managers to monitor the perceived habitat of animal species [6,20]. This way, the habitat quality in natural areas, e.g., in terms of resources, can be assessed more precisely regarding the needs of speci c, sensor-equipped, focal animals [6,39]. Furthermore, with the results presented here, the management of pasture-fed cattle can be optimized by detecting the appropriate time to move cattle to a more resourcerich area or towards a milking machine, without measuring resource availability or milk content in the udder directly. Finally, we argue that our proposed data-driven analytic framework to quantify environmental contribution to animal biologging data is a valuable tool for explorative and comparative analyses on the relationship between the environment and animal movement, behaviour, and physiology. Figure 1 Flow chart of the summarized methodological approach for the case study.  Explained grass biomass and time since milking variation using Support Vector Regression models (SVR)

Figures
and Random Forest Regression Models (RFR) with a GPS, accelerometer (ACC) and combined dataset.

Figure 5
Variation partitioning of accelerometer (ACC) and GPS data with Support Vector Regression models (SVR) and Random Forest Regression models (RFR) for grass biomass and time since milking.