Limitations of using surrogates for behaviour classification of accelerometer data: refining methods using random forest models in Caprids
Movement Ecology volume 9, Article number: 28 (2021)
Animal-attached devices can be used on cryptic species to measure their movement and behaviour, enabling unprecedented insights into fundamental aspects of animal ecology and behaviour. However, direct observations of subjects are often still necessary to translate biologging data accurately into meaningful behaviours. As many elusive species cannot easily be observed in the wild, captive or domestic surrogates are typically used to calibrate data from devices. However, the utility of this approach remains equivocal.
Here, we assess the validity of using captive conspecifics, and phylogenetically-similar domesticated counterparts (surrogate species) for calibrating behaviour classification. Tri-axial accelerometers and tri-axial magnetometers were used with behavioural observations to build random forest models to predict the behaviours. We applied these methods using captive Alpine ibex (Capra ibex) and a domestic counterpart, pygmy goats (Capra aegagrus hircus), to predict the behaviour of captive Alpine ibex, including terrain slope for locomotion behaviours.
Behavioural classification of captive Alpine ibex and domestic pygmy goats was highly accurate (> 98%). Model performance was reduced when using data split per individual, i.e., classifying behaviour of individuals not used to train models (mean ± sd = 56.1 ± 11%). Behavioural classifications using domestic counterparts, i.e., pygmy goat observations to predict ibex behaviour, however, were not sufficient to predict all behaviours of a phylogenetically similar species accurately (> 55%).
We demonstrate methods to refine the use of random forest models to classify behaviours of both captive and free-living animal species. We suggest there are two main reasons for reduced accuracy when using a domestic counterpart to predict the behaviour of a wild species in captivity: morphological differences arising from domestication, and the terrain of the environment in which the animals were observed. We also identify limitations when behaviour is predicted in individuals that are not used to train models. Our results demonstrate that biologging device calibration needs to be conducted (i) with similar conspecifics, and (ii) in an area where they can perform behaviours on terrain that reflects that of the species in the wild.
Biologging has transformed what we know about wild animal behaviour [1,2,3], with particular value attributed to tri-axial body acceleration [4,5,6]. Biologging devices enable researchers to gain detailed insights into the movement and behaviour of animals [7, 8]. Specifically, where data are limited by direct observations or telemetry is constrained (e.g. sampling intervals are low, location is inaccurate [11, 12]), these devices record body movement of animals at high frequencies. They can thus provide detailed information on the study subjects, representing a powerful opportunity to study enigmatic species.
Accelerometry data are generally collected at high frequencies (typically tens of hertz), generating large datasets. However, the ease with which these data can be collected is in stark contrast to the difficulties in analysing and interpreting such large data sets (e.g. 40 Hz sampling frequency gives nearly 3.5 million data points per day for a single channel) [13, 14]. Various computational approaches can be used to analyse these data for behavioural identification, including machine-learning algorithms such as k-nearest neighbour, random forest models, gradient-boosting machines, support vector machines and artificial neural networks [4, 17]. Random forest models are a commonly used approach for classification of behaviours from accelerometry data and provide high accuracy [4, 18].
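The data volume quoted above follows directly from the sampling frequency. As a minimal sketch (the function name and the three-channel example are illustrative assumptions, not part of the study's workflow), the arithmetic can be checked as follows:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s


def samples_per_day(frequency_hz: float, channels: int = 1) -> int:
    """Number of data points logged in one day at a given sampling frequency."""
    return int(frequency_hz * SECONDS_PER_DAY * channels)


# One acceleration channel at 40 Hz, as in the example in the text.
single_channel = samples_per_day(40)

# Hypothetical extension: all three acceleration axes at 40 Hz.
tri_axial = samples_per_day(40, channels=3)

print(single_channel, tri_axial)
```

At 40 Hz a single channel yields 3,456,000 points per day, which matches the "nearly 3.5 million" figure; the same function shows how quickly the volume falls when the frequency is reduced.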
Whilst the high recording frequencies of the devices are key to identifying behaviours accurately, the use of lower recording frequencies can extend deployment time and reduce associated computational time [18, 19]. The optimisation of sampling frequencies, which will vary with study subject and aims, is therefore an important issue. This is amplified for devices recording parameters other than just acceleration, such as tri-axial magnetometry and barometric pressure, which may also be important keys to identifying behaviours [20, 21]. Even when using accelerometry alone, a large number of variables can be computed to include in models for behaviour classification (e.g. 25 variables). Thus, it is important to consider the biological and mechanistic relevance of all variables included in behavioural classification.
Despite the potential of computational approaches to help automate behavioural classification, direct visual observation of the study individuals remains important for the development of accurate algorithms. To overcome the difficulties of observing elusive wild animals, it has been suggested that captive conspecifics can be used to identify behaviours. Indeed, this technique has been shown to have value for measuring behaviour in a range of species [5, 22,23,24], and where captive individuals are not available, domestic counterparts have been suggested as a viable proxy. However, individual variation, including differences in morphology and body-size, and the effect of variation in free-living animal habitat compared to domestic and captive settings [22, 27], may be critical when applying such methods. Importantly, it is particularly problematic to test the value of domestic surrogates for wild animals if those wild animals cannot be observed for verification. For example, applying the common method for splitting data into training and validation data sets overestimates the accuracy of models when tested on new individuals because the models are validated on individuals also used to train the model.
While it is well acknowledged that differential environment use is an important part of the behavioural ecology of free-living animals, it is less appreciated that terrain substrate, superstrate (defined as any material an animal must push against to move), and gradient, affect accelerometer signals and, thereby, the ability to derive behaviours from accelerometry data. For example, the gradient of a terrain should be identifiable in tetrapods because the static acceleration, indicating animal orientation, will change accordingly and animals may, in any event, change gait, stride length and speed according to terrain slope [32, 33], all of which can be manifest in a tri-axial accelerometer signal.
The Alpine ibex (Capra ibex) is a Caprid that lives at high altitudes in the central European Alps in populations that are highly fragmented due to pressure from land-use change, agriculture, human disturbance and climate change. Climate change is considered to be particularly important since this species is sensitive to heat and avoids heat stress, which reduces the quality of the food resources they can access [36, 37]. Given on-going global warming, there is concern that physiological and behavioural constraints on the Alpine ibex will lead to severe declines of the species following rapid truncation of suitable habitat. Research is needed to understand the species' capacity to adapt to changing environmental conditions, and animal-attached logging systems are ideal for this purpose. However, the high-altitude habitat of the ibex makes it implausible to observe the species in the wild to validate accelerometer signals for behaviour, so it is appropriate to consider using captive surrogates for this. Captive populations of the Alpine ibex are few and access is limited, so a pragmatic approach would be to attempt to calibrate behavioural data using a similar but tractable and accessible species such as the domestic pygmy goat (Capra aegagrus hircus), which is phylogenetically similar and readily available in domestic settings.
In this study, we tested the validity of this approach by using loggers that measure tri-axial acceleration and magnetic compass heading on both captive pygmy goats and captive Alpine ibex to examine behaviours of both species using a random forest model approach. We hypothesized that observations of pygmy goat behaviours could be used to predict the behaviours of captive Alpine ibex, thereby demonstrating that domestic surrogates can serve as suitable proxies for helping resolve behaviour based on acceleration in rare or difficult-to-handle wild species of conservation concern. We additionally provide a widely applicable template for refining the use of random forest models to predict behaviours, including: feature selection approaches, the addition of tri-axial magnetometry variables, selection of the optimum sampling frequency, handling of unbalanced observations and the data splitting method (random vs individual). With these models, we then aimed to provide behavioural templates for both Alpine ibex and pygmy goats, including predicting the terrain slope for locomotion behaviours. Finally, we examine the ability of our models from one species to predict behaviour in the other in order to assess the value of using surrogate species when captive populations of the focal species are not available for study.
Study subjects and enclosure
The study was conducted using collar-attached ‘Daily Diary’ tags (Wildbyte Technologies Ltd., Swansea, UK) deployed on African pygmy goats at Belfast Zoo (Northern Ireland, UK) in November 2017 and May 2018, and captive Alpine ibex at Kolmården Wildlife Park (Norrköping, Sweden) in November 2018 and November 2019 (Additional file 1 Table S1). At Belfast Zoo, ‘Daily Diary’ tags were deployed on nine female pygmy goats (mean body weight = 25.9 kg, age range = 3–10 years) for periods of 5 days over 1 month within each of two enclosures. Keepers were able to handle the goats to deploy collars. The first enclosure consisted of a sloping grass paddock (slope gradient = 18%, area = 2210 m² [50.1 × 35.3 m]) surrounded by hedges, and the second enclosure was a flat smaller concrete yard with an area of wood mulch (area = 163 m² [16.6 × 7.3 m]).
At Kolmården Wildlife Park, in November 2018, collar-attached devices were deployed on two male Alpine ibex (weight not known, age = 9 years) following a protocol in which the animals were trained through positive reinforcement (using feed pellets as a reward) to wear collars without the need for anaesthesia. Stations to protect the zoo personnel were constructed from wood and both individuals were trained incrementally over a period of 2 months (Additional file 1 Table S2; pers. comm. Pieter Giljam, Zoospenseful and Kolmården Wildlife Park). Collars were deployed on male Alpine ibex for two periods of 5 days over a month.
In November 2019, collar-attached devices were also deployed on four female Alpine ibex (mean body weight = 45.6 kg, age range = 5–13 years) for a period of 15 days. Female ibex were not amenable to training. Therefore, each individual was sedated using an intramuscular injection of butorphanol (0.009 mg/kg), etorphine (0.009 mg/kg) and xylazine (0.674 mg/kg). The collar was deployed, and subject body mass, limb length and horn length were recorded. To reverse the anaesthesia, individuals were given an intramuscular injection of naltrexone (0.674 mg/kg) and atipamezole (0.112 mg/kg). Sedation was repeated at the end of the data collection period (after 15 days) to remove the collars. Procedures were conducted by the Kolmården veterinarians. The enclosure was a large area (18,342 m² [202.4 × 80.4 m]) consisting of a mixture of grass and rock surfaces with multiple slopes (range of slopes = 1.7–87%).
Tri-axial acceleration was recorded at a frequency of 40 Hz, together with tri-axial magnetometry, temperature, pressure, time and date. Devices were encased in a plastic housing with a 3.6 V battery (LS 14250, Saft, France; 147 mm × 25 mm; 9 g) and sealed with tesa tape (Tesa® tape 4651, Tesa, Germany). Devices were then attached to the collar using tesa tape and collars were weighted either side of the device to ensure it remained in position on the ventral side of the animal (weight = 135–235 g, dependent on the collar size). Collar weight was within 0.8% of individual body weight and collars were fitted to have a circumference that was 5 cm larger than that of the neck. All devices were oriented so the z-axis corresponded to ‘heave’ (up-down motion), x-axis to ‘surge’ (forward-back motion) and y-axis to ‘sway’ (left-right motion) (Fig. 1). Before deployment, each device was calibrated to the exact time and to the orientation of the axes, and accelerometer and magnetometer offsets were corrected.
Observation and processing of data
To classify behaviour, observations were conducted using a video camera (Canon PowerShot SX720 HS; Canon Inc., Japan). Nine behaviours were distinguished for each species (Table 1) and were recorded for an average of 125.9 min (range: Pygmy goats = 1–221.6 min, Alpine ibex = 2.7–145.2 min). The slope of terrain for locomotion behaviour was also recorded as flat (− 2.5° to 2.5°), uphill (> 2.5°) or downhill (< − 2.5°: Table 1). Individuals were observed from outside their enclosure. Pygmy goats were recorded for a total of 654 min (mean ± sd = 73.5 ± 25.3 min per individual) and Alpine ibex were observed for a total of 516 min (mean ± sd = 87.0 ± 14.4 min per individual) (see Additional file 1 Table S3). Acceleration data were manually labelled according to the observed behaviour for the duration of the observation period using ‘Daily Diary Multiple Trace’ software (Wildbyte Technologies Ltd., Swansea, UK). Only data with labelled behaviour observations were included in the analysis.
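The slope categories defined above amount to a simple threshold rule on the terrain angle. The sketch below illustrates that rule; the function names are hypothetical, and the percent-to-degree conversion assumes that a gradient in percent means rise over run multiplied by 100, which the text does not state explicitly.

```python
import math


def percent_to_degrees(gradient_percent: float) -> float:
    """Convert a gradient in percent (assumed rise/run x 100) to degrees."""
    return math.degrees(math.atan(gradient_percent / 100.0))


def slope_class(angle_deg: float, threshold_deg: float = 2.5) -> str:
    """Label terrain as flat, uphill or downhill using the +/- 2.5 degree bands."""
    if angle_deg > threshold_deg:
        return "uphill"
    if angle_deg < -threshold_deg:
        return "downhill"
    return "flat"


# Example: the 18% paddock gradient, climbed in the uphill direction.
paddock_angle = percent_to_degrees(18)
print(round(paddock_angle, 1), slope_class(paddock_angle))
```

Under this assumption an 18% gradient corresponds to roughly 10°, well outside the band treated as flat.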
Accelerometry and magnetometry variables
To classify specific behaviours, 37 variables that are commonly used to detect behaviours from data [1, 5, 21, 26] were extracted or derived from the raw tri-axial acceleration and magnetometry data (Additional file 2 Table S3). From tri-axial acceleration, these variables were either based on static acceleration (cf. Shepard et al.), which describes the orientation of the device relative to gravity and thus the posture of the animal, or dynamic acceleration, which describes the body movement of the animal. From the tri-axial magnetometry, five variables were included, calculated using each of the three orthogonal axes independently or by combining all three axes to provide a measurement of full body motion [20, 21] (Additional file 2 Table S3).
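To make the distinction between static and dynamic acceleration concrete, the following is a minimal sketch of one common way to derive them. It rests on several assumptions not confirmed by the text: static acceleration is approximated by a centred running mean, the smoothing window (2 s at 40 Hz) is a hypothetical choice, VeDBA is taken as the vector norm of the three dynamic components, and pitch is taken as the arcsine of the static surge (x) acceleration in g.

```python
import math


def running_mean(values, window):
    """Centred running mean; the window shrinks at the edges of the series."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def vedba(x, y, z, window):
    """Vectorial dynamic body acceleration for each sample (same units as input)."""
    axes = (x, y, z)
    static = [running_mean(axis, window) for axis in axes]
    # Dynamic acceleration is what remains after removing the static component.
    dynamic = [[raw - s for raw, s in zip(axis, st)] for axis, st in zip(axes, static)]
    return [math.sqrt(dx * dx + dy * dy + dz * dz) for dx, dy, dz in zip(*dynamic)]


def pitch_deg(static_x):
    """Pitch in degrees from static surge acceleration in g (assumed convention)."""
    return [math.degrees(math.asin(max(-1.0, min(1.0, s)))) for s in static_x]


window = 80  # hypothetical smoothing window: 2 s at 40 Hz
```

A motionless animal gives a VeDBA of zero because the raw and smoothed signals coincide, while a change in posture appears in the static component and therefore in pitch.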
Building random forest models
Random forest models, which are an extension of classification (decision) trees and are robust and powerful for this type of analysis, were built to predict behaviour for both the pygmy goat and Alpine ibex data separately, using accelerometry and magnetometry variables (see above). All analyses were conducted in R version 3.9 using the package randomForest. Random forest models use classification trees to classify the observations into different behaviours by building a hierarchy of decision rules based on the variables selected [5, 42]. Our random forest model used 500 iterations (the number of classification trees sampled), and a random subset of data was used to build each tree (bootstrapping). This gives a robust model that limits overfitting and problems associated with unbalanced datasets, which may be common in observations of animals that are likely to spend more time resting than active [5, 26], although unbalanced observations may still lead to bias towards dominant observation classes. For a randomly selected observation, the Gini index measures the probability of it being classified incorrectly. At each classification node, observations were subdivided until the Gini index no longer decreased [5, 26]. The mean Gini decrease gave the importance of each variable in classifying the behaviours, with higher values indicating higher importance. The proportionate error of each model (number of misclassifications/number of observations according to the number of trees) was checked for each behaviour and the ‘out-of-bag’ error estimates (observations not included in the bootstrapped sample or tree) examined for each model to evaluate model performance (Additional file 2 Fig. S4).
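The Gini index and the decrease achieved by a split can be illustrated with a short sketch. This is the standard Gini impurity for a set of class labels, not a reproduction of the randomForest internals; the function names and the toy labels are assumptions made for illustration.

```python
from collections import Counter


def gini_impurity(labels):
    """Probability that a randomly drawn observation is misclassified when it
    is labelled at random according to the class proportions in `labels`."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Reduction in impurity from splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted_children


# Toy node with two behaviours that a single threshold separates perfectly.
node = ["resting", "resting", "walking", "walking"]
print(gini_impurity(node), gini_decrease(node, node[:2], node[2:]))
```

Averaging such decreases over every node at which a variable is used, and over all trees, gives the mean Gini decrease used to rank variable importance.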
Models were built with data subsampled at different sampling frequencies (40, 20, 10, 5 and 1 Hz) to check the effect on classification accuracy of behaviours. Random forest models need variables that are not correlated and contribute to the power of the model [45, 46]. To remove correlated features, accelerometry and magnetometry variables were tested for correlation using the caret package. Of each pair of correlated variables (Pearson’s r ≥ 0.70), the one that was least important according to the mean Gini decrease was excluded. Although a consensus does not yet exist on the best methods for random forest model simplification or variable reduction in ecology, we removed redundant features using recursive feature elimination (RFE), which fits the random forest models using cross-validation and selects the features to be retained in the model. Variable reduction was conducted consistently for both species models to ensure models used the same variables. The importance of including magnetometry variables was tested separately by removing them from the model and comparing the output for each model using model performance metrics. A general linear model was used to test the effect of sampling frequency and magnetometry variable inclusion on classification accuracy. Model accuracy was included as the response variable and sampling frequency, species and data (accelerometry or accelerometry and magnetometry) included as explanatory variables.
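The correlation filter can be sketched as a greedy pass over the variables in order of importance. This is an illustrative stand-in for the caret-based step, not the code used in the study: the function names are hypothetical, the absolute value of Pearson's r is compared with the 0.70 threshold (an assumption), and every variable is assumed to have non-zero variance.

```python
import math


def pearson_r(a, b):
    """Pearson correlation between two equal-length, non-constant sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    sd_a = math.sqrt(sum((p - mean_a) ** 2 for p in a))
    sd_b = math.sqrt(sum((q - mean_b) ** 2 for q in b))
    return cov / (sd_a * sd_b)


def drop_correlated(features, importance, threshold=0.70):
    """Keep variables in descending importance, skipping any that correlate
    at or above the threshold with a variable that has already been kept."""
    kept = []
    for name in sorted(features, key=importance.get, reverse=True):
        if all(abs(pearson_r(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy example: 'b' is almost a multiple of 'a' and is less important.
features = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8.1], "c": [1, -1, 1, -1]}
importance = {"a": 5.0, "b": 3.0, "c": 1.0}  # hypothetical mean Gini decreases
print(drop_correlated(features, importance))
```

Processing variables from most to least important guarantees that, within any correlated pair, it is the less important member that is discarded.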
The following steps were conducted with data at the lowest sampling frequency that resulted in a high classification accuracy, bearing in mind that unbalanced datasets may bias the predictive ability of classification methods toward the most dominant data classes and that standing, eating, browsing, walking and resting had a higher number of observations than other behaviours (see Table 1). We used a down-sampling strategy to handle imbalanced data classes, removing instances from the majority classes for the relevant behaviours. Specifically, behaviour classes that were observed for longer than the median (560.4 s) were down-sampled randomly using the caret package. Another strategy that may improve model performance is reducing the number of behaviour categories. The initial models included all behaviours observed in each species, and the effect of reducing the number of behaviours was tested by removing those assumed to be less relevant to ethological studies: aggression, grooming, and shaking.
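The down-sampling step can be sketched as follows. This is an assumption-laden illustration rather than the caret routine used in the study: the function name is hypothetical, the cap is the median number of rows per class (equivalent to the median observation time only at a fixed sampling frequency), and classes at or below the cap are left untouched.

```python
import random
from statistics import median


def downsample_to_median(rows, label_of, seed=0):
    """Randomly thin every class that has more rows than the median class size."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    cap = int(median(len(members) for members in by_class.values()))
    balanced = []
    for members in by_class.values():
        if len(members) > cap:
            balanced.extend(rng.sample(members, cap))  # majority class: thin
        else:
            balanced.extend(members)                   # minority class: keep all
    return balanced


# Toy data: (behaviour, sample index) pairs with a dominant resting class.
rows = [("resting", i) for i in range(10)] \
    + [("walking", i) for i in range(4)] \
    + [("running", i) for i in range(2)]
balanced = downsample_to_median(rows, label_of=lambda row: row[0])
```

In the toy data the median class size is four rows, so resting is thinned from ten rows to four while walking and running are unchanged.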
Authors using random forest models to predict behaviour from accelerometry generally split data randomly into 60% training and 40% validation sets (e.g. [5, 26]). However, the value of splitting data per individual has been highlighted when validating the ability of models to predict behaviour of unobserved individuals. In this study, we built two model sets, the first splitting the data 60/40 randomly, with data from each individual present in both the training and the validation sets, and the other approximately split 60/40 at the individual level, with individuals only in either the training or validation sets. The individual-split models were repeated for all combinations of individuals in the training or validation data sets using a k-fold cross-validation strategy to give average model performance (Table 1). The effect of balancing observations and of reducing the number of behaviour classes on the model performance metrics was tested for both the random and individual-split models using one-way ANOVAs and Tukey pairwise comparisons for each species.
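The difference between the two splitting schemes lies in whether an individual can contribute to both sets. A minimal sketch of enumerating individual-level splits is given below; the function name, the rounding rule for the 40% validation share and the individual identifiers are assumptions, and the exact combinations used in the study may differ.

```python
from itertools import combinations


def individual_splits(individuals, validation_fraction=0.4):
    """Yield (training, validation) lists of individuals such that no
    individual appears on both sides of a split."""
    ids = sorted(set(individuals))
    n_validation = max(1, round(len(ids) * validation_fraction))
    for held_out in combinations(ids, n_validation):
        training = [i for i in ids if i not in held_out]
        yield training, list(held_out)


# Hypothetical identifiers for six individuals.
ibex = ["ibex1", "ibex2", "ibex3", "ibex4", "ibex5", "ibex6"]
splits = list(individual_splits(ibex))
print(len(splits), splits[0])
```

With six individuals and two held out for validation this enumerates 15 splits; averaging performance over them estimates how well a model generalises to animals it has never seen, which a random 60/40 split of pooled rows cannot show.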
Random forest model validation
To estimate model performance for each random forest model used in this study, confusion matrices were produced for the model on the validation dataset, highlighting true positives, false positives and false negatives [5, 27]. From these, the model accuracy, precision and recall were calculated using the number of true positives (TP, correctly classified positive behaviours), false positives (FP, incorrectly classified positive behaviours), true negatives (TN, correctly classified negative behaviours) and false negatives (FN, incorrectly classified negative behaviours). Model accuracy was calculated as the percentage of all classifications that were true positives or true negatives:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100$$
Precision was defined as the proportion of positive classifications that were true, i.e. true positives relative to true positives plus false positives:
$$\text{Precision} = \frac{TP}{TP + FP}$$
Recall was defined as the proportion of actual positives that were classified correctly, i.e. true positives relative to true positives plus false negatives:
$$\text{Recall} = \frac{TP}{TP + FN}$$
The F1 statistic was then calculated as the harmonic mean of Precision and Recall and used as a metric of the overall performance for classification of each behaviour:
$$F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
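These metrics can be read directly off a confusion matrix. The sketch below is illustrative only: the function name and the toy matrix are assumptions, rows are taken to be observed behaviours and columns predicted behaviours, and overall accuracy is computed as the share of observations on the diagonal, which is the usual multi-class counterpart of the accuracy definition above.

```python
def classification_metrics(confusion, labels):
    """Overall accuracy plus per-behaviour precision, recall and F1.

    confusion[i][j] is the number of observations of behaviour i (observed)
    that the model classified as behaviour j (predicted).
    """
    n_classes = len(labels)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n_classes))
    per_class = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n_classes)) - tp
        fn = sum(confusion[k]) - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class[label] = {"precision": precision, "recall": recall, "f1": f1}
    return correct / total, per_class


# Toy two-behaviour example with invented counts.
labels = ["resting", "walking"]
confusion = [[8, 2],   # observed resting: 8 classified correctly, 2 as walking
             [1, 9]]   # observed walking: 1 classified as resting, 9 correctly
accuracy, metrics = classification_metrics(confusion, labels)
```

Reporting the mean F1 across behaviours alongside accuracy matters because accuracy can stay high when a rare behaviour is classified poorly.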
Predicting across species
To determine whether pygmy goats could be used as a surrogate species to predict Alpine ibex behaviour, the model using the pygmy goat dataset was used to predict Alpine ibex behaviour from the Alpine ibex dataset. Behaviours that were not observed across both species (specifically, climbing and browsing) were excluded. Models with data at the lowest acceptable sampling frequency were used to predict behaviour and, for locomotory behaviours, behaviour subdivided by slope of terrain (flat, uphill or downhill; see Table 1). Performance of the full initial model was compared with that of models in which observation classes were balanced and the number of predicted behaviours was reduced. A sex-specific model was tested that excluded the male ibex from the cross-species model. To compare model performance with that of a random model, ‘observed’ behaviours were randomly assigned to the acceleration data in the same proportions as the actual observations for each behaviour and used to build a random forest model.
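One way to generate random labels in the same proportions as the real observations is to shuffle the observed label sequence. The sketch below shows this together with the accuracy expected from chance agreement; shuffling is an assumption about how the randomisation could be done, and the function names and toy labels are hypothetical.

```python
import random
from collections import Counter


def random_labels(observed, seed=0):
    """Shuffle the observed labels: class proportions are preserved exactly,
    but any link between a label and the acceleration signal is broken."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    shuffled = list(observed)
    rng.shuffle(shuffled)
    return shuffled


def expected_chance_accuracy(observed):
    """Expected agreement between two independent labellings that share the
    class proportions of `observed` (the sum of squared proportions)."""
    n = len(observed)
    return sum((count / n) ** 2 for count in Counter(observed).values())


observed = ["standing"] * 5 + ["walking"] * 3 + ["running"] * 2
baseline_labels = random_labels(observed)
print(expected_chance_accuracy(observed))
```

The more a dataset is dominated by one behaviour, the higher this chance-level figure becomes, which is why a cross-species accuracy should be judged against a random model built on the same class proportions rather than against zero.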
Refining random forest models
Random forest models were built for the different sampling frequencies using either accelerometry variables only or both accelerometry and magnetometry variables. Seven variables were removed because they were highly correlated and a further 13 variables were removed in RFE, with 17 variables included in the final model (Fig. 2; Additional file 2 Fig. S4). Model accuracy was not significantly different between the 40 Hz and the 20 Hz model (t4,5 = − 0.003, p = 0.71) or the 10 Hz model (t4,5 = − 0.013, p = 0.21). However, it was significantly lower at 5 Hz (t4,5 = − 0.030, p = 0.025), and 1 Hz (t4,5 = − 0.095, p < 0.001) (Fig. 3). Thus, 10 Hz was selected as the sampling frequency offering the best compromise between model performance and processing demands. Overall, model accuracy was significantly different for Alpine ibex and pygmy goats (t6,13 = − 0.13, p = 0.001).
Comparing models with a sampling frequency of 10 Hz and higher, model accuracy was higher when magnetometry variables were included (t2,9 = 0.008, p = 0.03). Model accuracy of the final selected models using randomly split data was 98.6% for Alpine ibex with a mean ± SD F1 statistic of 0.96 ± 0.011 and 97.8% for pygmy goats with a mean ± SD F1 statistic of 0.96 ± 0.016 (Table 2). Although model accuracy was lower, but not significantly so, using balanced data classes (F1,2 = 0.079, p = 0.80), the precision for separate behaviours was significantly higher (F1,2 = 72.9, p = 0.013). Predicting fewer behaviours increased model accuracy (F1,2 = 0.17, p = 0.72) and the mean F1 statistic (F1,2 = 12.45, p = 0.07), although neither effect was significant. Using data split per individual, the mean model accuracy was 56.7 ± 0.06% for Alpine ibex with a mean ± SD F1 statistic of 0.37 ± 0.02 and 57.9 ± 0.05% for pygmy goats with a mean ± SD F1 statistic of 0.34 ± 0.03 (Table 2; Fig. 4). Model accuracy was significantly lower in balanced data classes (F1,28 = 46.6, p < 0.001) and was higher, though not significantly, when the number of behaviour classes was reduced (F1,28 = 0.70, p = 0.41). Using the F1 statistic as a measure of model performance, performance was marginally higher when using balanced observations (F1,28 = 3.71, p = 0.06) and significantly higher when the number of behaviours was reduced (F1,28 = 25.3, p < 0.001).
Behavioural templates for Alpine ibex and pygmy goats
Random forest models, at a sampling frequency of 10 Hz, were built to predict the slope of the terrain for locomotion behaviours; flat, uphill or downhill. Overall model accuracy when slope was included was 98.6% for Alpine ibex with a mean ± SD F1 statistic of 0.96 ± 0.016 and 98.0% for pygmy goats with a mean ± SD F1 statistic of 0.96 ± 0.016 (Fig. 4; Table 2; Additional file 3 Fig. S6). Pitch was the most important variable for pygmy goats, and smoothed VeDBA was the most important variable for Alpine ibex predicting behaviours. Static X axis acceleration was the most important variable when the model predicted Alpine ibex behaviour including terrain slope.
Three variables were in the top 5 most important variables, ranked by mean Gini decrease, for both the Alpine ibex and pygmy goats. These were posture, given by the surge axis (static X), angle of surge posture (pitch) and smoothed VeDBA (smVeDBA) (Fig. 5; Additional file 3 Fig. S6, Table S5).
Applying pygmy goat behavioural template to Alpine ibex
When the model trained on the pygmy goat dataset was used to predict behaviours observed in the Alpine ibex training dataset, model accuracy was 54.3%. The model reached a mean ± SD precision of 0.54 ± 0.38, recall of 0.61 ± 0.11 and F1 statistic of 0.47 ± 0.29 (Table 3). The largest errors in the model were produced from misclassifying resting as standing, and trotting as either walking or running (Additional file 3 Table S6). Standing, walking, eating and running had the highest recall and precision in this model (Fig. 6). A model using randomly generated ‘observed’ behaviours had a classification accuracy of 15.4% (Table 3).
Model accuracy for predicting behaviours and slope of terrain for locomotion behaviour was 60.5%. The model reached a mean ± SD precision of 0.28 ± 0.41, recall of 0.26 ± 0.30 and F1 statistic of 0.24 ± 0.34 (Table 3). Locomotion behaviours on a slope had very low precision and recall (Fig. 6; Additional file 3 Table S7). A model using randomly generated ‘observed’ behaviours including slope for locomotion behaviours had a classification accuracy of 26.4% (Table 3). For both models, model accuracy improved when using a sex-specific model (predicting only female Alpine ibex behaviour); however, other model performance metrics did not change.
Accurately identifying animal behaviour is key to the validity of using accelerometers to address important ecological questions in free-ranging animals. However, there remains limited information on best practice, especially when captive or domestic individuals are used to inform workers on the putative behaviour of wild species. In this study, behavioural classification was achieved with high accuracy for both captive Alpine ibex and domestic pygmy goats, using observations of each species respectively and taking steps to refine the application of random forest models. All behaviours and the slope of terrain for locomotion behaviours could be predicted with high accuracy. However, limitations were identified when the models were used to predict the behaviour of individuals not used in model training, whether they were the same species or not. Domestic or captive surrogates may be useful to predict the broad behaviours of a captive wild species but locomotion on terrain with different slope characteristics remains problematic. Thus, while captive surrogates may be useful for classifying behaviour in some free-ranging animals, the selection of appropriate counterparts or surrogates must be carefully considered for accurately classifying behaviours.
Despite decreased model performance when Alpine ibex behaviour was predicted from domestic pygmy goats, the biggest decrease in model performance occurred when individually split data were used instead of randomly split data. This suggests that the limitations of predicting the behaviours of individuals that cannot be observed lie in intraspecific individual differences rather than inter-specific variation. Resting, which is typically considered an easy behaviour to identify, was not well identified, and a definitive explanation for this remains elusive. Despite this, broad behaviours were identifiable, although some behaviours remained problematic in the cross-species model, particularly as regards the effect of terrain slope for locomotion and resting behaviours.
Domestic surrogates, or even captive surrogates of a different species, have been suggested to have value for informing behavioural classification and the concept is certainly logical [22, 25]. Against this though, we observed low classification accuracy, and were unable to identify the full suite of behaviours observed in the captive counterparts, using our domestic surrogate. Critically, the value of using captive or domestic individuals as surrogates to predict the behaviour of free-living individuals requires that the surrogates and wild animals move and behave in a similar way. However, the extent to which this is true depends critically on the size and morphology differences between the species dyads. For example, domestication may change bone structure, thus leading to changes in gait and movement and body size, which can have a marked effect on stride length and stride frequency, and with it the acceleration values recorded by animal-attached devices. Pygmy goats are known for their characteristically short legs (height = 31 to 45 cm) associated with their adaptation to humid environments, whereas the longer legs of Alpine ibex facilitate locomotion through their mountainous habitat (female height = 73 to 84 cm, male height = 90 to 101 cm). The high degree of sexual dimorphism in Alpine ibex means that males differ more than females from female pygmy goats. This disparity may explain the reduced accuracy of models using pygmy goat observations to predict Alpine ibex behaviour. Indeed, model performance was higher when pygmy goat observations were used to predict the behaviour of female ibex, indicating that it is the increased difference between male Alpine ibex and female pygmy goats that reduces the ability of the model to predict behaviour between them. This suggests that there is value in using sex-specific models when classifying behaviours of sexually dimorphic species.
The environment in which the surrogate individuals live must replicate, as far as possible, that of their wild counterparts for them to exhibit the same behavioural profiles. Our captive Alpine ibex were observed to display a wider range of behaviours and terrain slopes because they were kept in a large and varied enclosure with rocks and small cliffs. So, simplistically, climbing in ibex could not be predicted using our pygmy goat surrogate because, although the goats had slopes within their enclosure, none were comparable to the rocks that ibex used. This limitation may be especially important for measuring behaviour of individuals that may access food or water in a manner different to that observed in captivity, a clear case being predators that cannot hunt in captivity [24, 28]. In fact, animal home ranges can cover large areas which display habitat and topographical heterogeneity, which will presumably produce corresponding heterogeneity in accelerometer signals, particularly during movement, so it is important to be able to interpret and account for the gradient, substrate and superstrate of the terrain during locomotion. Using surrogates that are in a varied enclosure that mimics the species' natural environment would reduce the issues linked to environment that arise from using captive or domestic surrogates.
Orientation on slopes is expected to alter the static surge acceleration signal as the collar-attached device abuts the animal’s neck, particularly if the animal is facing, or moving, up an appreciable slope. Indeed, the extent to which the device on the collar can swing should prove an important issue in defining behaviours; the more it can swing, the more it will act like a gimbal and the less likely it is to be constrained to a particular angle by abutting the neck. Against this, loose collars may introduce unwanted variability during movement. Terrain will also affect the acceleration profiles measured for different behaviours because animals often respond to terrain by changing gait, stride length and speed, so enclosures used for captive calibration of behaviours from logging devices should display the entire range of topographies available to the free-ranging animals of interest.
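To illustrate why orientation on a slope shifts the static surge signal, the sketch below derives a static component with a running mean and converts it to a pitch angle. It is a minimal illustration under stated assumptions rather than the processing used in this study: the function names, the centred smoothing window, the axis sign convention and the `atan2` formulation of pitch are all assumptions.

```python
import math


def running_mean(values, window):
    """Centred running mean; near the edges only the available samples are used."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def pitch_degrees(static_surge, static_sway, static_heave):
    """Pitch in degrees from the static (gravitational) acceleration, in g.

    Assumed convention: surge points forward along the body, heave reads
    +1 g when the device is level, and positive pitch means facing uphill.
    """
    return math.degrees(
        math.atan2(static_surge, math.hypot(static_sway, static_heave))
    )


# A device held level, then tilted as if the animal faced up a 30 degree slope.
level = pitch_degrees(0.0, 0.0, 1.0)
uphill = pitch_degrees(
    math.sin(math.radians(30)), 0.0, math.cos(math.radians(30))
)
```

In this idealised case the static surge value changes from 0 g to 0.5 g purely because of orientation, which is the effect a loosely swinging collar would partly mask.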
A perennial issue for biologgers is the trade-off between high resolution data (both in terms of time and bits) and required battery power [19, 54]. Lower sampling frequencies can extend deployment time by reducing the battery power, internal storage memory and processing power required. In this study, we found that the highest classification accuracy was achieved using a sampling rate of 10 Hz or above. Even when the sampling rate was reduced to 1 Hz, 87.4% of behaviours were still correctly classified, a level deemed acceptable by other studies [18, 24, 55].
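The effect of a lower sampling rate can be explored by sub-sampling an existing recording before re-fitting the models. The sketch below shows one simple way to do so, by keeping every k-th sample. The function name and the 40 Hz starting rate are hypothetical, chosen only for illustration, and this is not necessarily how the rates compared in this study were produced.

```python
def downsample(samples, original_hz, target_hz):
    """Keep every k-th sample so that the series is at target_hz.

    Only integer ratios are supported, to keep the example simple.
    """
    if target_hz <= 0 or original_hz % target_hz != 0:
        raise ValueError("target_hz must divide original_hz exactly")
    step = original_hz // target_hz
    return samples[::step]


# One second of a hypothetical 40 Hz recording (values are sample indices).
one_second_at_40hz = list(range(40))
at_10hz = downsample(one_second_at_40hz, 40, 10)
at_1hz = downsample(one_second_at_40hz, 40, 1)
```

Plain decimation like this can alias high-frequency movement into the retained samples, so a low-pass filter is often applied first when the signal itself, rather than summary variables, is of interest.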
The ease with which biologger data can be analysed to highlight behaviour using random forests belies a few important considerations. Firstly, there is a tendency to include a large number of variables from tri-axial accelerometers in random forest models, even though many have not been tested for the benefit of their inclusion. Although random forest models can handle noisy variables and can be robust to overfitting, 20 variables were excluded from our dataset, either because they were correlated or because they were deemed redundant by recursive feature selection. This suggests that there is value in selecting variables that are biologically and mechanistically important in describing the behaviours and therefore important to the model. This, in turn, necessitates a proper understanding of what the various acceleration metrics mean and how they are changed by both the different behaviours and the environment (topography etc.). Other steps that have been suggested to improve random forest model performance were also taken. Although using balanced observation classes did not significantly improve model performance, reducing the number of behaviours predicted (by removing less relevant behaviours) did. The behaviours included in a classification should therefore be carefully selected, as including behaviours that are not relevant to the study may reduce the accuracy with which relevant behaviours are classified. Furthermore, when applying behaviour templates to unobserved data, steps should be taken to reduce the chance of predicting the wrong behaviour, such as setting a threshold accuracy (see Ferdinandy et al.).
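As an illustration of screening out correlated variables before model fitting, the sketch below keeps a variable only if its absolute Pearson correlation with every variable already kept is below a cut-off. It is a minimal sketch under stated assumptions, not the selection procedure used in this study: the 0.9 cut-off, the greedy keep-first ordering, the variable names and the toy values are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(features, threshold=0.9):
    """Greedy filter: keep a variable only if |r| with all kept ones is below threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy values: pitch tracks static_x almost perfectly, vedba does not.
features = {
    "static_x": [0.1, 0.2, 0.3, 0.4, 0.5],
    "pitch": [5.7, 11.5, 17.5, 23.6, 30.0],
    "vedba": [0.5, 0.1, 0.4, 0.2, 0.3],
}
kept = drop_correlated(features)
```

Because the filter is greedy, which member of a correlated pair survives depends on the order of the variables, so placing the mechanistically meaningful variables first is one way to act on the point made above.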
Finally, many biologgers have accelerometers within inertial measurement units (IMUs), which also have tri-axial magnetometers built in, although few studies have included tri-axial magnetometry in behavioural classification despite its potential usefulness [20, 21]. Our work showed that including a limited number of variables derived from tri-axial magnetometry significantly improved classification accuracy. This may prove particularly valuable in the future, since magnetometers may be able to elucidate patterns of movement in a manner different to accelerometers, thus potentially providing important additional information for behavioural classification.
We provide a template for identifying the behaviours of wild or captive Caprids from tri-axial accelerometry and magnetometry using captive and domestic counterparts, highlighting the need to create standardised methodologies, including data processing steps, especially when selecting variables and using random forest models. High model performance could be achieved for two caprid species using video observations with a relatively low sampling frequency (10 Hz), including predicting the slope of terrain for locomotion behaviours. Tri-axial magnetometry is a useful tool to aid behavioural classification, and slope of terrain for locomotion behaviours could be accurately predicted. We demonstrate the importance of using sex-split training datasets in sexually dimorphic species. While we show that model performance is reduced when predicting the behaviours of individuals not included in the training data, it is comparable when predicting for the same or a similar species. The use of an individual-split cross-validation approach better demonstrates the application of these methods to individuals of the same or similar species. For prediction of the behaviours of a different species, all efforts should be made to maximise the similarities between surrogate and study species, including their respective environments.
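An individual-split cross-validation of the kind referred to above can be set up by holding out all observations from one animal at a time, so that a model is always tested on an individual it has not seen. The sketch below is a minimal illustration, not the code used in this study; the function name and the individual labels are hypothetical.

```python
from collections import defaultdict


def leave_one_individual_out(individual_ids):
    """Yield (held_out_id, train_rows, test_rows) once per individual.

    individual_ids[i] names the animal that row i of the feature table came from.
    """
    rows_by_individual = defaultdict(list)
    for row, animal in enumerate(individual_ids):
        rows_by_individual[animal].append(row)

    for held_out in sorted(rows_by_individual):
        test_rows = rows_by_individual[held_out]
        train_rows = sorted(
            row
            for animal, rows in rows_by_individual.items()
            if animal != held_out
            for row in rows
        )
        yield held_out, train_rows, test_rows


# Hypothetical labels: three ibex with two observation windows each.
ids = ["IB1", "IB1", "IB2", "IB2", "IB3", "IB3"]
folds = list(leave_one_individual_out(ids))
```

Each fold can then be used to fit a model on `train_rows` and score it on `test_rows`; the spread of scores across folds gives an indication of how well the method transfers to unseen individuals.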
Availability of data and materials
The datasets for this study and the code used for analysis will be made available online.
References
Wilson RP, Shepard ELC, Liebsch N. Prying into the intimate details of animal lives: use of a daily diary on animals. Endanger Species Res. 2008;4:123–37. https://doi.org/10.3354/esr00064.
Chmura HE, Glass TW, Williams CT. Biologging physiological and ecological responses to climatic variation: new tools for the climate change era. Front Ecol Evol. 2018;6:1–9.
Wilson ADM, Wikelski M, Wilson RP, Cooke SJ. Utility of biological sensor tags in animal conservation. Conserv Biol. 2015;29(4):1065–75. https://doi.org/10.1111/cobi.12486.
Nathan R, Spiegel O, Fortmann-Roe S, Harel R, Wikelski M, Getz WM. Using tri-axial acceleration data to identify behavioral modes of free-ranging animals: general concepts and tools illustrated for griffon vultures. J Exp Biol. 2012;215(6):986–96. https://doi.org/10.1242/jeb.058602.
Fehlmann G, O’Riain MJ, Hopkins PW, O’Sullivan J, Holton MD, Shepard ELC, et al. Identification of behaviours from accelerometer data in a wild social primate. Anim Biotelemetry. 2017;5:1–11.
Brown DD, Kays R, Wikelski M, Wilson RP, Klimley A. Observing the unwatchable through acceleration logging of animal behavior. Anim Biotelemetry. 2013;1(1):20. https://doi.org/10.1186/2050-3385-1-20.
Gómez Laich A, Wilson RP, Quintana F, Shepard ELC. Identification of imperial cormorant Phalacrocorax atriceps behaviour using accelerometers. Endanger Species Res. 2010;10:29–37.
Bidder OR, di Virgilio A, Hunter JS, McInturff A, Gaynor KM, Smith AM, et al. Monitoring canid scent marking in space and time using a biologging and machine learning approach. Sci Rep. 2020;10:1–13.
Altmann J. Observational study of behavior: sampling. Behaviour. 1974;49(3-4):227–67. https://doi.org/10.1163/156853974X00534.
Brown DD, Lapoint S, Kays R, Heidrich W, Kümeth F, Wikelski M. Accelerometer-informed GPS telemetry: reducing the trade-off between resolution and longevity. Wildl Soc Bull. 2012;36(1):139–46. https://doi.org/10.1002/wsb.111.
Aguado MÁP, Sturaro E, Ramanzin M. Individual activity interacts with climate and habitat features in influencing GPS telemetry performance in an alpine herbivore. Hystrix. 2017;28(1):36–42.
Bourgoin G, Garel M, Dubray D, Maillard D, Gaillard JM. What determines global positioning system fix success when monitoring free-ranging mouflon? Eur J Wildl Res. 2009;55(6):603–13. https://doi.org/10.1007/s10344-009-0284-1.
Walker JS, Jones MW, Laramee RS, Holton MD, Shepard ELC, Williams HJ, et al. Prying into the intimate secrets of animal lives; software beyond hardware for comprehensive annotation in ‘daily diary’ tags. Mov Ecol. 2015;3(1):29. https://doi.org/10.1186/s40462-015-0056-3.
Wilson RP, Holton MD, Vigilio A, Williams HJ, Shepard ELC, Quintana F, et al. Give a machine a hand: a Boolean time-based decision-tree template for finding animal behaviours rapidly in multi-sensor data. Methods Ecol Evol. 2018;9(11):2206–15. https://doi.org/10.1111/2041-210X.13069.
Bidder OR, Campbell HA, Gomez-Laich A, Urge P, Walker J, Cai Y, et al. Love thy neighbour: automatic animal behavioural classification of acceleration data using the K-nearest neighbour algorithm. PLoS One. 2014;9(2):e88609. https://doi.org/10.1371/journal.pone.0088609.
Ladds MA, Thompson AP, Kadar JP, Slip DJ, Hocking DP, Harcourt RG. Super machine learning: improving accuracy and reducing variance of behaviour classification from accelerometry. Anim Biotelemetry. 2017;5(8).
Rast W, Kimmig SE, Giese L, Berger A. Machine learning goes wild: using data from captive individuals to infer wildlife behaviours. PLoS One. 2020;15:1–25.
Tatler J, Cassey P, Prowse TAA. High accuracy at low frequency: detailed behavioural classification from accelerometer data. J Exp Biol. 2018;221:jeb184085.
Hounslow JL, Brewster LR, Lear KO, Guttridge TL, Daly R, Whitney NM, et al. Assessing the effects of sampling frequency on behavioural classification of accelerometer data. J Exp Mar Bio Ecol. 2019;512:22–30. https://doi.org/10.1016/j.jembe.2018.12.003.
Williams HJ, Holton MD, Shepard ELC, Largey N, Norman B, Ryan PG, et al. Identification of animal movement patterns using tri-axial magnetometry. Mov Ecol. 2017;5(1):6. https://doi.org/10.1186/s40462-017-0097-x.
Chakravarty P, Maalberg M, Cozzi G, Ozgul A, Aminian K. Behavioural compass: animal behaviour recognition using magnetometers. Mov Ecol. 2019;7:1–13.
Pagano AM, Rode KD, Cutting A, Owen MA, Jensen S, Ware JV, et al. Using tri-axial accelerometers to identify wild polar bear behaviors. Endanger Species Res. 2017;32:19–33. https://doi.org/10.3354/esr00779.
Mosser AA, Avgar T, Brown GS, Walker CS, Fryxell JM. Towards an energetic landscape: broad-scale accelerometry in woodland caribou. J Anim Ecol. 2014;83(4):916–22. https://doi.org/10.1111/1365-2656.12187.
Wang Y, Nickel B, Rutishauser M, Bryce CM, Williams TM, Elkaim G, et al. Movement, resting, and attack behaviors of wild pumas are revealed by tri-axial accelerometer measurements. Mov Ecol. 2015;3(1):2. https://doi.org/10.1186/s40462-015-0030-0.
Campbell HA, Gao L, Bidder OR, Hunter J, Franklin CE. Creating a behavioural classification module for acceleration data: using a captive surrogate for difficult to observe species. J Exp Biol. 2013;216(24):4501–6.
Shuert CR, Pomeroy PP, Twiss SD. Assessing the utility and limitations of accelerometers and machine learning approaches in classifying behaviour during lactation in a phocid seal. Anim Biotelemetry. 2018;6(14).
Bidder OR, Qasem LA, Wilson RP. On higher ground: how well can dynamic body acceleration determine speed in variable terrain? PLoS One. 2012;7(11):e50556. https://doi.org/10.1371/journal.pone.0050556.
Ferdinandy B, Gerencsér L, Corrieri L, Perez P, Újváry D, Csizmadia G, et al. Challenges of machine learning model validation using correlated behaviour data: evaluation of cross-validation strategies and accuracy measures. PLoS One. 2020;15(7):e0236092. https://doi.org/10.1371/journal.pone.0236092.
Gurarie E, Bracis C, Delgado M, Meckley TD, Kojola I, Wagner CM. What is the animal doing? Tools for exploring behavioural structure in animal movements. J Anim Ecol. 2016;85(1):69–84. https://doi.org/10.1111/1365-2656.12379.
Shepard ELC, Wilson RP, Rees WG, Grundy E, Lambertucci SA, Vosper SB. Energy landscapes shape animal movement ecology. Am Nat. 2013;182(3):298–312. https://doi.org/10.1086/671257.
Halsey LG, Shepard ELC, Quintana F, Gomez Laich A, Green JA, Wilson RP. The relationship between oxygen consumption and body acceleration in a range of species. Comp Biochem Physiol - A Mol Integr Physiol. 2009;152(2):197–202. https://doi.org/10.1016/j.cbpa.2008.09.021.
Claussen DL, Snashall J, Barden C. Effects of slope, substrate, and temperature on forces associated with locomotion of the ornate box turtle, Terrapene ornata. Comp Biochem Physiol - A Mol Integr Physiol. 2004;138(3):269–76. https://doi.org/10.1016/j.cbpb.2003.08.010.
Sun J, Walters M, Svensson N, Lloyd D. The influence of surface slope on human gait characteristics: a study of urban pedestrians walking on an inclined surface. Ergonomics. 1996;39(4):677–92. https://doi.org/10.1080/00140139608964489.
Parrini F, Cain JW, Krausman PR. Capra ibex (Artiodactyla: Bovidae). Mamm Species. 2009;830:1–12. https://doi.org/10.1644/830.1.
Aulagnier S, Kranz A, Lovari S, Jdeidi T, Masseti M, Nader I, et al. Capra ibex (Alpine Ibex, Ibex). IUCN 2013 IUCN Red List Threat Species Version 20131. 2008;8235. http://www.iucnredlist.org/details/42397/0
Mason THE, Brivio F, Stephens PA, Apollonio M, Grignolio S. The behavioral trade-off between thermoregulation and foraging in a heat-sensitive species. Behav Ecol. 2017;28(3):908–18. https://doi.org/10.1093/beheco/arx057.
Brivio F, Zurmühl M, Grignolio S, von Hardenberg J, Apollonio M, Ciuti S. Forecasting the response to global warming in a heat-sensitive species. Sci Rep. 2019;9(1):3048. https://doi.org/10.1038/s41598-019-39450-5.
Larsen G. A reliable ruminate for research. Lab Animal. 2015;44(9):337. https://doi.org/10.1038/laban.846.
Dickinson ER, Stephens PA, Marks NJ, Wilson RP, Scantlebury DM. Best practice for collar deployment of tri-axial accelerometers on a terrestrial quadruped to provide accurate measurement of body acceleration. Anim Biotelemetry. 2020;8(9).
Shepard ELC, Wilson RP, Quintana F, Gómez Laich A, Liebsch N, Albareda D, et al. Identification of animal movement patterns using tri-axial accelerometry. Endanger Species Res. 2008;10:47–60. https://doi.org/10.3354/esr00084.
Gleiss AC, Wilson RP, Shepard ELC. Making overall dynamic body acceleration work: on the theory of acceleration as a proxy for energy expenditure. Methods Ecol Evol. 2011;2(1):23–33. https://doi.org/10.1111/j.2041-210X.2010.00057.x.
Cutler DR, Edwards TC, Beard KH, Cutler A, Hess KT, Gibson J, et al. Random forests for classification in ecology. Ecology. 2007;88(11):2783–92. https://doi.org/10.1890/07-0539.1.
R Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for statistical Computing; 2020. https://www.r-project.org/
Liaw A, Wiener M. Classification and regression by randomForest. R News. 2002;2(3):18–22.
Genuer R, Poggi JM, Tuleau-Malot C. Variable selection using random forests. Pattern Recogn Lett. 2010;31(14):2225–36. https://doi.org/10.1016/j.patrec.2010.03.014.
Toloşi L, Lengauer T. Classification with correlated features: unreliability of feature ranking and solutions. Bioinformatics. 2011;27(14):1986–94. https://doi.org/10.1093/bioinformatics/btr300.
Kuhn M. Building predictive models in R using the caret package. J Stat Softw. 2008;28(5):1–26.
Fox EW, Hill RA, Leibowitz SG, Olsen AR, Thornbrugh DJ, Weber MH, et al. Assessing the accuracy and stability of variable selection methods for random forest modelling in ecology. Environ Monit Assess. 2017;189(7):316. https://doi.org/10.1007/s10661-017-6025-0.
Shackelford L, Marshall F, Peters J. Identifying donkey domestication through changes in cross-sectional geometry of long bones. J Archaeol Sci. 2013;40(12):4170–9. https://doi.org/10.1016/j.jas.2013.06.006.
Heglund NC, Taylor CR, McMahon TA. Scaling stride frequency and gait to animal size: mice to horses. Science. 1974;186(4169):1112–3. https://doi.org/10.1126/science.186.4169.1112.
Fajemilehin OS, Salako AE. Body measurement characteristics of the west African dwarf (WAD) goat in deciduous forest zone of southwestern Nigeria. Afr J Biotechnol. 2008;7(14):2521–6.
Daramola JO, Adeloye AA. Physiological adaptation to the humid tropics with special reference to the west African dwarf (WAD) goat. Trop Anim Health Prod. 2009;41(7):1005–16. https://doi.org/10.1007/s11250-008-9267-6.
Birn-Jeffery AV, Higham TE. The scaling of uphill and downhill locomotion in legged animals. Integr Comp Biol. 2014;54(6):1159–72. https://doi.org/10.1093/icb/icu015.
Walton E, Casey C, Mitsch J, Vázquez-Diosdado JA, Yan J, Dottorini T, et al. Evaluation of sampling frequency, window size and sensor position for classification of sheep behaviour. R Soc Open Sci. 2018;5(2). https://doi.org/10.1098/rsos.171442.
Studd EK, Landry-Cuerrier M, Menzies AK, Boutin S, McAdam AG, Lane JE, et al. Behavioral classification of low-frequency acceleration and temperature data from a free-ranging small mammal. Ecol Evol. 2019;9(1):619–30. https://doi.org/10.1002/ece3.4786.
Acknowledgements
We thank Kolmården Wildlife Park for allowing us to conduct this study, and the staff for their help with training and sedating the Alpine ibex: Linda Berggren, Sofie Björklund, Torsten Möller, Louise Guevara, Bim Boijsen, Michael Hepher and the other animal keepers. We would also like to thank Pieter Giljam (Zoospenseful and Kolmården) for his expertise in training the ibex. We thank Belfast Zoo and Belfast City Council for allowing us to conduct this study. We thank Alyn Cairns, Raymond Robinson, Pete, Chris, Demi, Aisling, Paul and other staff for their help during data collection on pygmy goats.
Funding
ED was supported by a Department for Education studentship, Northern Ireland, and a Queen’s University Belfast William and Betty MacQuitty travel scholarship for the work at Kolmården.
Ethics approval and consent to participate
This study was approved by the Queen’s University Belfast ethics committee (QUB-BS-AREC-19-004) and received internal ethical approval from Belfast Zoo and Kolmården Wildlife Park.
Consent for publication
Competing interests
The authors declare that they have no competing interests.
Details of the individuals and training the male Alpine ibex to have collars put on and taken off. Table S2. Description of training protocol. Figure S1. A male ibex being rewarded standing in the protective feeding station (step 3). Figure S3. Three target male ibex rewarded in their designated protective stations (step 4, only two were successfully trained beyond this step). Figure S4. Holding the collar around a male ibex neck while he stands in the protective station, one trainer holds the collar while the second provides the reward (step 7). Table S3. Total time observed of each behaviour for each individual pygmy goat (G) or Alpine ibex (IB) in seconds.
Methods for building and refining random forest models to predict the behaviour of Alpine ibex and pygmy goats. Table S4. A list of the accelerometry and magnetometry variables that are used or calculated for the random forest model. Including the name, and label, the description of the variable and its calculation. Figure S4. Recursive feature elimination plots showing the cross-validated model accuracy when a different number of acceleration and magnetometry variables are included in the random forest models for classifying the behaviours of (a) Alpine ibex and (b) pygmy goat. Figure S5. Random forest error plots across 500 trees for classifying each of the nine behavioural states (Aggression, Browsing (pygmy goats only), Climbing (Alpine ibex only), Grazing, Grooming, Lying down, Running, Shaking, Standing, Trotting and Walking) and Out-of-bag (OOB) error estimates for each different model at 10 Hz for both species (a,b) including the models with: (c,d) balanced observations and (e,f) reduced behaviour classes. Figure S6. Random forest error plots across 500 trees for classifying each of the nine behavioural states including terrain slope for locomotion behaviours (Aggression, Browsing (pygmy goats only), Climbing (Alpine ibex only), Grazing, Grooming, Lying down, Running, Shaking, Standing, Trotting and Walking) and Out-of-bag (OOB) error estimates, for (A) Alpine ibex and (B) pygmy goats. Table S5. The variable reduction process to reach the final selected model.
Random forest model results. Figure S6. The importance of each variable retained in the models predicting behaviour and behaviours including terrain slope. Table S5. The median and 1st and 3rd quantile of acceleration, for each behaviour and species, for three variables. Table S6. Confusion matrix showing the observed behaviours and predicted behaviours (in seconds) when training the random forest model built using the pygmy goat training dataset. Table S7. Confusion matrix showing the observed behaviours and predicted behaviours (in seconds) when using a random forest model built using pygmy goat training dataset and tested on the Alpine ibex training data set. Table S8. Confusion matrix showing the observed behaviours and predicted behaviours, including the gradient of terrain for locomotion behaviours, when training the random forest model built using the pygmy goat training dataset. Table S9. Confusion matrix showing the observed behaviours and predicted behaviours, including the gradient of terrain for locomotion behaviours, when using a random forest model built using pygmy goat training dataset and tested on the Alpine ibex training data set.
The importance of each variable ordered by mean Gini decrease for the model predicting behaviours including slope of terrain: (a) pygmy goats, with ‘Pitch’ as the most important variable, and (b) Alpine ibex, with ‘Static X’ as the most important variable. Table S5. The median and 1st and 3rd quantile of acceleration, for each behaviour and species, for the three variables that are in the top 5 most important variables for predicting behaviour of both pygmy goats and Alpine ibex. Table S6. Confusion matrix showing the observed behaviours and predicted behaviours when using a random forest model built using pygmy goat training dataset and tested on the Alpine ibex training data set. Italicised cells are the true positives where the behaviour has been correctly predicted. Table S7. Confusion matrix showing the observed behaviours and predicted behaviours, including the gradient of terrain for locomotion behaviours, when using a random forest model built using pygmy goat training dataset and tested on the Alpine ibex training data set. Italicised cells are the true positives where the behaviour has been correctly predicted. (Downhill = D, Flat = F, Uphill = U).
Cite this article
Dickinson, E.R., Twining, J.P., Wilson, R. et al. Limitations of using surrogates for behaviour classification of accelerometer data: refining methods using random forest models in Caprids. Mov Ecol 9, 28 (2021). https://doi.org/10.1186/s40462-021-00265-7