Limitations of using surrogates for behaviour classification of accelerometer data: refining methods using random forest models in Caprids

Background Animal-attached devices can be used on cryptic species to measure their movement and behaviour, enabling unprecedented insights into fundamental aspects of animal ecology and behaviour. However, direct observations of subjects are often still necessary to translate biologging data accurately into meaningful behaviours. As many elusive species cannot easily be observed in the wild, captive or domestic surrogates are typically used to calibrate data from devices. However, the utility of this approach remains equivocal.

Methods Here, we assess the validity of using captive conspecifics, and phylogenetically-similar domesticated counterparts (surrogate species) for calibrating behaviour classification. Tri-axial accelerometers and tri-axial magnetometers were used with behavioural observations to build random forest models to predict the behaviours. We applied these methods using captive Alpine ibex (Capra ibex) and a domestic counterpart, pygmy goats (Capra aegagrus hircus), to predict the behaviour including terrain slope for locomotion behaviours of captive Alpine ibex.

Results Behavioural classification of captive Alpine ibex and domestic pygmy goats was highly accurate (> 98%). Model performance was reduced when using data split per individual, i.e., classifying behaviour of individuals not used to train models (mean ± sd = 56.1 ± 11%). Behavioural classifications using domestic counterparts, i.e., pygmy goat observations to predict ibex behaviour, however, were not sufficient to predict all behaviours of a phylogenetically similar species accurately (> 55%).

Conclusions We demonstrate methods to refine the use of random forest models to classify behaviours of both captive and free-living animal species.
We suggest there are two main reasons for reduced accuracy when using a domestic counterpart to predict the behaviour of a wild species in captivity: domestication leading to morphological differences, and the terrain of the environment in which the animals were observed. We also identify limitations when behaviour is predicted in individuals that are not used to train models. Our results demonstrate that biologging device calibration needs to be conducted: (i) with similar conspecifics, and (ii) in an area where they can perform behaviours on terrain that reflects that of the species in the wild.

Supplementary Information The online version contains supplementary material available at 10.1186/s40462-021-00265-7.

Static acceleration
The static acceleration of each axis, representing the orientation of the device and thus the subjects' posture. Calculated as the running mean of acceleration for each orthogonal axis over 2 seconds (Wilson, Shepard and Liebsch, 2008).
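A minimal sketch of the running-mean step described above, assuming a 40 Hz sampling rate (the rate implied elsewhere in this table) and a centred window. The function names and the edge-padding choice are illustrative assumptions, not taken from the authors' code.

```python
import numpy as np

FS_HZ = 40  # assumed sampling rate (the table equates 40 data points with 1 sec)


def running_mean(signal, window):
    """Centred running mean that returns an array of the same length.

    Assumption: the ends are padded by repeating the first and last values,
    so the first and last half-window of the output are only approximate.
    """
    signal = np.asarray(signal, dtype=float)
    pad_left = window // 2
    pad_right = window - 1 - pad_left
    padded = np.pad(signal, (pad_left, pad_right), mode="edge")
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")


def static_acceleration(raw_axis, fs_hz=FS_HZ, window_s=2.0):
    """Static acceleration of one axis: running mean of raw acceleration over 2 s."""
    return running_mean(raw_axis, int(round(window_s * fs_hz)))


# Usage: a 2 Hz body oscillation riding on a constant 1 g posture component.
t = np.arange(400) / FS_HZ
raw_x = 1.0 + 0.3 * np.sin(2 * np.pi * 2.0 * t)
static_x = static_acceleration(raw_x)
```

Away from the edges, the running mean recovers the constant posture component because the 2-second window spans whole cycles of the oscillation.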

Pitch and Roll
To describe posture in terms of the plane and upright position, the 3D orientation relative to gravity was converted to angles: pitch from the X axis (representing 'sway' movement) and roll from the Y axis (representing 'surge' movement). Calculated as the arcsine of the X and Y axis respectively.

Dynamic acceleration (dynX, dynY, dynZ)
The dynamic acceleration of each axis, representing the body movement of the animal. Calculated as the static acceleration subtracted from the raw acceleration for each axis.
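The two calculations above can be sketched as follows. This assumes acceleration is expressed in units of g, that the arcsine is applied to the static (gravity) component, and that angles are reported in degrees; the clipping step and the function names are assumptions added for illustration.

```python
import numpy as np


def pitch_roll_degrees(static_x, static_y):
    """Pitch from the X axis and roll from the Y axis via the arcsine.

    Assumption: inputs are static acceleration in g. Values are clipped to
    [-1, 1] because sensor noise can push them slightly outside the domain
    of the arcsine.
    """
    pitch = np.degrees(np.arcsin(np.clip(static_x, -1.0, 1.0)))
    roll = np.degrees(np.arcsin(np.clip(static_y, -1.0, 1.0)))
    return pitch, roll


def dynamic_acceleration(raw_axis, static_axis):
    """Dynamic acceleration: raw acceleration minus static acceleration."""
    return np.asarray(raw_axis, dtype=float) - np.asarray(static_axis, dtype=float)


# Usage with hypothetical values (in g).
pitch, roll = pitch_roll_degrees(np.array([0.5, 0.0]), np.array([0.0, -1.0]))
dyn_x = dynamic_acceleration([0.7, 0.4], [0.5, 0.5])
```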

Overall Dynamic Body Acceleration (ODBA)
A measure of the total body acceleration. The sum of the absolute dynamic acceleration of all three orthogonal axes.
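As a brief illustration, ODBA can be computed from the three dynamic acceleration channels as below. VeDBA is not defined in this part of the table, so the sketch uses its standard definition (the vectorial norm of the dynamic acceleration) as a labelled assumption; the function names are illustrative.

```python
import numpy as np


def odba(dyn_x, dyn_y, dyn_z):
    """ODBA: sum of the absolute dynamic acceleration of the three axes."""
    return np.abs(dyn_x) + np.abs(dyn_y) + np.abs(dyn_z)


def vedba(dyn_x, dyn_y, dyn_z):
    """VeDBA (standard definition, assumed here): vectorial norm of the
    dynamic acceleration of the three axes."""
    return np.sqrt(np.square(dyn_x) + np.square(dyn_y) + np.square(dyn_z))


# Usage with hypothetical dynamic acceleration values (in g).
dx = np.array([0.3, -0.1])
dy = np.array([-0.4, 0.0])
dz = np.array([0.0, 0.2])
odba_values = odba(dx, dy, dz)
vedba_values = vedba(dx, dy, dz)
```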

VeDBA smoothed (smVeDBA)
To remove noise and reduce the variation in the VeDBA signal, an additional variable of smoothed VeDBA was used. VeDBA was smoothed using a running mean over 1 second to remove the variation in VeDBA at 40 Hz.

Partial dynamic body acceleration (PDBA X, PDBA Y, PDBA Z)
The absolute values of acceleration, providing the amplitude of acceleration for each axis. Calculated by returning the absolute positive value of acceleration (Fehlmann, O'Riain, Hopkins, et al., 2017).

VeDBA:PDBA ratio (ratioX, ratioY, ratioZ)
The ratio of VeDBA to PDBA for each axis, which gives the contribution of each axis acceleration to VeDBA. Calculated by dividing VeDBA by PDBA.
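A hedged sketch of these three variables, assuming a 40 Hz sampling rate and a VeDBA series that is at least one window long. How zero PDBA values were handled is not stated in the source; returning NaN for them is an assumption, as are the function names.

```python
import numpy as np

FS_HZ = 40  # sampling rate stated in the text


def smooth_vedba(vedba, fs_hz=FS_HZ, window_s=1.0):
    """Running mean of VeDBA over 1 s.

    mode="same" keeps the input length; values within half a window of either
    end are biased low because the signal is implicitly zero-padded there.
    """
    window = int(round(window_s * fs_hz))
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(vedba, dtype=float), kernel, mode="same")


def pdba(axis_acceleration):
    """PDBA: absolute value of the acceleration of one axis."""
    return np.abs(np.asarray(axis_acceleration, dtype=float))


def vedba_pdba_ratio(vedba, pdba_axis):
    """VeDBA divided by PDBA for one axis; NaN where PDBA is zero (assumption)."""
    vedba = np.asarray(vedba, dtype=float)
    pdba_axis = np.asarray(pdba_axis, dtype=float)
    ratio = np.full(vedba.shape, np.nan)
    np.divide(vedba, pdba_axis, out=ratio, where=pdba_axis != 0)
    return ratio


# Usage with hypothetical values.
sm = smooth_vedba(np.full(200, 0.2))
ratio_x = vedba_pdba_ratio([0.5, 0.5], pdba([-0.25, 0.0]))
```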

Differential acceleration (difX, difY, difZ)
The rate of change of acceleration over time for each axis. This was calculated for each axis, as the change in acceleration over 5 data points (0.125 sec).
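One possible reading of this step is a lagged difference: each value minus the value 5 samples earlier, which spans 0.125 s at 40 Hz. The source does not say whether the window looks backward or is centred, so the backward lag and the NaN padding below are assumptions.

```python
import numpy as np


def differential(axis, lag=5):
    """Change in acceleration over `lag` samples (5 samples = 0.125 s at 40 Hz).

    Assumption: a backward difference, axis[i] - axis[i - lag]. The first
    `lag` values have no earlier sample and are set to NaN.
    """
    axis = np.asarray(axis, dtype=float)
    out = np.full(axis.shape, np.nan)
    out[lag:] = axis[lag:] - axis[:-lag]
    return out


# Usage: a steady ramp of 0.1 g per sample changes by 0.5 g over 5 samples.
dif_x = differential(np.arange(20) * 0.1)
```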

Jerk
The differential of acceleration for the three axes, which is the overall change in acceleration, is referred to as Jerk. Calculated by taking the square root of the sum of all three axes' differentials, multiplied by the sampling rate.

First power spectrum density and maximum frequency (PSD1X, PSD1Y, PSD1Z and freq1X, freq1Y, freq1Z)
The amplitude and frequency of oscillations were calculated using a Fast Fourier Transformation (FFT) analysis. The first power spectrum density (PSD) and maximum frequency for each axis over a period of two seconds were calculated using code adapted from Fehlmann et al. 2017 (Fehlmann, O'Riain, Hopkins, et al., 2017).

Second power spectrum density and maximum frequency (PSD2X, PSD2Y, PSD2Z and freq2X, freq2Y, freq2Z)
As described above.
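The sketch below illustrates both ideas under explicit assumptions; it is not the code adapted from Fehlmann et al. For Jerk, it assumes the differentials are squared before summing (a vector norm), which the text does not state. For the spectral variables, it assumes that "first" and "second" refer to the two largest bins of the power spectrum within a 2-second window at 40 Hz, with the mean removed first. All function names are illustrative.

```python
import numpy as np

FS_HZ = 40  # assumed sampling rate


def jerk(dif_x, dif_y, dif_z, fs_hz=FS_HZ):
    """Jerk as the norm of the three axis differentials times the sampling rate.

    Assumption: the differentials are squared before summing.
    """
    norm = np.sqrt(np.square(dif_x) + np.square(dif_y) + np.square(dif_z))
    return norm * fs_hz


def top_two_spectral_bins(window, fs_hz=FS_HZ):
    """Return ((psd1, freq1), (psd2, freq2)) for one window of one axis.

    Assumption: the two largest non-DC bins of a simple periodogram. These
    may be neighbouring bins if the signal does not fall on a bin centre.
    """
    window = np.asarray(window, dtype=float)
    window = window - window.mean()  # remove the static offset
    power = np.abs(np.fft.rfft(window)) ** 2 / (fs_hz * window.size)
    freqs = np.fft.rfftfreq(window.size, d=1.0 / fs_hz)
    order = np.argsort(power[1:])[::-1] + 1  # skip the DC bin, largest first
    first, second = order[0], order[1]
    return (power[first], freqs[first]), (power[second], freqs[second])


# Usage: a 2 s window containing a strong 3 Hz and a weaker 7 Hz oscillation.
t = np.arange(2 * FS_HZ) / FS_HZ
axis_window = np.sin(2 * np.pi * 3.0 * t) + 0.5 * np.sin(2 * np.pi * 7.0 * t)
(psd1, freq1), (psd2, freq2) = top_two_spectral_bins(axis_window)
jerk_value = jerk(np.array([3.0]), np.array([4.0]), np.array([0.0]))
```

With a 2-second window the frequency resolution is 0.5 Hz, so oscillations slower than that cannot be resolved by this approach.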

Magnetic smoothed (magX, magY, magZ)
The magnetic orientation of the device in relation to the magnetic field of the earth for three axes, smoothed over 40 data points (1 sec).

Magnetic vectoral sum smoothed (MagVecsum)
The smoothed sum of the vectoral magnetometry for three axes, depicting overall absolute change in magnetic orientation.

Magnetic pitch and roll (magpitch, magroll)
A measure of magnetic posture measured as angles using the plane and upright position from the X axis (pitch) and Y axis (roll), calculated using the arcsine of each axis respectively.
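A minimal sketch of the magnetometer-derived variables, assuming the inputs are already smoothed and that each axis is normalised by the total field magnitude before the arcsine is taken. The normalisation is an assumption (the arcsine needs inputs in [-1, 1], and raw magnetometer units vary by device), as are the function names.

```python
import numpy as np


def magnetic_vector_sum(mag_x, mag_y, mag_z):
    """Vectorial sum (norm) of the three magnetometer axes."""
    return np.sqrt(np.square(mag_x) + np.square(mag_y) + np.square(mag_z))


def magnetic_pitch_roll(mag_x, mag_y, mag_z):
    """Magnetic pitch (X axis) and roll (Y axis) in degrees via the arcsine.

    Assumption: axes are divided by the total field magnitude, which must be
    non-zero, so that the arcsine input lies in [-1, 1].
    """
    norm = magnetic_vector_sum(mag_x, mag_y, mag_z)
    pitch = np.degrees(np.arcsin(np.clip(mag_x / norm, -1.0, 1.0)))
    roll = np.degrees(np.arcsin(np.clip(mag_y / norm, -1.0, 1.0)))
    return pitch, roll


# Usage with hypothetical smoothed magnetometer readings (arbitrary units).
mx = np.array([1.0])
my = np.array([0.0])
mz = np.array([np.sqrt(3.0)])
mag_vecsum = magnetic_vector_sum(mx, my, mz)
mag_pitch, mag_roll = magnetic_pitch_roll(mx, my, mz)
```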