High-resolution, non-invasive animal tracking and reconstruction of local environment in aquatic ecosystems

Table 2 Dataset parameters and accuracy metrics

Dataset	Annotations	Rate (Hz)	Resolution (px)	Coverage (%)	Accuracy metrics
					Metric	Reconstruction (cm)	Reprojection (px)	Tracking (cm)
single	171	30	2.7k	97.79	median	0.30	9.65	NA
					RMSE	1.28	16.30	NA
w/ sv	as above			100.00	as above
mixed	80	30	4k	69.60	median	0.44	3.77	NA
					RMSE	1.09	7.77	NA
school	160	60	2.7k	78.38	median	0.06	2.57	NA
					RMSE	0.30	3.78	NA
w/ sv	as above			94.02	as above
accuracy	73	30	4k	80.64 ± 16.73	median	-0.14 ± 0.06	3.53 ± 1.96	0.14 ± 0.33
					RMSE	1.34 ± 0.79	8.56 ± 5.21	1.09 ± 0.47
w/ sv	as above			97.29 ± 2.20	median	as above		0.28 ± 0.32
					RMSE			2.12 ± 1.37

‘w/ sv’ indicates that trajectory points were also estimated from single-view projections at an interpolated depth component. Annotations is the number of frames annotated for training Mask R-CNN. Rate is the frame rate of each video set in frames per second, i.e. the temporal tracking resolution. Resolution is the video resolution (2.7k: 2704 × 1520 px; 4k: 3840 × 2160 px). Coverage is the mean coverage of all individual trajectories of a dataset. Reconstruction metrics refer to the deviation of reconstructed camera-to-camera distances from the actual distance, Reprojection metrics to the reprojection of triangulated 3D tracks to the original video pixel coordinates, and Tracking metrics to the deviation of the tracked calibration wand length from its actual length. For the ‘accuracy’ dataset, the accuracy results are listed as the mean ± standard deviation of the four repeated trials. NA: not applicable.
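
The summary statistics described in the legend can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that the median is taken over signed deviations (measured minus actual, which would be consistent with the negative median reported for the ‘accuracy’ dataset), that RMSE is the root of the mean squared deviation, and that the coverage of a single trajectory is the percentage of frames with a position estimate. The function names and the NaN convention for missing frames are hypothetical.

```python
import numpy as np


def deviation_metrics(measured, actual):
    """Return (median, RMSE) of signed deviations from a known length.

    Assumption: deviations are computed as measured - actual, so the
    median can be negative while the RMSE is always non-negative.
    """
    errors = np.asarray(measured, dtype=float) - float(actual)
    median = float(np.median(errors))
    rmse = float(np.sqrt(np.mean(errors ** 2)))
    return median, rmse


def trajectory_coverage(track):
    """Percentage of frames with a position estimate.

    Assumption: `track` holds one value per frame, with NaN marking
    frames in which the individual could not be located.
    """
    track = np.asarray(track, dtype=float)
    return float(100.0 * np.mean(~np.isnan(track)))


def mean_coverage(tracks):
    """Mean of the per-trajectory coverages of a dataset."""
    return float(np.mean([trajectory_coverage(t) for t in tracks]))


# Hypothetical example values, not taken from the study.
wand_lengths_cm = [49.0, 50.0, 52.0]  # tracked wand lengths
median_cm, rmse_cm = deviation_metrics(wand_lengths_cm, actual=50.0)

tracks = [
    np.array([1.0, np.nan, 2.0, 3.0]),  # one missing frame
    np.array([1.0, 2.0, 3.0, 4.0]),     # fully tracked
]
dataset_coverage = mean_coverage(tracks)
```

The same `deviation_metrics` function would apply to camera-to-camera distances (Reconstruction) or to wand lengths (Tracking). Reprojection errors are pixel distances and therefore non-negative by construction.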

ISSN: 2051-3933