| Dataset | Annotations | Rate (Hz) | Resolution (px) | Coverage (%) | Accuracy metric | Reconstruction (cm) | Reprojection (px) | Tracking (cm) |
|---|---|---|---|---|---|---|---|---|
| single | 171 | 30 | 2.7k | 97.79 | median | 0.30 | 9.65 | NA |
| | | | | | RMSE | 1.28 | 16.30 | NA |
| w/ sv | as above | as above | as above | 100.00 | as above | as above | as above | as above |
| mixed | 80 | 30 | 4k | 69.60 | median | 0.44 | 3.77 | NA |
| | | | | | RMSE | 1.09 | 7.77 | NA |
| school | 160 | 60 | 2.7k | 78.38 | median | 0.06 | 2.57 | NA |
| | | | | | RMSE | 0.30 | 3.78 | NA |
| w/ sv | as above | as above | as above | 94.02 | as above | as above | as above | as above |
| accuracy | 73 | 30 | 4k | 80.64 ± 16.73 | median | -0.14 ± 0.06 | 3.53 ± 1.96 | 0.14 ± 0.33 |
| | | | | | RMSE | 1.34 ± 0.79 | 8.56 ± 5.21 | 1.09 ± 0.47 |
| w/ sv | as above | as above | as above | 97.29 ± 2.20 | median | as above | as above | 0.28 ± 0.32 |
| | | | | | RMSE | as above | as above | 2.12 ± 1.37 |
- ‘w/ sv’ indicates that trajectory points were additionally estimated from single-view projections at an interpolated depth component. Annotations lists the number of frames annotated for training Mask R-CNN; Rate gives the frames per second of each video set, i.e. the temporal tracking resolution. Resolution is the video resolution (2.7k: 2704 × 1520 px; 4k: 3840 × 2160 px). Coverage is the mean coverage of all individual trajectories of a dataset. Reconstruction metrics refer to the deviation of reconstructed camera-to-camera distances from the actual distances, Reprojection metrics to the reprojection of triangulated 3D tracks to the original video pixel coordinates, and Tracking metrics to the deviation of the tracked calibration wand length from its actual length. For the ‘accuracy’ dataset, the accuracy results are listed as the mean and standard deviation of the four repeated trials. NA: not applicable.
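
As a rough illustration of how the median and RMSE entries for the Tracking column and the Coverage column could be computed, the following minimal sketch may help. It is not the authors' code. The function names, the 50 cm wand length, and the sample coordinates are hypothetical, and per-trajectory coverage is assumed here to mean the share of frames with a valid (non-NaN) 3D position, which the table note does not define.

```python
import math
import statistics


def wand_lengths(ends_a, ends_b):
    """Per-frame Euclidean distance between the two tracked 3D wand ends."""
    return [math.dist(a, b) for a, b in zip(ends_a, ends_b)]


def deviation_metrics(measured, actual):
    """Median and RMSE of the signed deviations (measured - actual).

    Frames without a measurement (NaN) are skipped.
    """
    errors = [m - actual for m in measured if not math.isnan(m)]
    median = statistics.median(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return median, rmse


def coverage_percent(trajectory):
    """Assumed definition: percentage of frames with a valid 3D position."""
    valid = [not any(math.isnan(c) for c in point) for point in trajectory]
    return 100.0 * sum(valid) / len(valid)


# Made-up example: a wand of nominal length 50 cm tracked over three frames.
ends_a = [(0.0, 0.0, 0.0)] * 3
ends_b = [(50.0, 0.0, 0.0), (0.0, 51.0, 0.0), (0.0, 0.0, 48.0)]
median_dev, rmse_dev = deviation_metrics(wand_lengths(ends_a, ends_b), actual=50.0)

# Made-up trajectory with one untracked frame out of four.
nan = float("nan")
track = [(1.0, 2.0, 3.0), (1.1, 2.0, 3.0), (nan, nan, nan), (1.3, 2.1, 3.0)]
coverage = coverage_percent(track)
```

The median is computed on signed deviations, so it can be negative (as in the Reconstruction entry for the 'accuracy' dataset), whereas the RMSE is non-negative by construction. For repeated trials, the per-trial values could then be summarized with `statistics.mean` and `statistics.stdev`.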