CVJan 14Code
Self-Supervised Animal Identification for Long VideosXuyang Fang, Sion Hannuna, Edwin Simpson et al.
Identifying individual animals in long-duration videos is essential for behavioral ecology, wildlife monitoring, and livestock management. Traditional methods require extensive manual annotation, while existing self-supervised approaches are computationally demanding and ill-suited for long sequences due to memory constraints and temporal error propagation. We introduce a highly efficient, self-supervised method that reframes animal identification as a global clustering task rather than a sequential tracking problem. Our approach assumes a known, fixed number of individuals within a single video -- a common scenario in practice -- and requires only bounding box detections and the total count. By sampling pairs of frames, using a frozen pre-trained backbone, and employing a self-bootstrapping mechanism with the Hungarian algorithm for in-batch pseudo-label assignment, our method learns discriminative features without identity labels. We adapt a Binary Cross Entropy loss from vision-language models, enabling state-of-the-art accuracy ($>$97\%) while consuming less than 1 GB of GPU memory per batch -- an order of magnitude less than standard contrastive methods. Evaluated on challenging real-world datasets (3D-POP pigeons and 8-calves feeding videos), our framework matches or surpasses supervised baselines trained on over 1,000 labeled frames, effectively removing the manual annotation bottleneck. This work enables practical, high-accuracy animal identification on consumer-grade hardware, with broad applicability in resource-constrained research settings. All code written for this paper are \href{https://huggingface.co/datasets/tonyFang04/8-calves}{here}.
CVMar 17, 2025Code
8-Calves Image datasetXuyang Fang, Sion Hannuna, Neill Campbell et al.
Automated livestock monitoring is crucial for precision farming, but robust computer vision models are hindered by a lack of datasets reflecting real-world group challenges. We introduce the 8-Calves dataset, a challenging benchmark for multi-animal detection, tracking, and identification. It features a one-hour video of eight Holstein Friesian calves in a barn, with frequent occlusions, motion blur, and diverse poses. A semi-automated pipeline using a fine-tuned YOLOv8 detector and ByteTrack, followed by manual correction, provides over 537,000 bounding boxes with temporal identity labels. We benchmark 28 object detectors, showing near-perfect performance on a lenient IoU threshold (mAP50: 95.2-98.9%) but significant divergence on stricter metrics (mAP50:95: 56.5-66.4%), highlighting fine-grained localization challenges. Our identification benchmark across 23 models reveals a trade-off: scaling model size improves classification accuracy but compromises retrieval. Smaller architectures like ConvNextV2 Nano achieve the best balance (73.35% accuracy, 50.82% Top-1 KNN). Pre-training focused on semantic learning (e.g., BEiT) yielded superior transferability. For tracking, leading methods achieve high detection accuracy (MOTA > 0.92) but struggle with identity preservation (IDF1 $\approx$ 0.27), underscoring a key challenge in occlusion-heavy scenarios. The 8-Calves dataset bridges a gap by providing temporal richness and realistic challenges, serving as a resource for advancing agricultural vision models. The dataset and code are available at https://huggingface.co/datasets/tonyFang04/8-calves.
NAMar 16
Compression of Currents and VarifoldsAllen Paul, Neill Campbell, Tony Shardlow
We derive an algorithm for compression of the currents and varifolds representations of shapes, using ridge leverage score (RLS) sampling, and the theory of Nystrom approximation in Reproducing Kernel Hilbert Spaces. Our method is faster than existing compression techniques and comes with theoretical guarantees on the rate of decay of the compression error as a function of the smoothness of the associated shape representation. The obtained compressions are shown to be useful for accelerating downstream tasks such as nonlinear shape registration in the Large Deformation Diffeomorphic Metric Mapping (LDDMM) framework without loss of quality, even for very high compression ratios. The performance of our algorithm is demonstrated on large-scale shape data from modern geometry processing datasets, and is shown to be fast and scalable with rapid error decay.
NAMar 17
Sparse Randomized Approximation of Normal CyclesAllen Paul, Neill Campbell, Tony Shardlow
We extend our work for compression of currents and varifolds to a compression algorithm for the embedded normal cycles representation of shape, restricted to the constant normal kernel case, using the Nystrom approximation in Reproducing Kernel Hilbert Spaces (RKHS) and ridge leverage score (RLS) sampling. Our method comes with theoretical guarantees on the compression error decay, and the approximations are shown to be effective for downstream tasks such as nonlinear shape registration in the Large Deformation Diffeomorphic Metric Mapping (LDDMM) framework, even for very high compression ratios. The performance of our algorithm is demonstrated on large-scale shape data from modern geometry processing datasets and is shown to accelerate downstream registration tasks significantly.
CVMar 29, 2024
Universal Bovine Identification via Depth Data and Deep Metric LearningAsheesh Sharma, Lucy Randewich, William Andrew et al.
This paper proposes and evaluates, for the first time, a top-down (dorsal view), depth-only deep learning system for accurately identifying individual cattle and provides associated code, datasets, and training weights for immediate reproducibility. An increase in herd size skews the cow-to-human ratio at the farm and makes the manual monitoring of individuals more challenging. Therefore, real-time cattle identification is essential for the farms and a crucial step towards precision livestock farming. Underpinned by our previous work, this paper introduces a deep-metric learning method for cattle identification using depth data from an off-the-shelf 3D camera. The method relies on CNN and MLP backbones that learn well-generalised embedding spaces from the body shape to differentiate individuals -- requiring neither species-specific coat patterns nor close-up muzzle prints for operation. The network embeddings are clustered using a simple algorithm such as $k$-NN for highly accurate identification, thus eliminating the need to retrain the network for enrolling new individuals. We evaluate two backbone architectures, ResNet, as previously used to identify Holstein Friesians using RGB images, and PointNet, which is specialised to operate on 3D point clouds. We also present CowDepth2023, a new dataset containing 21,490 synchronised colour-depth image pairs of 99 cows, to evaluate the backbones. Both ResNet and PointNet architectures, which consume depth maps and point clouds, respectively, led to high accuracy that is on par with the coat pattern-based backbone.
CVJun 16, 2020
Visual Identification of Individual Holstein-Friesian Cattle via Deep Metric LearningWilliam Andrew, Jing Gao, Siobhan Mullan et al.
Holstein-Friesian cattle exhibit individually-characteristic black and white coat patterns visually akin to those arising from Turing's reaction-diffusion systems. This work takes advantage of these natural markings in order to automate visual detection and biometric identification of individual Holstein-Friesians via convolutional neural networks and deep metric learning techniques. Existing approaches rely on markings, tags or wearables with a variety of maintenance requirements, whereas we present a totally hands-off method for the automated detection, localisation, and identification of individual animals from overhead imaging in an open herd setting, i.e. where new additions to the herd are identified without re-training. We propose the use of SoftMax-based reciprocal triplet loss to address the identification problem and evaluate the techniques in detail against fixed herd paradigms. We find that deep metric learning systems show strong performance even when many cattle unseen during system training are to be identified and re-identified -- achieving 93.8% accuracy when trained on just half of the population. This work paves the way for facilitating the non-intrusive monitoring of cattle applicable to precision farming and surveillance for automated productivity, health and welfare monitoring, and to veterinary research such as behavioural analysis, disease outbreak tracing, and more. Key parts of the source code, network weights and datasets are available publicly.