CV Oct 13, 2022
A Systematic Review of Machine Learning Techniques for Cattle Identification: Datasets, Methods and Future Directions
Md Ekramul Hossain, Muhammad Ashad Kabir, Lihong Zheng et al.
Stricter biosecurity and food safety requirements may increase demand for efficient livestock traceability and identification systems in the supply chain. The advanced technologies of machine learning and computer vision have been applied in precision livestock management, including critical disease detection, vaccination, production management, tracking, and health monitoring. This paper offers a systematic literature review (SLR) of vision-based cattle identification. More specifically, this SLR aims to identify and analyse the research related to cattle identification using Machine Learning (ML) and Deep Learning (DL). Of the two main applications, cattle detection and cattle identification, the ML-based papers address only identification, whereas the DL-based papers study both detection and identification. According to our survey, the most used ML models for cattle identification were support vector machine (SVM), k-nearest neighbour (KNN), and artificial neural network (ANN). Convolutional neural network (CNN), residual network (ResNet), Inception, You Only Look Once (YOLO), and Faster R-CNN were popular DL models in the selected papers. Among these papers, the most distinguishing features were the muzzle prints and coat patterns of cattle. Local binary pattern (LBP), speeded up robust features (SURF), scale-invariant feature transform (SIFT), and Inception or CNN were identified as the most used feature extraction methods.
CV Oct 21, 2022
Automatic Cattle Identification using YOLOv5 and Mosaic Augmentation: A Comparative Analysis
Rabin Dulal, Lihong Zheng, Muhammad Ashad Kabir et al.
You Only Look Once (YOLO) is a single-stage object detection model popular for real-time object detection, accuracy, and speed. This paper investigates the YOLOv5 model to identify cattle in the yards. Current cattle identification relies on radio-frequency identification (RFID) tags, which fail when a tag is lost or damaged. A biometric solution identifies the cattle and can help reassign a lost or damaged tag or replace the RFID-based system. Muzzle patterns in cattle are unique biometric identifiers, like fingerprints in humans. This paper presents our recent research using five popular object detection models, examining the architecture of YOLOv5, investigating the performance of eight backbones with the YOLOv5 model, and measuring the influence of mosaic augmentation in YOLOv5 through experiments on the available cattle muzzle images. We conclude that YOLOv5 has excellent potential for automatic cattle identification. Our experiments show that YOLOv5 with a transformer performed best, with a mean Average Precision (mAP) 0.5 (the average of AP when the IoU is greater than 50%) of 0.995 and an mAP 0.5:0.95 (the average of AP from 50% to 95% IoU with an interval of 5%) of 0.9366. In addition, our experiments show that mosaic augmentation increased the accuracy of the model for all backbones used. Moreover, we can also detect cattle from partial muzzle images.
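The two mAP variants quoted in this abstract differ only in the IoU thresholds at which a predicted box counts as a match. The following is a minimal sketch of that idea, not the paper's evaluation code: the box format `(x1, y1, x2, y2)`, the helper names, and the use of `>=` at the threshold (a common convention) are all assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned (x1, y1, x2, y2) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Thresholds 0.50, 0.55, ..., 0.95 used for "mAP 0.5:0.95".
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def matched_thresholds(predicted_box, truth_box):
    """Thresholds at which this prediction would count as a true positive."""
    overlap = iou(predicted_box, truth_box)
    return [t for t in IOU_THRESHOLDS if overlap >= t]
```

A prediction with IoU 0.78 counts as correct at the six thresholds from 0.50 to 0.75 and as a miss at the remaining four, which is why mAP 0.5:0.95 is always at most mAP 0.5.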
CV Jan 9, 2025
MHAFF: Multi-Head Attention Feature Fusion of CNN and Transformer for Cattle Identification
Rabin Dulal, Lihong Zheng, Muhammad Ashad Kabir
Convolutional Neural Networks (CNNs) have drawn researchers' attention to identifying cattle using muzzle images. However, CNNs often fail to capture long-range dependencies within the complex patterns of the muzzle. The transformers handle these challenges. This inspired us to fuse the strengths of CNNs and transformers in muzzle-based cattle identification. Addition and concatenation have been the most commonly used techniques for feature fusion. However, addition fails to preserve discriminative information, while concatenation results in an increase in dimensionality. Both methods are simple operations and cannot discover the relationships or interactions between fusing features. This research aims to overcome the issues faced by addition and concatenation. This research introduces a novel approach called Multi-Head Attention Feature Fusion (MHAFF) for the first time in cattle identification. MHAFF captures relations between the different types of fusing features while preserving their originality. The experiments show that MHAFF outperformed addition and concatenation techniques and the existing cattle identification methods in accuracy on two publicly available cattle datasets. MHAFF demonstrates excellent performance and quickly converges to achieve optimum accuracy of 99.88% and 99.52% in two cattle datasets simultaneously.
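The contrast between addition, concatenation, and attention-based fusion can be illustrated with a small pure-Python sketch. This is not the MHAFF architecture: it is generic multi-head cross-attention in which one feature set supplies the queries and the other the keys and values, and the learned projection matrices of a real model are omitted (treated as identity) to keep the example short. All function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_by_addition(a, b):
    """Element-wise sum: keeps the dimension, but the two sources are blended."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def fuse_by_concatenation(a, b):
    """Concatenation: keeps both sources, but doubles the feature dimension."""
    return [row_a + row_b for row_a, row_b in zip(a, b)]


def fuse_by_attention(queries, keys_values, num_heads):
    """Cross-attention: every query token attends over all key/value tokens.

    The feature dimension is split into `num_heads` slices; each slice
    computes its own attention weights (scaled dot product + softmax).
    Learned projections are omitted here, which is a simplification.
    """
    dim = len(queries[0])
    if dim % num_heads != 0:
        raise ValueError("feature dimension must be divisible by num_heads")
    head_dim = dim // num_heads
    fused = []
    for q in queries:
        out = []
        for h in range(num_heads):
            lo, hi = h * head_dim, (h + 1) * head_dim
            scores = [
                sum(qi * ki for qi, ki in zip(q[lo:hi], kv[lo:hi])) / math.sqrt(head_dim)
                for kv in keys_values
            ]
            weights = softmax(scores)
            # Weighted average of the value slices for this head.
            for j in range(lo, hi):
                out.append(sum(w * kv[j] for w, kv in zip(weights, keys_values)))
        fused.append(out)
    return fused
```

With two 4-dimensional token lists, addition and attention both return 4-dimensional tokens while concatenation returns 8-dimensional ones; only the attention variant computes weights that depend on how the two feature sets relate to each other.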
CV Jun 16, 2025
A Comprehensive Survey on Deep Learning Solutions for 3D Flood Mapping
Wenfeng Jia, Bin Liang, Yuxi Liu et al.
Flooding remains a major global challenge, worsened by climate change and urbanization, demanding advanced solutions for effective disaster management. While traditional 2D flood mapping techniques provide limited insights, 3D flood mapping, powered by deep learning (DL), offers enhanced capabilities by integrating flood extent and depth. This paper presents a comprehensive survey of deep learning-based 3D flood mapping, emphasizing its advancements over 2D maps by integrating flood extent and depth for effective disaster management and urban planning. The survey categorizes deep learning techniques into task decomposition and end-to-end approaches, applicable to both static and dynamic flood features. We compare key DL architectures, highlighting their respective roles in enhancing prediction accuracy and computational efficiency. Additionally, this work explores diverse data sources such as digital elevation models, satellite imagery, rainfall, and simulated data, outlining their roles in 3D flood mapping. The applications reviewed range from real-time flood prediction to long-term urban planning and risk assessment. However, significant challenges persist, including data scarcity, model interpretability, and integration with traditional hydrodynamic models. This survey concludes by suggesting future directions to address these limitations, focusing on enhanced datasets, improved models, and policy implications for flood management. This survey aims to guide researchers and practitioners in leveraging DL techniques for more robust and reliable 3D flood mapping, fostering improved flood management strategies.
CV Jan 25
Agreement-Driven Multi-View 3D Reconstruction for Live Cattle Weight Estimation
Rabin Dulal, Wenfeng Jia, Lihong Zheng et al.
Accurate cattle live weight estimation is vital for livestock management, welfare, and productivity. Traditional methods, such as manual weighing using a walk-over weighing system or proximate measurements using body condition scoring, involve manual handling of stock and can impact productivity from both a stock and economic perspective. To address these issues, this study investigated a cost-effective, non-contact method for live weight calculation in cattle using 3D reconstruction. The proposed pipeline utilized multi-view RGB images with SAM 3D-based agreement-guided fusion, followed by ensemble regression. Our approach generates a single 3D point cloud per animal and compares classical ensemble models with deep learning models under low-data conditions. Results show that SAM 3D with multi-view agreement fusion outperforms other 3D generation methods, while classical ensemble models provide the most consistent performance for practical farm scenarios (R² = 0.69 ± 0.10, MAPE = 2.22 ± 0.56%), making this practical for on-farm implementation. These findings demonstrate that improving reconstruction quality is more critical than increasing model complexity for scalable deployment on farms where producing a large volume of 3D data is challenging.
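The two metrics reported in this abstract, R² and MAPE, follow their standard definitions, sketched below. The function names and the example weights in the usage note are hypothetical and are not data from the paper.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent. Assumes no true value is zero."""
    errors = [abs((t - p) / t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(errors) / len(errors)
```

For made-up live weights of 400, 500, and 600 kg predicted as 410, 490, and 600 kg, `r_squared` gives 0.99 and `mape` gives 1.5. Because MAPE is relative to each animal's weight while R² is relative to the spread of weights in the herd, a low MAPE can coexist with a moderate R² when the animals have similar weights.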
CV Sep 14, 2025
CCoMAML: Efficient Cattle Identification Using Cooperative Model-Agnostic Meta-Learning
Rabin Dulal, Lihong Zheng, Ashad Kabir
Cattle identification is critical for efficient livestock farming management and currently relies on radio-frequency identification (RFID) ear tags. However, RFID-based systems are prone to failure due to loss, damage, tampering, and vulnerability to external attacks. As a robust alternative, biometric identification using cattle muzzle patterns, which are similar to human fingerprints, has emerged as a promising solution. Deep learning techniques have demonstrated success in leveraging these unique patterns for accurate identification. However, deep learning models face significant challenges, including limited data availability, disruptions during data collection, and dynamic herd compositions that require frequent model retraining. To address these limitations, this paper proposes a novel few-shot learning framework for real-time cattle identification using Cooperative Model-Agnostic Meta-Learning (CCoMAML) with Multi-Head Attention Feature Fusion (MHAFF) as a feature extractor model. This model adapts well to new data through efficient learning from few data samples without retraining. The proposed approach has been rigorously evaluated against current state-of-the-art few-shot learning techniques applied in cattle identification. Comprehensive experimental results demonstrate that our proposed CCoMAML with MHAFF has superior cattle identification performance with 98.46% and 97.91% F1 scores.
CV Sep 8, 2025
When Language Model Guides Vision: Grounding DINO for Cattle Muzzle Detection
Rabin Dulal, Lihong Zheng, Muhammad Ashad Kabir
Muzzle patterns are among the most effective biometric traits for cattle identification. Fast and accurate detection of the muzzle region as the region of interest is critical to automatic visual cattle identification. Earlier approaches relied on manual detection, which is labor-intensive and inconsistent. Recently, automated methods using supervised models like YOLO have become popular for muzzle detection. Although effective, these methods require extensive annotated datasets and tend to depend on their training data, limiting their performance on new or unseen cattle. To address these limitations, this study proposes a zero-shot muzzle detection framework based on Grounding DINO, a vision-language model capable of detecting muzzles without any task-specific training or annotated data. This approach leverages natural language prompts to guide detection, enabling scalable and flexible muzzle localization across diverse breeds and environments. Our model achieves a mean Average Precision (mAP)@0.5 of 76.8%, demonstrating promising performance without requiring annotated data. To our knowledge, this is the first research to provide a real-world, industry-oriented, and annotation-free solution for cattle muzzle detection. The framework offers a practical alternative to supervised methods, promising improved adaptability and ease of deployment in livestock monitoring applications.
CV Jul 10, 2025
HOTA: Hierarchical Overlap-Tiling Aggregation for Large-Area 3D Flood Mapping
Wenfeng Jia, Bin Liang, Yuxi Lu et al.
Floods are among the most frequent natural hazards and cause significant social and economic damage. Timely, large-scale information on flood extent and depth is essential for disaster response; however, existing products often trade spatial detail for coverage or ignore flood depth altogether. To bridge this gap, this work presents HOTA: Hierarchical Overlap-Tiling Aggregation, a plug-and-play, multi-scale inference strategy. When combined with SegFormer and a dual-constraint depth estimation module, this approach forms a complete 3D flood-mapping pipeline. HOTA applies overlapping tiles of different sizes to multispectral Sentinel-2 images only during inference, enabling the SegFormer model to capture both local features and kilometre-scale inundation without changing the network weights or retraining. The subsequent depth module is based on a digital elevation model (DEM) differencing method, which refines the 2D mask and estimates flood depth by enforcing (i) zero depth along the flood boundary and (ii) near-constant flood volume with respect to the DEM. A case study on the March 2021 Kempsey (Australia) flood shows that HOTA, when coupled with SegFormer, improves IoU from 73% (U-Net baseline) to 84%. The resulting 3D surface achieves a mean absolute boundary error of less than 0.5 m. These results demonstrate that HOTA can produce accurate, large-area 3D flood maps suitable for rapid disaster response.
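Inference-time overlap tiling at several tile sizes can be sketched as follows. The abstract does not specify HOTA's tile sizes, overlap ratio, or aggregation rule, so the per-pixel averaging used here, the `predict` callback, and all function names are assumptions for illustration only.

```python
def tile_origins(length, tile, stride):
    """Start offsets so tiles of size `tile` cover [0, length); the last tile hugs the edge."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] != length - tile:
        starts.append(length - tile)
    return starts


def overlap_tiles(height, width, tile, overlap):
    """(row, col, size) of every tile for one tile size and a pixel overlap."""
    stride = tile - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than the tile size")
    return [
        (y, x, tile)
        for y in tile_origins(height, tile, stride)
        for x in tile_origins(width, tile, stride)
    ]


def aggregate(height, width, tile_sizes, overlap_fraction, predict):
    """Average per-pixel scores over overlapping tiles of several sizes.

    `predict(row, col, size)` is a hypothetical callback that runs a
    segmentation model on one tile and returns a size x size grid of scores.
    The model itself is never retrained; only the inference windows change.
    """
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for tile in tile_sizes:
        overlap = int(tile * overlap_fraction)
        for y, x, size in overlap_tiles(height, width, tile, overlap):
            scores = predict(y, x, size)
            # Clip tiles that extend past the image border.
            for dy in range(min(size, height - y)):
                for dx in range(min(size, width - x)):
                    total[y + dy][x + dx] += scores[dy][dx]
                    count[y + dy][x + dx] += 1
    return [[total[y][x] / count[y][x] for x in range(width)] for y in range(height)]
```

Small tiles contribute local detail and large tiles contribute wider context; because every pixel is covered by at least one tile at each size, the averaged map has no seams at tile borders.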
CV Dec 21, 2020
Leaf Segmentation and Counting with Deep Learning: on Model Certainty, Test-Time Augmentation, Trade-Offs
Douglas Pinto Sampaio Gomes, Lihong Zheng
Plant phenotyping tasks such as leaf segmentation and counting are fundamental to the study of phenotypic traits. Since it is well-suited for these tasks, deep supervised learning has been prevalent in recent works proposing better performing models at segmenting and counting leaves. Despite good efforts from research groups, one of the main challenges for proposing better methods is still the limitation of labelled data availability. The main efforts of the field seem to be augmenting existing limited data sets, and some aspects of the modelling process have been under-discussed. This paper explores such topics and presents experiments that led to the development of the best-performing method in the Leaf Segmentation Challenge and in another external data set of Komatsuna plants. The model has competitive performance while being arguably simpler than other recently proposed ones. The experiments also brought insights such as the fact that model cardinality and test-time augmentation may have strong applications in object segmentation of single class and high occlusion, and regarding the data distribution of recently proposed data sets for benchmarking.
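Test-time augmentation for segmentation can be sketched in a few lines: predict on transformed copies of the image, undo each transform on the output, and average. The flips chosen here and the `model` callable are assumptions; the abstract does not state which augmentations the paper used.

```python
def identity(grid):
    """Return a copy of a 2D grid (list of rows)."""
    return [row[:] for row in grid]


def hflip(grid):
    """Mirror a 2D grid left to right."""
    return [row[::-1] for row in grid]


def vflip(grid):
    """Mirror a 2D grid top to bottom."""
    return [row[:] for row in grid[::-1]]


# Each entry is (forward transform, inverse transform); flips are self-inverse.
TRANSFORMS = [(identity, identity), (hflip, hflip), (vflip, vflip)]


def predict_with_tta(image, model):
    """Average a model's per-pixel scores over augmented views of one image.

    `model` is a hypothetical callable mapping a 2D grid to a 2D grid of
    scores with the same shape.
    """
    outputs = [inverse(model(forward(image))) for forward, inverse in TRANSFORMS]
    height, width = len(image), len(image[0])
    return [
        [sum(out[y][x] for out in outputs) / len(outputs) for x in range(width)]
        for y in range(height)
    ]
```

If the model's output already flips along with its input, every view yields the same map and the average changes nothing; the technique helps when the model's predictions vary with orientation, so that averaging smooths out those inconsistencies.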
TO Jul 10, 2020
Hyperspectral Imaging to detect Age, Defects and Individual Nutrient Deficiency in Grapevine Leaves
Manoranjan Paul, Sourabhi Debnath, Tanmoy Debnath et al.
Hyperspectral (HS) imaging was successfully employed in the 380 nm to 1000 nm wavelength range to investigate the efficacy of detecting age, healthiness and individual nutrient deficiency of grapevine leaves collected from vineyards located in central west NSW, Australia. For age detection, the appearance of many healthy grapevine leaves was examined. Then visually defective leaves were compared with healthy leaves. Control leaves and individual nutrient-deficient leaves (e.g. N, K and Mg) were also analysed. Several features were employed at various stages in the Ultraviolet (UV), Visible (VIS) and Near Infrared (NIR) regions to evaluate the experimental data: mean brightness, mean 1st derivative brightness, variation index, mean spectral ratio, normalised difference vegetation index (NDVI) and standard deviation (SD). Experimental results demonstrate that these features could be utilised with a high degree of effectiveness to compare age, to identify unhealthy samples, and not only to distinguish control leaves from nutrient-deficient leaves but also to identify individual nutrient deficiencies. Therefore, our work corroborated that HS imaging has excellent potential as a non-destructive as well as a non-contact method to detect age, healthiness and individual nutrient deficiencies of grapevine leaves.
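Of the features listed in this abstract, NDVI has a standard closed form, (NIR - Red) / (NIR + Red). The sketch below computes it from a sampled reflectance spectrum; the band limits (620 to 700 nm for red, 750 to 1000 nm for NIR) and the function names are assumptions, since the abstract does not say which wavelengths the authors averaged.

```python
RED_BAND_NM = (620, 700)   # assumed red band limits
NIR_BAND_NM = (750, 1000)  # assumed near-infrared band limits


def band_mean(wavelengths_nm, reflectance, low_nm, high_nm):
    """Mean reflectance of the samples whose wavelength lies in [low_nm, high_nm]."""
    values = [r for w, r in zip(wavelengths_nm, reflectance) if low_nm <= w <= high_nm]
    if not values:
        raise ValueError("no samples in the requested band")
    return sum(values) / len(values)


def ndvi(nir, red):
    """Normalised difference vegetation index; returns 0.0 when both inputs are zero."""
    total = nir + red
    return (nir - red) / total if total != 0 else 0.0


def leaf_ndvi(wavelengths_nm, reflectance):
    """NDVI of one leaf spectrum using the assumed band limits above."""
    red = band_mean(wavelengths_nm, reflectance, *RED_BAND_NM)
    nir = band_mean(wavelengths_nm, reflectance, *NIR_BAND_NM)
    return ndvi(nir, red)
```

Healthy leaves absorb red light and reflect strongly in the NIR, so their NDVI is close to 1, while stressed or senescent leaves tend toward lower values.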