Ajay Sharda

CV
h-index: 19
13 papers
203 citations
Novelty: 35%
AI Score: 49

13 Papers

CV · Aug 10, 2023
Aphid Cluster Recognition and Detection in the Wild Using Deep Learning Models

Tianxiao Zhang, Kaidong Li, Xiangyu Chen et al.

Aphid infestation poses a significant threat to crop production, rural communities, and global food security. While chemical pest control is crucial for maximizing yields, applying chemicals across entire fields is both environmentally unsustainable and costly. Hence, precise localization and management of aphids are essential for targeted pesticide application. The paper focuses on using deep learning models to detect aphid clusters, and we propose a novel approach for estimating infestation levels from these detections. To facilitate this research, we captured a large-scale dataset from sorghum fields, manually selected 5,447 images containing aphids, and annotated each individual aphid cluster within these images. To make the images usable by machine learning models, we further processed them by cropping them into patches, resulting in a labeled dataset of 151,380 image patches. We then implemented and compared the performance of four state-of-the-art object detection models (VFNet, GFLV2, PAA, and ATSS) on the aphid dataset. Extensive experimental results show that all models yield stable, similar performance in terms of average precision and recall. We further propose merging closely neighboring clusters and removing tiny clusters caused by cropping, which boosts performance by around 17%. The study demonstrates the feasibility of automatically detecting and managing insects using machine learning models. The labeled dataset will be made openly available to the research community.
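The following sketch shows the cluster post-processing idea. It merges detected boxes that lie close to each other and then drops tiny boxes. The gap threshold (`max_gap`) and minimum area (`min_area`) are hypothetical parameters chosen for illustration. The paper's actual merging criterion and thresholds may differ.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def boxes_are_close(a: Box, b: Box, max_gap: float) -> bool:
    """True if the gap between two boxes is at most max_gap along both axes."""
    gap_x = max(0.0, max(a[0], b[0]) - min(a[2], b[2]))
    gap_y = max(0.0, max(a[1], b[1]) - min(a[3], b[3]))
    return gap_x <= max_gap and gap_y <= max_gap


def union_box(a: Box, b: Box) -> Box:
    """Smallest box enclosing both inputs."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def merge_and_filter(boxes: List[Box], max_gap: float = 10.0, min_area: float = 100.0) -> List[Box]:
    """Merge close boxes until no pair qualifies, then drop boxes below min_area."""
    merged = list(boxes)
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if boxes_are_close(merged[i], merged[j], max_gap):
                    merged[i] = union_box(merged[i], merged[j])
                    del merged[j]
                    changed = True
                    break
            if changed:
                break  # restart the scan because the list changed
    # Remove tiny boxes, e.g. cluster fragments left at patch borders by cropping.
    return [b for b in merged if (b[2] - b[0]) * (b[3] - b[1]) >= min_area]
```

In this sketch, merging runs before filtering. A small fragment sitting next to a larger cluster is therefore absorbed into it rather than discarded.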

CV · Jul 17, 2023
On the Real-Time Semantic Segmentation of Aphid Clusters in the Wild

Raiyan Rahman, Christopher Indris, Tianxiao Zhang et al.

Aphid infestations can cause extensive damage to wheat and sorghum fields and spread plant viruses, resulting in significant yield losses in agriculture. To address this issue, farmers often rely on chemical pesticides, which are applied inefficiently over large areas of fields. As a result, a considerable amount of pesticide is wasted on areas without pests, while inadequate amounts are applied to areas with severe infestations. The paper addresses the urgent need for an intelligent autonomous system that can locate and spray infestations within complex crop canopies, reducing pesticide use and environmental impact. We collected and labeled a large aphid image dataset in the field and propose using real-time semantic segmentation models to segment clusters of aphids. A multiscale dataset is generated so that the models can learn clusters at different scales. We compare the segmentation speed and accuracy of four state-of-the-art real-time semantic segmentation models on the aphid cluster dataset, benchmarking them against non-real-time models. The results show the effectiveness of a real-time solution, which can reduce inefficient pesticide use and increase crop yields, paving the way toward an autonomous pest detection system.
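Segmentation speed comparisons like these are usually reported in frames per second (FPS). Below is a minimal timing harness, assuming a generic `model` callable. That callable is a hypothetical stand-in, not any specific segmentation library. The harness runs a few warm-up inputs untimed and then averages throughput over the rest. On a GPU, asynchronous execution must also be synchronized before reading the clock; that step is omitted here.

```python
import time
from typing import Any, Callable, Sequence


def measure_fps(model: Callable[[Any], Any], inputs: Sequence[Any], warmup: int = 5) -> float:
    """Average frames per second of `model` over `inputs`.

    The first `warmup` inputs are processed but not timed, so one-off costs
    (lazy initialization, caching) do not skew the result.
    """
    if len(inputs) <= warmup:
        raise ValueError("need more inputs than warm-up iterations")
    for x in inputs[:warmup]:
        model(x)
    timed = inputs[warmup:]
    start = time.perf_counter()
    for x in timed:
        model(x)
    elapsed = time.perf_counter() - start
    return len(timed) / elapsed
```

For example, a dummy model that sleeps 10 ms per frame yields at most about 100 FPS.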

CV · Jul 12, 2023
A New Dataset and Comparative Study for Aphid Cluster Detection

Tianxiao Zhang, Kaidong Li, Xiangyu Chen et al.

Aphids are one of the main threats to crops, rural families, and global food security. Chemical pest control is a necessary component of crop production for maximizing yields; however, given the environmental pollution and cost, applying chemicals to entire fields is unnecessary. Thus, accurately localizing aphids and estimating the infestation level is crucial for the precise local application of pesticides. Aphid detection is very challenging because individual aphids are very small and crowd together in clusters. In this paper, we propose to estimate the infestation level by detecting aphid clusters. We captured millions of images in sorghum fields, manually selected 5,447 images that contain aphids, and annotated each aphid cluster in these images. To use these images with machine learning models, we cropped them into patches and created a labeled dataset of over 151,000 image patches. We then implemented and compared the performance of four state-of-the-art object detection models.
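Cropping full-resolution images into patches also requires remapping each annotated box into patch coordinates. The sketch below assumes non-overlapping square tiles of a hypothetical size `patch`. The abstract does not state the actual patch size or overlap. Boxes are clipped to each tile, which is how small partial clusters can appear at patch borders.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2); x2 and y2 are exclusive


def crop_boxes_to_patches(width: int, height: int, boxes: List[Box],
                          patch: int) -> Dict[Tuple[int, int], List[Box]]:
    """Map each patch origin (px, py) to the boxes clipped into that patch,
    expressed in patch-local coordinates."""
    result: Dict[Tuple[int, int], List[Box]] = {}
    for py in range(0, height, patch):
        for px in range(0, width, patch):
            local: List[Box] = []
            for x1, y1, x2, y2 in boxes:
                # Intersect the box with the patch window (and the image bounds).
                cx1, cy1 = max(x1, px), max(y1, py)
                cx2 = min(x2, px + patch, width)
                cy2 = min(y2, py + patch, height)
                if cx1 < cx2 and cy1 < cy2:
                    local.append((cx1 - px, cy1 - py, cx2 - px, cy2 - py))
            result[(px, py)] = local
    return result
```

A cluster straddling a patch boundary is split into two clipped boxes, one per patch.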

CV · Apr 28
FruitProM-V2: Robust Probabilistic Maturity Estimation and Detection of Fruits and Vegetables

Rahul Harsha Cheppally, Sidharth Rai, Sudan Baral et al.

Accurate fruit maturity identification is essential for determining harvest timing, as incorrect assessment directly affects yield and post-harvest quality. Although ripening is a continuous biological process, vision-based maturity estimation is typically formulated as a multi-class classification task, which imposes sharp boundaries between visually similar stages. To examine this limitation, we perform an annotation reliability study with two independent annotators on a held-out tomato dataset and observe disagreement concentrated near adjacent maturity stages. Motivated by this observation, we model maturity as a latent continuous variable and predict it probabilistically using a distributional detection head, converting the distribution into class probabilities through the cumulative distribution function (CDF). The proposed formulation maintains comparable performance to a standard detector under clean labels while better representing uncertainty. Furthermore, when controlled label noise is introduced during training, the probabilistic model demonstrates improved robustness relative to the baseline, indicating that explicitly modeling maturity uncertainty leads to more reliable visual maturity estimation.
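The CDF step converts a predicted continuous maturity distribution into class probabilities. The sketch below assumes a Gaussian distribution and fixed stage boundaries (`thresholds`) on a latent maturity axis. Both the distribution family and the threshold values are illustrative assumptions, not the paper's published settings. Each class receives the probability mass between consecutive boundaries.

```python
import math
from typing import List


def normal_cdf(x: float, mean: float, std: float) -> float:
    """CDF of a Gaussian N(mean, std^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def class_probabilities(mean: float, std: float, thresholds: List[float]) -> List[float]:
    """Probability of each ordinal maturity class given sorted interior boundaries.

    With K-1 thresholds t_1 < ... < t_{K-1}, class k receives
    P(t_{k-1} < m <= t_k), where t_0 = -inf and t_K = +inf.
    """
    cdfs = [0.0] + [normal_cdf(t, mean, std) for t in thresholds] + [1.0]
    return [hi - lo for lo, hi in zip(cdfs[:-1], cdfs[1:])]
```

A predicted mean lying exactly on a stage boundary splits the mass evenly between the two adjacent classes. This mirrors the annotator disagreement near adjacent stages described above.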

CV · Apr 17, 2025
RF-DETR Object Detection vs YOLOv12: A Study of Transformer-based and CNN-based Architectures for Single-Class and Multi-Class Greenfruit Detection in Complex Orchard Environments Under Label Ambiguity

Ranjan Sapkota, Rahul Harsha Cheppally, Ajay Sharda et al.

This study conducts a detailed comparison of the RF-DETR base object detection model and YOLOv12 model configurations for detecting greenfruits in a complex orchard environment marked by label ambiguity, occlusions, and background blending. A custom dataset was developed featuring both single-class (greenfruit) and multi-class (occluded and non-occluded greenfruits) annotations to assess model performance under dynamic real-world conditions. The RF-DETR model, which uses a DINOv2 backbone and deformable attention, excelled in global context modeling, effectively identifying partially occluded or ambiguous greenfruits. In contrast, YOLOv12 leveraged CNN-based attention for enhanced local feature extraction, optimizing it for computational efficiency and edge deployment. RF-DETR achieved the highest mean Average Precision (mAP@50) of 0.9464 in single-class detection, demonstrating its ability to localize greenfruits in cluttered scenes. Although YOLOv12N recorded the highest mAP@50:95 of 0.7620, RF-DETR consistently performed better in complex spatial scenarios. For multi-class detection, RF-DETR led with an mAP@50 of 0.8298, showing its capability to differentiate between occluded and non-occluded fruits, while YOLOv12L scored highest in mAP@50:95 with 0.6622, indicating better classification in detailed occlusion contexts. Training dynamics analysis highlighted RF-DETR's swift convergence, particularly in single-class settings, where it plateaued within 10 epochs, demonstrating the efficiency of transformer-based architectures in adapting to dynamic visual data. These findings validate RF-DETR's effectiveness for precision agricultural applications, with YOLOv12 suited for fast-response scenarios. Index Terms: RF-DETR object detection, YOLOv12, YOLOv13, YOLOv14, YOLOv15, YOLOE, YOLO World, YOLO, You Only Look Once, Roboflow, Detection Transformers, CNNs
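The gap between mAP@50 and mAP@50:95 comes from the IoU thresholds used to decide whether a prediction counts as a true positive. mAP@50 uses a single threshold of 0.5. COCO-style mAP@50:95 averages over ten thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below computes IoU for one prediction and ground-truth pair, then the fraction of those thresholds at which the prediction would match. It is a simplified illustration, not a full AP computation.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


# COCO-style thresholds 0.50, 0.55, ..., 0.95 (rounded to avoid float drift).
THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def match_fraction(pred: Box, gt: Box) -> float:
    """Fraction of the ten IoU thresholds at which `pred` matches `gt`."""
    value = iou(pred, gt)
    return sum(value >= t for t in THRESHOLDS) / len(THRESHOLDS)
```

Consider a box that is slightly too short (IoU of about 0.83). It counts fully under mAP@50 but passes only 7 of the 10 stricter thresholds. Precise localization is therefore rewarded more by mAP@50:95.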

CV · May 7, 2024
A New Dataset and Comparative Study for Aphid Cluster Detection and Segmentation in Sorghum Fields

Raiyan Rahman, Christopher Indris, Goetz Bramesfeld et al.

Aphids are a primary cause of extensive damage to wheat and sorghum fields and one of the most common vectors of plant viruses, resulting in significant agricultural yield losses. To address this problem, farmers often apply harmful chemical pesticides inefficiently, with negative health and environmental impacts. As a result, a large amount of pesticide is wasted on areas without significant pest infestation. This highlights the urgent need for an intelligent autonomous system that can locate and selectively spray sufficiently large infestations within complex crop canopies. We developed a large multi-scale dataset for aphid cluster detection and segmentation, collected from actual sorghum fields and meticulously annotated to include clusters of aphids. Our dataset comprises a total of 54,742 image patches covering a variety of viewpoints, diverse lighting conditions, and multiple scales, making it well suited to real-world applications. In this study, we trained and evaluated four real-time semantic segmentation models and three object detection models specifically for aphid cluster segmentation and detection. Considering the balance between accuracy and efficiency, Fast-SCNN delivered the most effective segmentation results, achieving 80.46% mean precision, 81.21% mean recall, and 91.66 frames per second (FPS). For object detection, RT-DETR exhibited the best overall performance, with 61.63% mean average precision (mAP), 92.6% mean recall, and 72.55 FPS on an NVIDIA V100 GPU. Our experiments further indicate that aphid cluster segmentation is more suitable for assessing aphid infestations than detection models are.
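Mean precision and mean recall for semantic segmentation are typically computed per class from pixel counts and then averaged across classes. The sketch below assumes flattened label masks given as lists of integer class IDs, a simplifying assumption. The paper may aggregate over images or classes differently.

```python
from typing import List, Tuple


def per_class_precision_recall(pred: List[int], gt: List[int],
                               num_classes: int) -> List[Tuple[float, float]]:
    """Pixel-level (precision, recall) for each class ID in range(num_classes)."""
    if len(pred) != len(gt):
        raise ValueError("masks must have the same number of pixels")
    results = []
    for c in range(num_classes):
        tp = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        fp = sum(1 for p, g in zip(pred, gt) if p == c and g != c)
        fn = sum(1 for p, g in zip(pred, gt) if p != c and g == c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        results.append((precision, recall))
    return results


def mean_precision_recall(pred: List[int], gt: List[int],
                          num_classes: int) -> Tuple[float, float]:
    """Average the per-class precision and recall values."""
    pr = per_class_precision_recall(pred, gt, num_classes)
    return (sum(p for p, _ in pr) / num_classes, sum(r for _, r in pr) / num_classes)
```

In this sketch, class 0 would be background and class 1 would be aphid cluster.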

CV · Sep 29, 2025
YOLO26: Key Architectural Enhancements and Performance Benchmarking for Real-Time Object Detection

Ranjan Sapkota, Rahul Harsha Cheppally, Ajay Sharda et al.

This study presents a comprehensive analysis of Ultralytics YOLO26, highlighting its key architectural enhancements and performance benchmarking for real-time object detection. YOLO26, released in September 2025, is the newest and most advanced member of the YOLO family at the time of writing, purpose-built to deliver efficiency, accuracy, and deployment readiness on edge and low-power devices. The paper details YOLO26's architectural innovations in turn, including the removal of Distribution Focal Loss (DFL), the adoption of end-to-end NMS-free inference, the integration of ProgLoss and Small-Target-Aware Label Assignment (STAL), and the introduction of the MuSGD optimizer for stable convergence. Beyond architecture, the study positions YOLO26 as a multi-task framework supporting object detection, instance segmentation, pose/keypoint estimation, oriented detection, and classification. We present performance benchmarks of YOLO26 on edge devices such as the NVIDIA Jetson Nano and Orin, comparing its results with YOLOv8, YOLOv11, YOLOv12, YOLOv13, and transformer-based detectors (RF-DETR and RT-DETR). This paper further explores real-time deployment pathways, flexible export options (ONNX, TensorRT, CoreML, TFLite), and INT8/FP16 quantization. Practical use cases of YOLO26 across robotics, manufacturing, and IoT are highlighted to demonstrate cross-industry adaptability. Finally, insights on deployment efficiency and broader implications are discussed, and future directions for YOLO26 and the YOLO lineage are outlined.
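Conventional YOLO pipelines end with non-maximum suppression (NMS), a greedy post-processing step. End-to-end NMS-free detectors such as YOLO26 are designed to make this step unnecessary. For reference, classic greedy NMS looks roughly like the sketch below. It is an illustrative implementation, not Ultralytics code. One commonly cited motivation for removing NMS is that its cost grows with the number of candidate boxes, which makes latency less predictable on edge devices.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Suppress remaining candidates that overlap the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep
```

An NMS-free model is instead trained so that each object yields a single confident prediction. Its raw outputs can then be thresholded by score directly.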

CV · Sep 15, 2025
Axis-Aligned 3D Stalk Diameter Estimation from RGB-D Imagery

Benjamin Vail, Rahul Harsha Cheppally, Ajay Sharda et al.

Accurate, high-throughput phenotyping is a critical component of modern crop breeding programs, especially for improving traits such as mechanical stability, biomass production, and disease resistance. Stalk diameter is a key structural trait, but traditional measurement methods are labor-intensive, error-prone, and unsuitable for scalable phenotyping. In this paper, we present a geometry-aware computer vision pipeline for estimating stalk diameter from RGB-D imagery. Our method integrates deep learning-based instance segmentation, 3D point cloud reconstruction, and axis-aligned slicing via Principal Component Analysis (PCA) to perform robust diameter estimation. By mitigating the effects of curvature, occlusion, and image noise, this approach offers a scalable and reliable solution to support high-throughput phenotyping in breeding and agronomic research.
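The axis-aligned slicing step can be sketched with NumPy. PCA on the segmented stalk's 3D points gives the principal (growth) axis. Points are then sliced perpendicular to that axis, and each slice's width is measured across the stalk. The sketch rests on assumptions the abstract does not state:

- The point cloud is already segmented and expressed in meters.
- The slice thickness takes a hypothetical default value.
- Each slice's diameter is its extent along the second principal axis.

The paper's exact per-slice estimator may differ. Measuring extent across the stalk rather than distance from a centroid keeps this sketch usable when an RGB-D camera sees only the front half of the stalk.

```python
import numpy as np


def stalk_diameter(points: np.ndarray, slice_thickness: float = 0.01) -> float:
    """Estimate stalk diameter from an (N, 3) point cloud of one segmented stalk.

    Illustrative assumptions: points are in meters, the stalk is the only object
    in the cloud, and each slice's diameter is its extent along the second
    principal axis (the widest cross-sectional direction).
    """
    centered = points - points.mean(axis=0)
    # eigh returns eigenvalues in ascending order, with matching eigenvector columns.
    _, eigvecs = np.linalg.eigh(np.cov(centered.T))
    axis = eigvecs[:, 2]       # largest variance: stalk growth axis
    width_dir = eigvecs[:, 1]  # second largest: across the stalk
    t = centered @ axis        # position along the stalk
    w = centered @ width_dir   # position across the stalk
    widths = []
    for start in np.arange(t.min(), t.max(), slice_thickness):
        mask = (t >= start) & (t < start + slice_thickness)
        if mask.sum() >= 3:
            widths.append(w[mask].max() - w[mask].min())
    if not widths:
        raise ValueError("no slice contained enough points")
    # The median across slices resists a few noisy or partially occluded slices.
    return float(np.median(widths))
```

Applied to a synthetic half-cylinder of radius 1 cm, this returns approximately 0.02 m.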

CV · Jun 24, 2025
Computer Vision based Automated Quantification of Agricultural Sprayers Boom Displacement

Aryan Singh Dalal, Sidharth Rai, Rahul Singh et al.

Application rate errors remain a concern when self-propelled agricultural sprayers are used in agricultural production. Among other factors, spray boom instability is one of the major contributors to application errors. Boom widths of 38 m, combined with driving speeds of 30 kph, varying terrain, and machine dynamics when maneuvering around complex field boundaries, make controlling these booms very complex. However, there is no quantitative knowledge of the extent of boom movement with which to systematically develop solutions, such as improved boom designs and responsive boom control systems. Therefore, this study was conducted to develop an automated computer vision system to quantify the boom movement of various agricultural sprayers. A computer vision system was developed to track a target on the edge of the sprayer boom in real time. YOLOv7, YOLOv8, and YOLOv11 neural network models were trained to track the boom's movements during field operations and to quantify effective displacement in the vertical and transverse directions. An inclinometer sensor was mounted on the boom to capture boom angles and validate the neural network model output. The results showed that the model could detect the target with more than 90 percent accuracy, and distance estimates of the target on the boom were within 0.026 m of the inclinometer sensor data. This system can quantify boom movement on the current sprayer and potentially on any other sprayer with minor modifications. The data can be used to make design improvements that make sprayer booms more stable and achieve greater application accuracy.
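To validate vision-based displacement against an inclinometer, the measured boom angle must be converted into a linear displacement at the tracked target. The sketch below assumes a rigid boom section pivoting about a single point. It also uses a hypothetical distance `arm_length_m` from the pivot to the target; neither quantity is specified in the abstract. Vertical displacement relative to a reference angle then follows from basic trigonometry and can be compared with the vision estimate.

```python
import math
from typing import Sequence


def vertical_displacement_m(angle_deg: float, arm_length_m: float,
                            reference_deg: float = 0.0) -> float:
    """Vertical displacement (m) of a point arm_length_m from the pivot when a
    rigid boom rotates from reference_deg to angle_deg."""
    return arm_length_m * (math.sin(math.radians(angle_deg))
                           - math.sin(math.radians(reference_deg)))


def max_abs_error_m(vision_m: Sequence[float], angles_deg: Sequence[float],
                    arm_length_m: float) -> float:
    """Largest absolute gap between vision-based and inclinometer-derived displacement."""
    if len(vision_m) != len(angles_deg) or not vision_m:
        raise ValueError("need equal-length, non-empty series")
    return max(abs(v - vertical_displacement_m(a, arm_length_m))
               for v, a in zip(vision_m, angles_deg))
```

For example, a 30° tilt moves a point 2 m from the pivot up by about 1 m.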

CV · Apr 27, 2025
Enhancing seeding efficiency using a computer vision system to monitor furrow quality in real-time

Sidharth Rai, Aryan Dalal, Riley Slichter et al.

Effective seed sowing in precision agriculture is hindered by challenges such as residue accumulation, low soil temperatures, and hair pinning (crop residue pushed into the trench by the furrow opener), which obstruct optimal trench formation. Row cleaners are employed to mitigate these issues, but quantitative methods to assess trench cleanliness are lacking. In this study, a novel computer vision-based method was developed to evaluate row cleaner performance. Multiple air seeders were equipped with a video acquisition system to capture trench conditions after row cleaner operation, enabling an effective comparison of the performance of each row cleaner. The captured data were used to develop a segmentation model that identifies key elements such as soil, straw, and machinery. Using the segmentation results, an objective method was developed to quantify row cleaner performance. The results demonstrated the potential of this method to improve row cleaner selection and enhance seeding efficiency in precision agriculture.
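Once each trench pixel has a segmentation label, a simple objective cleanliness score can be computed. Here it is the share of visible, non-machinery trench pixels that are bare soil. The label IDs are hypothetical, and excluding machinery pixels is an assumption for illustration. The abstract does not define the exact metric.

```python
from collections import Counter
from typing import Iterable

# Hypothetical label IDs for the segmentation classes named in the abstract.
SOIL, STRAW, MACHINERY = 0, 1, 2


def trench_cleanliness(labels: Iterable[int]) -> float:
    """Fraction of soil-or-straw pixels that are soil (1.0 means fully clean)."""
    counts = Counter(labels)
    visible = counts[SOIL] + counts[STRAW]  # machinery and other labels are ignored
    if visible == 0:
        raise ValueError("no soil or straw pixels in the frame")
    return counts[SOIL] / visible
```

Averaging this score over many frames from each seeder would give one number per row cleaner for comparison.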

CV · Dec 13, 2024
RowDetr: End-to-End Crop Row Detection Using Polynomials

Rahul Harsha Cheppally, Ajay Sharda

Crop row detection enables autonomous robots to navigate in GPS-denied environments. Vision-based strategies often struggle in these environments due to gaps and curved crop rows, and they require post-processing steps. Furthermore, accurately labeling crop rows in under-canopy environments is very difficult due to occlusions. This study introduces RowDetr, an efficient end-to-end transformer-based neural network for crop row detection in precision agriculture. RowDetr leverages a lightweight backbone and a hybrid encoder to model straight, curved, or occluded crop rows with high precision. Central to the architecture is a novel polynomial representation that enables direct parameterization of crop rows, eliminating computationally expensive post-processing. Key innovations include a PolySampler module and multi-scale deformable attention, which work together with PolyOptLoss, an energy-based loss function designed to optimize the geometric alignment between predicted and annotated crop rows while also enhancing robustness against labeling noise. RowDetr was evaluated against other state-of-the-art end-to-end crop row detection methods, such as AgroNav and RolColAttention, on a diverse dataset of 6,962 high-resolution images with annotated crop rows, spanning multiple crop types and split into training, validation, and testing sets. The system demonstrated superior performance, achieving an F1 score of up to 0.74 and a lane position deviation as low as 0.405. Furthermore, RowDetr achieves a real-time inference latency of 6.7 ms, which was reduced to 3.5 ms with INT8 quantization on an NVIDIA Jetson Orin AGX. This work highlights the efficiency of polynomial parameterization, making RowDetr particularly suitable for deployment on edge computing devices in agricultural robotics and autonomous farming equipment. Index Terms: Crop Row Detection, Under Canopy Navigation, Transformers, RT-DETR, RT-DETRv2
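With a polynomial parameterization, each crop row is a few coefficients that can be evaluated at any image height without post-processing. The sketch below assumes rows are modeled as x = f(y) in normalized image coordinates, with coefficients listed from the constant term upward; both choices are assumptions. It uses mean absolute horizontal offset as a simplified stand-in for a position-deviation metric, not necessarily the paper's exact definition of lane position deviation.

```python
from typing import Sequence


def eval_row(coeffs: Sequence[float], y: float) -> float:
    """Evaluate x = c0 + c1*y + c2*y^2 + ... with Horner's rule."""
    x = 0.0
    for c in reversed(coeffs):
        x = x * y + c
    return x


def mean_lateral_deviation(pred: Sequence[float], gt: Sequence[float],
                           num_samples: int = 50) -> float:
    """Mean |x_pred(y) - x_gt(y)| over evenly spaced y in [0, 1]."""
    ys = [i / (num_samples - 1) for i in range(num_samples)]
    return sum(abs(eval_row(pred, y) - eval_row(gt, y)) for y in ys) / num_samples
```

A curved row only needs a nonzero quadratic or cubic term. For example, `[0.5, 0.0, 0.2]` describes a row that starts at the image center and bends right toward the bottom.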

CV · Oct 28, 2025
FruitProm: Probabilistic Maturity Estimation and Detection of Fruits and Vegetables

Sidharth Rai, Rahul Harsha Cheppally, Benjamin Vail et al.

Maturity estimation of fruits and vegetables is a critical task for agricultural automation, directly affecting yield prediction and robotic harvesting. Current deep learning approaches predominantly treat maturity as a discrete classification problem (e.g., unripe, ripe, overripe). This rigid formulation, however, fundamentally conflicts with the continuous nature of the biological ripening process, leading to information loss and ambiguous class boundaries. In this paper, we challenge this paradigm by reframing maturity estimation as a continuous, probabilistic learning task. We propose a novel architectural modification to the state-of-the-art real-time object detector RT-DETRv2 by introducing a dedicated probabilistic head. This head enables the model to predict a continuous distribution over the maturity spectrum for each detected object, simultaneously learning the mean maturity state and its associated uncertainty. This uncertainty measure is crucial for downstream decision-making in robotics, providing a confidence score for tasks like selective harvesting. Our model not only provides a far richer and more biologically plausible representation of plant maturity but also maintains exceptional detection performance, achieving a mean Average Precision (mAP) of 85.6% on a challenging, large-scale fruit dataset. We demonstrate through extensive experiments that our probabilistic approach offers more granular and accurate maturity assessments than its classification-based counterparts, paving the way for more intelligent, uncertainty-aware automated systems in modern agriculture.
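A probabilistic head that predicts both a mean and an uncertainty per detection is commonly trained with a Gaussian negative log-likelihood (NLL). The sketch below assumes the head outputs a mean and a log-variance, a common parameterization for numerical stability. The abstract does not state the paper's exact distribution or loss.

```python
import math
from typing import Sequence


def gaussian_nll(target: float, mean: float, log_var: float) -> float:
    """Negative log-likelihood of `target` under N(mean, exp(log_var))."""
    var = math.exp(log_var)
    return 0.5 * (math.log(2.0 * math.pi) + log_var + (target - mean) ** 2 / var)


def batch_nll(targets: Sequence[float], means: Sequence[float],
              log_vars: Sequence[float]) -> float:
    """Average NLL over matched detections (maturity targets vs. predictions)."""
    losses = [gaussian_nll(t, m, lv) for t, m, lv in zip(targets, means, log_vars)]
    return sum(losses) / len(losses)
```

This loss penalizes confident mistakes: for the same error, a small predicted variance costs more. On genuinely ambiguous samples, the model can lower its loss by predicting a larger variance, which is the uncertainty signal described above.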

CV · Jan 23, 2022
Dynamic Label Assignment for Object Detection by Combining Predicted IoUs and Anchor IoUs

Tianxiao Zhang, Bo Luo, Ajay Sharda et al.

Label assignment plays a significant role in modern object detection models, and detection models may perform very differently under different label assignment strategies. For anchor-based detection models, the IoU (Intersection over Union) threshold between the anchors and their corresponding ground truth bounding boxes is the key element, since it divides samples into positives and negatives. Early object detectors simply use a fixed threshold for all training samples, while recent detection algorithms use adaptive thresholds based on the distribution of IoUs to the ground truth boxes. In this paper, we introduce a simple yet effective approach that performs label assignment dynamically, based on the training status reflected in the model's predictions. By incorporating the predictions into label assignment, more high-quality samples with higher IoUs to the ground truth objects are selected as positives. This can reduce the discrepancy between classification scores and IoU scores and produce higher-quality bounding boxes. Applied to detection models with adaptive label assignment, our approach improves performance and yields lower bounding box losses for the positive samples, indicating that more samples with higher-quality predicted boxes are selected as positives.
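Blending prediction quality into label assignment can be illustrated with a small sketch. Each candidate anchor's score is a weighted blend of two IoUs with the ground truth: its anchor IoU and its predicted-box IoU. Positives are then selected with an adaptive mean-plus-standard-deviation threshold in the style of ATSS. The blending weight `alpha` and this exact combination rule are assumptions for illustration, not the formula from the paper.

```python
import statistics
from typing import List


def assign_positives(anchor_ious: List[float], pred_ious: List[float],
                     alpha: float = 0.5) -> List[int]:
    """Indices of candidates assigned as positives for one ground-truth box.

    Hypothetical rule: score = alpha * anchor IoU + (1 - alpha) * predicted IoU,
    with the threshold set to mean + population std of the scores.
    """
    if len(anchor_ious) != len(pred_ious) or not anchor_ious:
        raise ValueError("need equal-length, non-empty IoU lists")
    # Blend static anchor quality with the model's current prediction quality.
    combined = [alpha * a + (1.0 - alpha) * p for a, p in zip(anchor_ious, pred_ious)]
    threshold = statistics.mean(combined) + statistics.pstdev(combined)
    return [i for i, s in enumerate(combined) if s >= threshold]
```

Consider anchor IoUs `[0.6, 0.5, 0.4, 0.2]` and predicted IoUs `[0.3, 0.8, 0.4, 0.1]`. Anchor IoUs alone (`alpha=1.0`) select candidate 0. The blended score selects candidate 1, whose regressed box already fits the object better.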