CV Oct 11, 2025
YOLOv11-Litchi: Efficient Litchi Fruit Detection based on UAV-Captured Agricultural Imagery in Complex Orchard Environments
Hongxing Peng, Haopei Xie, Weijia Lia et al.
Litchi is a high-value fruit, yet traditional manual selection methods are increasingly inadequate for modern production demands. Integrating UAV-based aerial imagery with deep learning offers a promising solution to enhance efficiency and reduce costs. This paper introduces YOLOv11-Litchi, a lightweight and robust detection model specifically designed for UAV-based litchi detection. Built upon the YOLOv11 framework, the proposed model addresses key challenges such as small target size, a large parameter count that hinders deployment, and frequent target occlusion. To tackle these issues, three major innovations are incorporated: a multi-scale residual module to improve contextual feature extraction across scales, a lightweight feature fusion method to reduce model size and computational costs while maintaining high accuracy, and a litchi occlusion detection head to mitigate occlusion effects by emphasizing target regions and suppressing background interference. Experimental results validate the model's effectiveness. YOLOv11-Litchi achieves a parameter size of 6.35 MB, 32.5% smaller than the YOLOv11 baseline, while improving mAP by 2.5% to 90.1% and F1-Score by 1.4% to 85.5%. Additionally, the model achieves a frame rate of 57.2 FPS, meeting real-time detection requirements. These findings demonstrate the suitability of YOLOv11-Litchi for UAV-based litchi detection in complex orchard environments, showcasing its potential for broader applications in precision agriculture.
CV Jul 1, 2025
High-Frequency Semantics and Geometric Priors for End-to-End Detection Transformers in Challenging UAV Imagery
Hongxing Peng, Lide Chen, Hui Zhu et al.
Object detection in Unmanned Aerial Vehicle (UAV) imagery is fundamentally challenged by a prevalence of small, densely packed, and occluded objects within cluttered backgrounds. Conventional detectors struggle with this domain, as they rely on hand-crafted components like pre-defined anchors and heuristic-based Non-Maximum Suppression (NMS), creating a well-known performance bottleneck in dense scenes. Even recent end-to-end frameworks have not been purpose-built to overcome these specific aerial challenges, resulting in a persistent performance gap. To bridge this gap, we introduce HEDS-DETR, a holistically enhanced real-time Detection Transformer tailored for aerial scenes. Our framework features three key innovations. First, we propose a novel High-Frequency Enhanced Semantics Network (HFESNet) backbone, which yields highly discriminative features by preserving critical high-frequency details alongside robust semantic context. Second, our Efficient Small Object Pyramid (ESOP) counteracts information loss by efficiently fusing high-resolution features, significantly boosting small object detection. Finally, we enhance decoder stability and localization precision with two synergistic components: Selective Query Recollection (SQR) and Geometry-Aware Positional Encoding (GAPE), which stabilize optimization and provide explicit spatial priors for dense object arrangements. On the VisDrone dataset, HEDS-DETR achieves a +3.8% AP and +5.1% AP50 gain over its baseline while reducing parameters by 4M and maintaining real-time speeds. This demonstrates a highly competitive accuracy-efficiency balance, especially for detecting dense and small objects in aerial scenes.
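The NMS bottleneck mentioned in the abstract above can be illustrated with a minimal sketch of greedy Non-Maximum Suppression. This is not code from the paper: the box format `(x1, y1, x2, y2)`, the function names `iou` and `greedy_nms`, and the thresholds are assumptions chosen for illustration. The example shows how two distinct but heavily overlapping objects, as are common in dense aerial scenes, can collapse into a single detection.

```python
from typing import List, Tuple

# Assumed box format for this sketch: (x1, y1, x2, y2) corner coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float],
               iou_threshold: float = 0.5) -> List[int]:
    """Return indices of kept boxes, highest score first.

    A box is kept only if its IoU with every already-kept box
    does not exceed iou_threshold.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two neighbouring objects that overlap heavily, plus one isolated object.
boxes = [(0, 0, 10, 10), (2, 0, 12, 10), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]

print(greedy_nms(boxes, scores, iou_threshold=0.5))  # [0, 2]
print(greedy_nms(boxes, scores, iou_threshold=0.7))  # [0, 1, 2]
```

With a threshold of 0.5 the second box (IoU of about 0.67 with the first) is suppressed even though it could be a separate object. Raising the threshold keeps it but would also let through more true duplicates. End-to-end detectors avoid this hand-tuned trade-off by predicting a set of objects directly.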
CV Apr 22, 2025
HS-Mamba: Full-Field Interaction Multi-Groups Mamba for Hyperspectral Image Classification
Hongxing Peng, Kang Lin, Huanai Liu
Hyperspectral image (HSI) classification is an active research topic in remote sensing. Recently, the Mamba architecture based on selective state-space models (S6) has demonstrated great advantages in long sequence modeling. However, the unique properties of hyperspectral data, such as high dimensionality and feature inlining, pose challenges to the application of Mamba to HSI classification. To address these challenges, we propose a full-field interaction multi-groups Mamba framework (HS-Mamba), which adopts a strategy that differs from both pixel-patch-based and whole-image-based approaches while combining the advantages of both. The patches cut from the whole image are sent to multi-groups Mamba, combined with positional information to perceive local inline features in the spatial and spectral domains, and the whole image is sent to a lightweight attention module to enhance the global feature representation ability. Specifically, HS-Mamba consists of a dual-channel spatial-spectral encoder (DCSS-encoder) module and a lightweight global inline attention (LGI-Att) branch. The DCSS-encoder module uses multiple groups of Mamba to decouple and model the local features of dual-channel sequences with non-overlapping patches. The LGI-Att branch uses a lightweight compressed and extended attention module to perceive the global features of the spatial and spectral domains of the unsegmented whole image. By fusing local and global features, high-precision classification of hyperspectral images is achieved. Extensive experiments demonstrate the superiority of the proposed HS-Mamba, which outperforms state-of-the-art methods on four benchmark HSI datasets.
LG Oct 30, 2018
Enhanced Ensemble Clustering via Fast Propagation of Cluster-wise Similarities
Dong Huang, Chang-Dong Wang, Hongxing Peng et al.
Ensemble clustering has been a popular research topic in data mining and machine learning. Despite significant progress in recent years, two challenging issues remain in current ensemble clustering research. First, most existing algorithms investigate the ensemble information at the object level, yet often lack the ability to explore the rich information at higher levels of granularity. Second, they mostly focus on the direct connections (e.g., direct intersection or pair-wise co-occurrence) in the multiple base clusterings, but generally neglect the multi-scale indirect relationships hidden in them. To address these two issues, this paper presents a novel ensemble clustering approach based on fast propagation of cluster-wise similarities via random walks. We first construct a cluster similarity graph with the base clusters treated as graph nodes and the cluster-wise Jaccard coefficient exploited to compute the initial edge weights. Upon the constructed graph, a transition probability matrix is defined, based on which the random walk process is conducted to propagate the graph structural information. Specifically, by investigating the propagating trajectories starting from different nodes, a new cluster-wise similarity matrix can be derived by considering the trajectory relationship. Then, the newly obtained cluster-wise similarity matrix is mapped from the cluster level to the object level to achieve an enhanced co-association (ECA) matrix, which is able to simultaneously capture the object-wise co-occurrence relationship as well as the multi-scale cluster-wise relationship in ensembles. Finally, two novel consensus functions are proposed to obtain the consensus clustering result. Extensive experiments on a variety of real-world datasets have demonstrated the effectiveness and efficiency of our approach.
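The first stages described in the abstract above (Jaccard-weighted cluster graph, transition probability matrix, random-walk trajectories) can be sketched in a few lines of plain Python. This is an illustrative sketch under assumptions, not the authors' implementation: the abstract does not specify how trajectories are compared or how many steps are taken, so the cosine similarity, the default of three steps, and all function names here are hypothetical choices. The mapping to the object-level ECA matrix and the consensus functions are omitted.

```python
from math import sqrt


def jaccard(a, b):
    """Cluster-wise Jaccard coefficient between two sets of object ids."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def transition_matrix(clusters):
    """Row-normalise the Jaccard edge weights (no self-loops)."""
    n = len(clusters)
    weights = [[0.0 if i == j else jaccard(clusters[i], clusters[j])
                for j in range(n)] for i in range(n)]
    probs = []
    for row in weights:
        total = sum(row)
        probs.append([w / total if total > 0 else 0.0 for w in row])
    return probs


def matmul(a, b):
    """Dense matrix product for small lists of lists."""
    inner, cols = len(b), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(inner))
             for j in range(cols)] for i in range(len(a))]


def trajectories(probs, steps):
    """For each start node, concatenate its 1..steps step distributions."""
    current = probs
    traj = [list(row) for row in probs]
    for _ in range(steps - 1):
        current = matmul(current, probs)
        for i, row in enumerate(current):
            traj[i].extend(row)
    return traj


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm > 0 else 0.0


def trajectory_similarity(clusters, steps=3):
    """Assumed comparison rule: cosine similarity between trajectories."""
    traj = trajectories(transition_matrix(clusters), steps)
    n = len(clusters)
    return [[cosine(traj[i], traj[j]) for j in range(n)] for i in range(n)]


# Two base clusterings of objects 0..5, pooled into one list of clusters.
clusters = [{0, 1, 2}, {3, 4, 5},      # base clustering A
            {0, 1}, {2, 3}, {4, 5}]    # base clustering B

sim = trajectory_similarity(clusters, steps=3)
print(jaccard(clusters[0], clusters[1]))  # direct similarity: 0.0
print(sim[0][1] > 0.0)                    # indirect similarity: True
```

In this toy ensemble, clusters `{0, 1, 2}` and `{3, 4, 5}` share no objects, so their direct Jaccard coefficient is zero. Both are linked to `{2, 3}`, however, so their random-walk trajectories overlap and the propagated similarity is positive, which is the kind of indirect relationship the abstract refers to.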