Jianguo Yao

CV
h-index: 20
5 papers
14 citations
Novelty: 58%
AI Score: 38

5 Papers

CV · Oct 19, 2022
HAVANA: Hard negAtiVe sAmples aware self-supervised coNtrastive leArning for Airborne laser scanning point clouds semantic segmentation

Yunsheng Zhang, Jianguo Yao, Ruixiang Zhang et al.

Deep Neural Network (DNN) based point cloud semantic segmentation has achieved strong results on large-scale labeled aerial laser point cloud datasets. However, annotating such large-scale point clouds is time-consuming. Moreover, because of the density variations and spatial heterogeneity of Airborne Laser Scanning (ALS) point clouds, DNNs generalize poorly: a DNN trained in one region underperforms when applied directly to other regions. Self-Supervised Learning (SSL) is a promising way to address this problem: a DNN model is pre-trained on unlabeled samples and then fine-tuned on a downstream task with very limited labels. Hence, this work proposes a hard-negative sample aware self-supervised contrastive learning method to pre-train the model for semantic segmentation. Traditional contrastive learning for point clouds selects the hardest negative samples by relying solely on the distance between the embedded features derived from the learning process, so it may pick negative samples that actually belong to the same class, which reduces the effectiveness of contrastive learning. Therefore, we design an AbsPAN (Absolute Positive And Negative samples) strategy based on k-means clustering to filter out possible false-negative samples. Experiments on two typical ALS benchmark datasets demonstrate that the proposed method compares favorably with supervised training schemes without pre-training. In particular, when labels are severely limited (10% of the ISPRS training set), the results obtained by the proposed HAVANA method still exceed 94% of the performance of the supervised paradigm trained on the full training set.
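The idea of filtering false negatives with cluster assignments can be illustrated with a minimal Python sketch. It is not the authors' AbsPAN implementation: the function name `filtered_hard_negatives`, the toy features, and the assumption that cluster ids were already produced by k-means are all hypothetical. The sketch only shows the general pattern of ranking candidates by embedding distance while skipping those that share the anchor's cluster.

```python
import math

def filtered_hard_negatives(anchor, feats, cluster_ids, k):
    """Return indices of the k candidates closest to `anchor` in feature
    space, skipping any candidate that shares the anchor's cluster
    (treated here as a likely false negative)."""
    candidates = [
        i for i in range(len(feats))
        if i != anchor and cluster_ids[i] != cluster_ids[anchor]
    ]
    # Closest candidates first: these are the "hardest" negatives.
    candidates.sort(key=lambda i: math.dist(feats[anchor], feats[i]))
    return candidates[:k]

# Toy embeddings; cluster ids are assumed to come from a prior k-means run.
feats = [(0.0, 0.0), (0.1, 0.0), (0.5, 0.0), (2.0, 0.0), (0.6, 0.1)]
cluster_ids = [0, 0, 1, 1, 1]

# Point 1 is the nearest neighbour of point 0 but shares its cluster,
# so it is excluded from the negatives.
print(filtered_hard_negatives(0, feats, cluster_ids, k=2))  # [2, 4]
```

Without the cluster check, point 1 would be chosen as the hardest negative even though it most likely belongs to the same class as the anchor.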

LG · Mar 13, 2025
Samoyeds: Accelerating MoE Models with Structured Sparsity Leveraging Sparse Tensor Cores

Chenpeng Wu, Qiqi Gu, Heng Shi et al.

The escalating size of Mixture-of-Experts (MoE) based Large Language Models (LLMs) presents significant computational and memory challenges, necessitating innovative solutions to enhance efficiency without compromising model accuracy. Structured sparsity emerges as a compelling strategy to address these challenges by leveraging the emerging sparse computing hardware. Prior works mainly focus on the sparsity in model parameters, neglecting the inherent sparse patterns in activations. This oversight can lead to additional computational costs associated with activations, potentially resulting in suboptimal performance. This paper presents Samoyeds, an innovative acceleration system for MoE LLMs utilizing Sparse Tensor Cores (SpTCs). Samoyeds is the first to apply sparsity simultaneously to both activations and model parameters. It introduces a bespoke sparse data format tailored for MoE computation and develops a specialized sparse-sparse matrix multiplication kernel. Furthermore, Samoyeds incorporates systematic optimizations specifically designed for the execution of dual-side structured sparse MoE LLMs on SpTCs, further enhancing system performance. Evaluations show that Samoyeds outperforms SOTA works by up to 1.99× at the kernel level and 1.58× at the model level. Moreover, it enhances memory efficiency, increasing maximum supported batch sizes by 4.41× on average. Additionally, Samoyeds surpasses existing SOTA structured sparse solutions in both model accuracy and hardware portability.
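For readers unfamiliar with structured sparsity, the following Python sketch shows the generic 2:4 pattern (at most two nonzeros in every group of four consecutive values) that NVIDIA Sparse Tensor Cores accept. It is only an illustration of the general concept: the abstract does not describe the bespoke Samoyeds format, and the magnitude-based selection rule and the function name `prune_2_of_4` are assumptions made here.

```python
def prune_2_of_4(row):
    """Zero out the two smallest-magnitude values in every group of four,
    producing a generic 2:4 structured-sparse row."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    pruned = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Positions of the two largest-magnitude entries in this group.
        keep = sorted(range(4), key=lambda j: abs(group[j]), reverse=True)[:2]
        pruned.extend(v if j in keep else 0.0 for j, v in enumerate(group))
    return pruned

weights = [0.1, -0.9, 0.4, 0.05, 1.0, 2.0, -3.0, 0.5]
print(prune_2_of_4(weights))
# [0.0, -0.9, 0.4, 0.0, 0.0, 2.0, -3.0, 0.0]
```

Because exactly half of each group is zero, hardware can store only the kept values plus small position indices, which is what makes this pattern cheaper to compute on than unstructured sparsity.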

LG · May 22, 2025
STRCMP: Integrating Graph Structural Priors with Language Models for Combinatorial Optimization

Xijun Li, Jiexiang Yang, Jinghao Wang et al.

Combinatorial optimization (CO) problems, central to operations research and theoretical computer science, present significant computational challenges due to their NP-hard nature. While large language models (LLMs) have emerged as promising tools for CO, either by directly generating solutions or by synthesizing solver-specific code, existing approaches often neglect critical structural priors inherent to CO problems, leading to suboptimality and iterative inefficiency. Inspired by human experts' success in leveraging CO structures for algorithm design, we propose STRCMP, a novel structure-aware LLM-based algorithm discovery framework that systematically integrates structural priors to enhance solution quality and solving efficiency. Our framework combines a graph neural network (GNN) for extracting structural embeddings from CO instances with an LLM conditioned on these embeddings to identify high-performing algorithms in the form of solver-specific code. This composite architecture ensures syntactic correctness, preserves problem topology, and aligns with natural language objectives, while an evolutionary refinement process iteratively optimizes the generated algorithms. Extensive evaluations across Mixed Integer Linear Programming and Boolean Satisfiability problems, using nine benchmark datasets, demonstrate that the proposed STRCMP outperforms five strong neural and LLM-based methods by a large margin in terms of both solution optimality and computational efficiency. The code and learned model will be made publicly available upon acceptance of the paper.

CV · Oct 11, 2025
A Multi-Strategy Framework for Enhancing Shatian Pomelo Detection in Real-World Orchards

Pan Wang, Yihao Hu, Xiaodong Bai et al.

As a specialty agricultural product with a large market, Shatian pomelo requires automated detection to ensure accurate quantity estimation and to meet commercial demands for lean production. Existing research often relies on specialized networks tailored to specific theoretical or dataset scenarios, but the performance of these methods tends to degrade in real-world settings. By analyzing the factors behind this issue, this study identifies four key challenges that affect the accuracy of Shatian pomelo detection: imaging devices, lighting conditions, object scale variation, and occlusion. To mitigate these challenges, this paper proposes a multi-strategy framework. First, to handle the tone variation introduced by diverse imaging devices and complex orchard environments, we use a multi-scenario dataset, STP-AgriData, constructed by integrating real orchard images with internet-sourced data. Second, to simulate inconsistent illumination conditions, specific data augmentations, such as contrast and brightness adjustment, are applied to this dataset. Third, to address object scale variation and occlusion in fruit detection, the REAS-Det network is designed. For scale variation, RFAConv and C3RFEM modules are designed to expand and enhance the receptive fields. For occlusion, a multi-scale, multi-head feature selection structure (MultiSEAM) and soft-NMS are introduced to improve detection accuracy. In experiments, the network achieved a precision (P) of 87.6%, a recall (R) of 74.9%, a mAP@.50 of 82.8%, and a mAP@.50:.95 of 53.3%. The proposed network demonstrates superior performance compared to other state-of-the-art detection methods.
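Soft-NMS, mentioned in the abstract above, lowers the scores of overlapping boxes instead of discarding them outright, which helps when fruits occlude one another. The Python sketch below uses the Gaussian decay variant; the abstract does not say which variant or which parameters REAS-Det uses, so `sigma`, `score_thresh`, and the toy boxes are assumptions chosen for illustration.

```python
import math

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian soft-NMS: repeatedly keep the top-scoring box, then decay
    the scores of the remaining boxes by exp(-IoU^2 / sigma)."""
    remaining = list(zip(boxes, scores))
    kept = []
    while remaining:
        best = max(range(len(remaining)), key=lambda i: remaining[i][1])
        best_box, best_score = remaining.pop(best)
        kept.append((best_box, best_score))
        decayed = []
        for box, score in remaining:
            score *= math.exp(-(iou(best_box, box) ** 2) / sigma)
            if score >= score_thresh:  # drop boxes whose score has vanished
                decayed.append((box, score))
        remaining = decayed
    return kept

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
for box, score in soft_nms(boxes, scores):
    print(box, round(score, 3))
```

In this toy input the second box overlaps the first heavily, so its score is reduced but it survives; hard NMS with a typical IoU threshold of 0.5 would have removed it entirely.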

CV · Sep 24, 2025
SDE-DET: A Precision Network for Shatian Pomelo Detection in Complex Orchard Environments

Yihao Hu, Pan Wang, Xiaodong Bai et al.

Pomelo detection is an essential step for fruit localization, automated robotic harvesting, and maturity analysis. However, detecting Shatian pomelo in complex orchard environments poses significant challenges, including scale variation, occlusion by trunks and leaves, and small objects. To address these issues, this study constructs a custom dataset, STP-AgriData, and proposes the SDE-DET model for Shatian pomelo detection. SDE-DET first uses the Star Block to acquire high-dimensional information without increasing the computational overhead. Furthermore, the model adopts Deformable Attention in its backbone to enhance its ability to detect occluded pomelos. Finally, multiple Efficient Multi-Scale Attention mechanisms are integrated into the model to reduce the computational overhead and extract deep visual representations, thereby improving small object detection. In the experiments, SDE-DET is compared with the YOLO series and other mainstream detection models on Shatian pomelo detection. SDE-DET achieved scores of 0.883, 0.771, 0.838, 0.497, and 0.823 in Precision, Recall, mAP@0.5, mAP@0.5:0.95, and F1-score, respectively, which is state-of-the-art performance on the STP-AgriData dataset. The experiments indicate that SDE-DET provides a reliable method for Shatian pomelo detection, laying the foundation for further development of automatic harvesting robots.
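As a quick consistency check on the figures reported above, the F1-score can be recomputed from precision and recall using the standard harmonic-mean definition. The short Python sketch below does this; the helper name `f1_score` is introduced here for illustration only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported in the abstract.
print(round(f1_score(0.883, 0.771), 3))  # 0.823
```

The recomputed value matches the reported F1-score of 0.823.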