Yuhan Gao

CV
h-index5
7papers
64citations
Novelty42%
AI Score54

7 Papers

SEMar 17Code
InCoder-32B: Code Foundation Model for Industrial Scenarios

Jian Yang, Wei Zhang, Jiajun Wu et al.

Recent code large language models have achieved remarkable progress on general programming tasks. Nevertheless, their performance degrades significantly in industrial scenarios that require reasoning about hardware semantics, specialized language constructs, and strict resource constraints. To address these challenges, we introduce InCoder-32B (Industrial-Coder-32B), the first 32B-parameter code foundation model unifying code intelligence across chip design, GPU kernel optimization, embedded systems, compiler optimization, and 3D modeling. By adopting an efficient architecture, we train InCoder-32B from scratch with general code pre-training, curated industrial code annealing, mid-training that progressively extends context from 8K to 128K tokens with synthetic industrial reasoning data, and post-training with execution-grounded verification. We conduct extensive evaluation on 14 mainstream general code benchmarks and 9 industrial benchmarks spanning 4 specialized domains. Results show InCoder-32B achieves highly competitive performance on general tasks while establishing strong open-source baselines across industrial domains.

CVNov 10, 2022
Learning Cross-view Geo-localization Embeddings via Dynamic Weighted Decorrelation Regularization

Tingyu Wang, Zhedong Zheng, Zunjie Zhu et al.

Cross-view geo-localization aims to spot images of the same location shot from two platforms, e.g., the drone platform and the satellite platform. Existing methods usually focus on optimizing the distance between one embedding with others in the feature space, while neglecting the redundancy of the embedding itself. In this paper, we argue that the low redundancy is also of importance, which motivates the model to mine more diverse patterns. To verify this point, we introduce a simple yet effective regularization, i.e., Dynamic Weighted Decorrelation Regularization (DWDR), to explicitly encourage networks to learn independent embedding channels. As the name implies, DWDR regresses the embedding correlation coefficient matrix to a sparse matrix, i.e., the identity matrix, with dynamic weights. The dynamic weights are applied to focus on still correlated channels during training. Besides, we propose a cross-view symmetric sampling strategy, which keeps the example balance between different platforms. Albeit simple, the proposed method has achieved competitive results on three large-scale benchmarks, i.e., University-1652, CVUSA and CVACT. Moreover, under the harsh circumstance, e.g., the extremely short feature of 64 dimensions, the proposed method surpasses the baseline model by a clear margin.

CVApr 18
CATP: Confidence-Aware Token Pruning for Camouflaged Object Detection

Yuhan Gao, Shuhao Kang, Xin He et al.

Camouflaged Object Detection (COD) aims to segment targets that share extreme textural and structural similarities with their complex environments. Leveraging their capacity for long-range dependency modeling, Transformer-based detectors have become the mainstream approach and achieve state-of-the-art (SoTA) accuracy, yet their substantial computational overhead severely limits practical deployment. To address this, we propose a hierarchical Confidence-Aware Token Pruning framework (CATP) tailored for COD. Our approach hierarchically identifies and discards easily distinguishable tokens from both background and object interiors, focusing computations on critical boundary tokens. To compensate for information loss from pruning, we introduce a dual-path feature compensation mechanism that aggregates contextual knowledge from pruned tokens into enriched features. Extensive experiments on multiple COD benchmarks demonstrate that our method significantly reduces computational complexity while maintaining high accuracy, offering a promising research direction for the efficient deployment of COD models in real-world scenarios. The code will be released.

CVMay 12
Cluster-Aware Neural Collapse Prompt Tuning for Long-Tailed Generalization of Vision-Language Models

Boyang Guo, Liang Li, Lin Peng et al.

Prompt learning has emerged as an efficient alternative to fine-tuning pre-trained vision-language models (VLMs). Despite its promise, current methods still struggle to maintain tail-class discriminability when adapting to class-imbalanced datasets. In this work, we propose cluster-aware neural collapse prompt tuning (CPT), which enhances the discriminability of tail classes in prompt-tuned VLMs without sacrificing their overall generalization. First, we design a cluster-invariant space by mining semantic assignments from the pre-trained VLM and mapping them to prompt-tuned features. This computes cluster-level boundaries and restricts the constraints to local neighborhoods, which reduces interference with the global semantic structure of the pre-trained VLM. Second, we introduce neural-collapse-driven discriminability optimization with three losses: textual Equiangular Tight Frame (ETF) separation loss, class-wise convergence loss, and rotation stabilization loss. These losses work together to shape intra-cluster geometry for better inter-class separation and intra-class alignment. Extensive experiments on 11 diverse datasets demonstrate that CPT outperforms SOTA methods, with stronger performance on long-tail classes and good generalization to unseen classes.

AIJul 29, 2025Code
Progressive Homeostatic and Plastic Prompt Tuning for Audio-Visual Multi-Task Incremental Learning

Jiong Yin, Liang Li, Jiehua Zhang et al.

Audio-visual multi-task incremental learning aims to continuously learn from multiple audio-visual tasks without the need for joint training on all tasks. The challenge of the problem is how to preserve the old task knowledge while facilitating the learning of new task with previous experiences. To address these challenges, we introduce a three-stage Progressive Homeostatic and Plastic audio-visual prompt (PHP) method. In the shallow phase, we design the task-shared modality aggregating adapter to foster cross-task and cross-modal audio-visual representation learning to enhance shared understanding between tasks. In the middle phase, we propose the task-specific modality-shared dynamic generating adapter, which constructs prompts that are tailored to individual tasks while remaining general across modalities, which balances the models ability to retain knowledge against forgetting with its potential for versatile multi-task transferability. In the deep phase, we introduce the task-specific modality-independent prompts to further refine the understand ability by targeting individual information for each task and modality. By incorporating these three phases, PHP retains task-specific prompts while adapting shared parameters for new tasks to effectively balance knowledge sharing and specificity. Our method achieves SOTA performance in different orders of four tasks (AVE, AVVP, AVS and AVQA). Our code can be available at https://github.com/ENJOY-Yin-jiong/PHP.

CVMar 29
Amped: Adaptive Multi-stage Non-edge Pruning for Edge Detection

Yuhan Gao, Xinqing Li, Xin He et al.

Edge detection is a fundamental image analysis task that underpins numerous high-level vision applications. Recent advances in Transformer architectures have significantly improved edge quality by capturing long-range dependencies, but this often comes with computational overhead. Achieving higher pixel-level accuracy requires increased input resolution, further escalating computational cost and limiting practical deployment. Building on the strong representational capacity of recent Transformer-based edge detectors, we propose an Adaptive Multi-stage non-edge Pruning framework for Edge Detection(Amped). Amped identifies high-confidence non-edge tokens and removes them as early as possible to substantially reduce computation, thus retaining high accuracy while cutting GFLOPs and accelerating inference with minimal performance loss. Moreover, to mitigate the structural complexity of existing edge detection networks and facilitate their integration into real-world systems, we introduce a simple yet high-performance Transformer-based model, termed Streamline Edge Detector(SED). Applied to both existing detectors and our SED, the proposed pruning strategy provides a favorable balance between accuracy and efficiency-reducing GFLOPs by up to 40% with only a 0.4% drop in ODS F-measure. In addition, despite its simplicity, SED achieves a state-of-the-art ODS F-measure of 86.5%. The code will be released.

SIMay 3, 2021
Textual Analysis of Communications in COVID-19 Infected Community on Social Media

Yuhan Liu, Yuhan Gao, Zhifan Nan et al.

During the COVID-19 pandemic, people started to discuss about pandemic-related topics on social media. On subreddit \textit{r/COVID19positive}, a number of topics are discussed or being shared, including experience of those who got a positive test result, stories of those who presumably got infected, and questions asked regarding the pandemic and the disease. In this study, we try to understand, from a linguistic perspective, the nature of discussions on the subreddit. We found differences in linguistic characteristics (e.g. psychological, emotional and reasoning) across three different categories of topics. We also classified posts into the different categories using SOTA pre-trained language models. Such classification model can be used for pandemic-related research on social media.