Kuangpu Guo

LG
h-index13
7papers
22citations
Novelty60%
AI Score61

7 Papers

96.9LGApr 23Code
Understanding and Mitigating Spurious Signal Amplification in Test-Time Reinforcement Learning for Math Reasoning

Yongcan Yu, Lingxiao He, Jian Liang et al.

Test-time reinforcement learning (TTRL) always adapts models at inference time via pseudo-labeling, leaving it vulnerable to spurious optimization signals from label noise. Through an empirical study, we observe that responses with medium consistency form an ambiguity region and constitute the primary source of reward noise. Crucially, we find that such spurious signals can be even amplified through group-relative advantage estimation. Motivated by these findings, we propose a unified framework, Debiased and Denoised test-time Reinforcement Learning (DDRL), to mitigate spurious signals. Concretely, DDRL first applies a frequency-based sampling strategy to exclude ambiguous samples while maintaining a balanced set of positive and negative examples. It then adopts a debiased advantage estimation with fixed advantages, removing the bias introduced by group-relative policy optimization. Finally, DDRL incorporates a consensus-based off-policy refinement stage, which leverages the rejection-sampled dataset to enable efficient and stable model updates. Experiments on three large language models across multiple mathematical reasoning benchmarks demonstrate that DDRL consistently outperforms existing TTRL baselines. The code will soon be released at https://github.com/yuyongcan/DDRL.

LGDec 1, 2025Code
Stay Unique, Stay Efficient: Preserving Model Personality in Multi-Task Merging

Kuangpu Guo, Yuhe Ding, Jian Liang et al.

Model merging has emerged as a promising paradigm for enabling multi-task capabilities without additional training. However, existing methods often experience substantial performance degradation compared with individually fine-tuned models, even on similar tasks, underscoring the need to preserve task-specific information. This paper proposes Decomposition, Thresholding, and Scaling (DTS), an approximation-based personalized merging framework that preserves task-specific information with minimal storage overhead. DTS first applies singular value decomposition to the task-specific information and retains only a small subset of singular values and vectors. It then introduces a novel thresholding strategy that partitions singular vector elements into groups and assigns a scaling factor to each group. To enable generalization to unseen tasks, we further extend DTS with a variant that fuses task-specific information in a data-free manner based on the semantic similarity of task characteristics. Extensive experiments demonstrate that DTS consistently outperforms state-of-the-art baselines while requiring only 1\% additional storage per task. Furthermore, experiments on unseen tasks show that the DTS variant achieves significantly better generalization performance. Our code is available at https://github.com/krumpguo/DTS.

LGJan 4, 2024Code
Exploring Vacant Classes in Label-Skewed Federated Learning

Kuangpu Guo, Yuhe Ding, Jian Liang et al.

Label skews, characterized by disparities in local label distribution across clients, pose a significant challenge in federated learning. As minority classes suffer from worse accuracy due to overfitting on local imbalanced data, prior methods often incorporate class-balanced learning techniques during local training. Although these methods improve the mean accuracy across all classes, we observe that vacant classes-referring to categories absent from a client's data distribution-remain poorly recognized. Besides, there is still a gap in the accuracy of local models on minority classes compared to the global model. This paper introduces FedVLS, a novel approach to label-skewed federated learning that integrates both vacant-class distillation and logit suppression simultaneously. Specifically, vacant-class distillation leverages knowledge distillation during local training on each client to retain essential information related to vacant classes from the global model. Moreover, logit suppression directly penalizes network logits for non-label classes, effectively addressing misclassifications in minority classes that may be biased toward majority classes. Extensive experiments validate the efficacy of FedVLS, demonstrating superior performance compared to previous state-of-the-art (SOTA) methods across diverse datasets with varying degrees of label skews. Our code is available at https://github.com/krumpguo/FedVLS.

CVOct 11, 2025Code
Cooperative Pseudo Labeling for Unsupervised Federated Classification

Kuangpu Guo, Lijun Sheng, Yongcan Yu et al.

Unsupervised Federated Learning (UFL) aims to collaboratively train a global model across distributed clients without sharing data or accessing label information. Previous UFL works have predominantly focused on representation learning and clustering tasks. Recently, vision language models (e.g., CLIP) have gained significant attention for their powerful zero-shot prediction capabilities. Leveraging this advancement, classification problems that were previously infeasible under the UFL paradigm now present promising new opportunities, yet remain largely unexplored. In this paper, we extend UFL to the classification problem with CLIP for the first time and propose a novel method, \underline{\textbf{Fed}}erated \underline{\textbf{Co}}operative \underline{\textbf{P}}seudo \underline{\textbf{L}}abeling (\textbf{FedCoPL}). Specifically, clients estimate and upload their pseudo label distribution, and the server adjusts and redistributes them to avoid global imbalance among classes. Moreover, we introduce a partial prompt aggregation protocol for effective collaboration and personalization. In particular, visual prompts containing general image features are aggregated at the server, while text prompts encoding personalized knowledge are retained locally. Extensive experiments demonstrate the superior performance of our FedCoPL compared to baseline methods. Our code is available at \href{https://github.com/krumpguo/FedCoPL}{https://github.com/krumpguo/FedCoPL}.

LGSep 27, 2024
An Enhanced Federated Prototype Learning Method under Domain Shift

Liang Kuang, Kuangpu Guo, Jian Liang et al.

Federated Learning (FL) allows collaborative machine learning training without sharing private data. Numerous studies have shown that one significant factor affecting the performance of federated learning models is the heterogeneity of data across different clients, especially when the data is sampled from various domains. A recent paper introduces variance-aware dual-level prototype clustering and uses a novel $α$-sparsity prototype loss, which increases intra-class similarity and reduces inter-class similarity. To ensure that the features converge within specific clusters, we introduce an improved algorithm, Federated Prototype Learning with Convergent Clusters, abbreviated as FedPLCC. To increase inter-class distances, we weight each prototype with the size of the cluster it represents. To reduce intra-class distances, considering that prototypes with larger distances might come from different domains, we select only a certain proportion of prototypes for the loss function calculation. Evaluations on the Digit-5, Office-10, and DomainNet datasets show that our method performs better than existing approaches.

LGJun 26, 2025
Personalized Federated Learning via Dual-Prompt Optimization and Cross Fusion

Yuguang Zhang, Kuangpu Guo, Zhihe Lu et al.

Federated learning (FL) enables collaborative model training across decentralized clients without sharing local data, but is challenged by heterogeneity in data, computation, and communication. Pretrained vision-language models (VLMs), with their strong generalization and lightweight tuning via prompts, offer a promising solution. However, existing federated prompt-learning methods rely only on text prompts and overlook joint label-domain distribution shifts. In this paper, we propose a personalized FL framework based on dual-prompt learning and cross fusion, termed pFedDC. Specifically, each client maintains both global and local prompts across vision and language modalities: global prompts capture common knowledge shared across the federation, while local prompts encode client-specific semantics and domain characteristics. Meanwhile, a cross-fusion module is designed to adaptively integrate prompts from different levels, enabling the model to generate personalized representations aligned with each client's unique data distribution. Extensive experiments across nine datasets with various types of heterogeneity show that pFedDC consistently outperforms state-of-the-art methods.

LGDec 14, 2025
Reassessing the Role of Supervised Fine-Tuning: An Empirical Study in VLM Reasoning

Yongcan Yu, Lingxiao He, Shuo Lu et al.

Recent advances in vision-language models (VLMs) reasoning have been largely attributed to the rise of reinforcement Learning (RL), which has shifted the community's focus away from the supervised fine-tuning (SFT) paradigm. Many studies suggest that introducing the SFT stage not only fails to improve reasoning ability but may also negatively impact model training. In this study, we revisit this RL-centric belief through a systematic and controlled comparison of SFT and RL on VLM Reasoning. Using identical data sources, we find that the relative effectiveness of SFT and RL is conditional and strongly influenced by model capacity, data scale, and data distribution. Contrary to common assumptions, our findings show that SFT plays a crucial role across several scenarios: (1) Effectiveness for weaker models. SFT more reliably elicits reasoning capabilities in smaller or weaker VLMs. (2) Data efficiency. SFT with only 2K achieves comparable or better reasoning performance to RL with 20K. (3) Cross-modal transferability. SFT demonstrates stronger generalization across modalities. Moreover, we identify a pervasive issue of deceptive rewards, where higher rewards fail to correlate with better reasoning accuracy in RL. These results challenge the prevailing "RL over SFT" narrative. They highlight that the role of SFT may have been underestimated and support a more balanced post-training pipeline in which SFT and RL function as complementary components.