LGJul 17, 2023
FedCME: Client Matching and Classifier Exchanging to Handle Data Heterogeneity in Federated LearningJun Nie, Danyang Xiao, Lei Yang et al.
Data heterogeneity across clients is one of the key challenges in Federated Learning (FL), which may slow down the global model convergence and even weaken global model performance. Most existing approaches tackle the heterogeneity by constraining local model updates through reference to global information provided by the server. This can alleviate the performance degradation on the aggregated global model. Different from existing methods, we focus the information exchange between clients, which could also enhance the effectiveness of local training and lead to generate a high-performance global model. Concretely, we propose a novel FL framework named FedCME by client matching and classifier exchanging. In FedCME, clients with large differences in data distribution will be matched in pairs, and then the corresponding pair of clients will exchange their classifiers at the stage of local training in an intermediate moment. Since the local data determines the local model training direction, our method can correct update direction of classifiers and effectively alleviate local update divergence. Besides, we propose feature alignment to enhance the training of the feature extractor. Experimental results demonstrate that FedCME performs better than FedAvg, FedProx, MOON and FedRS on popular federated learning benchmarks including FMNIST and CIFAR10, in the case where data are heterogeneous.
CLJul 29, 2024
Cool-Fusion: Fuse Large Language Models without TrainingCong Liu, Xiaojun Quan, Yan Pan et al.
We focus on the problem of fusing two or more heterogeneous large language models (LLMs) to leverage their complementary strengths. One of the challenges of model fusion is high computational load, specifically in fine-tuning or aligning vocabularies. To address this, we propose Cool-Fusion, a simple yet effective approach that fuses the knowledge of source LLMs, which does not require training. Unlike ensemble methods, Cool-Fusion is applicable to any set of source LLMs that have different vocabularies. To overcome the vocabulary discrepancies among LLMs, we ensemble LLMs on text level, allowing them to rerank the generated texts by each other with different granularities. Extensive experiments have been conducted across a variety of benchmark datasets. On GSM8K, Cool-Fusion increases accuracy from three strong source LLMs by a significant margin of 17.4\%.
SEJun 11, 2024Code
VersiCode: Towards Version-controllable Code GenerationTongtong Wu, Weigang Wu, Xingyu Wang et al.
Large Language Models (LLMs) have made tremendous strides in code generation, but existing research fails to account for the dynamic nature of software development, marked by frequent library updates. This gap significantly limits LLMs' deployment in realistic settings. In this paper, we propose two novel tasks aimed at bridging this gap: version-specific code completion (VSCC) and version-aware code migration (VACM). In conjunction, we introduce VersiCode, a comprehensive Python dataset specifically designed to evaluate LLMs on these two tasks, together with a novel evaluation metric, Critical Diff Check (CDC@1), which assesses code generation against evolving API requirements. We conduct an extensive evaluation on VersiCode, which reveals that version-controllable code generation is indeed a significant challenge, even for GPT-4o and other strong frontier models. We believe the novel tasks, dataset, and metric open up a new, important research direction that will further enhance LLMs' real-world applicability. The code and resources can be found at https://github.com/wutong8023/VersiCode.
CLJun 8, 2025
Chain of Methodologies: Scaling Test Time Computation without TrainingCong Liu, Jie Wu, Weigang Wu et al.
Large Language Models (LLMs) often struggle with complex reasoning tasks due to insufficient in-depth insights in their training data, which are typically absent in publicly available documents. This paper introduces the Chain of Methodologies (CoM), an innovative and intuitive prompting framework that enhances structured thinking by integrating human methodological insights, enabling LLMs to tackle complex tasks with extended reasoning. CoM leverages the metacognitive abilities of advanced LLMs, activating systematic reasoning throught user-defined methodologies without explicit fine-tuning. Experiments show that CoM surpasses competitive baselines, demonstrating the potential of training-free prompting methods as robust solutions for complex reasoning tasks and bridging the gap toward human-level reasoning through human-like methodological insights.
LGApr 28, 2025
Convergence Analysis of Asynchronous Federated Learning with Gradient Compression for Non-Convex OptimizationDiying Yang, Yingwei Hou, Weigang Wu
In practical federated learning (FL), the large communication overhead between clients and the server is often a significant bottleneck. Gradient compression methods can effectively reduce this overhead, while error feedback (EF) restores model accuracy. Moreover, due to device heterogeneity, synchronous FL often suffers from stragglers and inefficiency-issues that asynchronous FL effectively alleviates. However, in asynchronous FL settings-which inherently face three major challenges: asynchronous delay, data heterogeneity, and flexible client participation-the complex interactions among these system/statistical constraints and compression/EF mechanisms remain poorly understood theoretically. In this paper, we fill this gap through a comprehensive convergence study that adequately decouples and unravels these complex interactions across various FL frameworks. We first consider a basic asynchronous FL framework AsynFL, and establish an improved convergence analysis that relies on fewer assumptions and yields a superior convergence rate than prior studies. We then extend our study to a compressed version, AsynFLC, and derive sufficient conditions for its convergence, indicating the nonlinear interaction between asynchronous delay and compression rate. Our analysis further demonstrates how asynchronous delay and data heterogeneity jointly exacerbate compression-induced errors, thereby hindering convergence. Furthermore, we study the convergence of AsynFLC-EF, the framework that further integrates EF. We prove that EF can effectively reduce the variance of gradient estimation under the aforementioned challenges, enabling AsynFLC-EF to match the convergence rate of AsynFL. We also show that the impact of asynchronous delay and flexible participation on EF is limited to slowing down the higher-order convergence term. Experimental results substantiate our analytical findings very well.
LGDec 31, 2020
PFL-MoE: Personalized Federated Learning Based on Mixture of ExpertsBinbin Guo, Yuan Mei, Danyang Xiao et al.
Federated learning (FL) is an emerging distributed machine learning paradigm that avoids data sharing among training nodes so as to protect data privacy. Under coordination of the FL server, each client conducts model training using its own computing resource and private data set. The global model can be created by aggregating the training results of clients. To cope with highly non-IID data distributions, personalized federated learning (PFL) has been proposed to improve overall performance by allowing each client to learn a personalized model. However, one major drawback of a personalized model is the loss of generalization. To achieve model personalization while maintaining generalization, in this paper, we propose a new approach, named PFL-MoE, which mixes outputs of the personalized model and global model via the MoE architecture. PFL-MoE is a generic approach and can be instantiated by integrating existing PFL algorithms. Particularly, we propose the PFL-MF algorithm which is an instance of PFL-MoE based on the freeze-base PFL algorithm. We further improve PFL-MF by enhancing the decision-making ability of MoE gating network and propose a variant algorithm PFL-MFE. We demonstrate the effectiveness of PFL-MoE by training the LeNet-5 and VGG-16 models on the Fashion-MNIST and CIFAR-10 datasets with non-IID partitions.
CROct 15, 2020
Garou: An Efficient and Secure Off-Blockchain Multi-Party Payment HubYongjie Ye, Weigang Wu
To mitigate the scalability problem of decentralized cryptocurrencies such as Bitcoin and Ethereum, the payment channel, which allows two parties to perform secure coin transfers without involving the blockchain, has been proposed. The payment channel increases the transaction throughput of two parties to a level that is only limited by their network bandwidth. Recent proposals focus on extending the two-party payment channel to the N-party payment hub. Unfortunately, none of them can achieve efficiency, flexibility in the absence of a trusted third-party. In this paper, we propose Garou, a secure N-party payment hub that allows multiple parties to perform secure off-chain coin transfers. Except in the case of disputes, participants within the payment hub can make concurrent and direct coin transfers with each other without the involvement of the blockchain or any third-party intermediaries. This allows Garou to achieve both high-performance and flexibility. Garou also guarantees that an honest party always maintains its balance security against strong adversarial capabilities. To demonstrate the feasibility of the Garou protocol, we develop a proof of concept prototype for the Ethereum network. Our evaluation results show that the maximum transaction throughput of Garou is 20 times higher than that of state-of-art payment hubs.
CRNov 29, 2019
Boros: Secure Cross-Channel Transfers via Channel HubYongJie Ye, Jingjing Zhang, Weigang Wu et al.
The payment channel, which allows two parties to perform micropayments without involving the blockchain, has become a promising proposal to improve the scalability of decentralized ledgers such as Bitcoin and Ethereum. Payment channels have been extended to the payment network, through which users can utilize existing channels as intermediary links to route coins to others. However, routing payments through multiple channels bears nontrivial overheads. It requires every intermediary channel to lock a portion of its available capacity until the payment is settled. This may lead to deadlock in a concurrent situation. The intermediary nodes in a payment path may also charge fees for routing a payment. The longer the routing path, the more serious the above problems. In this paper, we design and develop a novel off-chain system to shorten the routing path for the payment network. In particular, we propose the channel hub, which is an extension of the payment hub, to allows transferring coins directly from one payment channel to another within the same hub. That is, the channel hub can be viewed as a shortcut device for the underlying payment network. We design a new protocol named Boros to perform secure off-chain cross-channel transfers through the channel hub. We not only present the security definition of the Boros protocol formally but also prove its security using the UC-framework. To demonstrate the feasibility of the Boros protocol, we develop a proof-of-concept prototype running on the Ethereum. Our evaluation shows that our system can effectively shorten the off-chain routing path.