DC · Jun 18, 2022 · Code
Pisces: Efficient Federated Learning via Guided Asynchronous Training
Zhifeng Jiang, Wei Wang, Baochun Li et al.
Federated learning (FL) is typically performed in a synchronous parallel manner, where the involvement of a slow client delays a training iteration. Current FL systems employ a participant selection strategy to select fast clients with quality data in each iteration. However, this is not always possible in practice, and the selection strategy often has to navigate an unpleasant trade-off between the speed and the data quality of clients. In this paper, we present Pisces, an asynchronous FL system with intelligent participant selection and model aggregation for accelerated training. To avoid incurring excessive resource cost and stale training computation, Pisces uses a novel scoring mechanism to identify suitable clients to participate in a training iteration. It also adapts the pace of model aggregation to dynamically bound the progress gap between the selected clients and the server, with a provable convergence guarantee in a smooth non-convex setting. We have implemented Pisces in an open-source FL platform called Plato, and evaluated its performance in large-scale experiments with popular vision and language models. Pisces outperforms the state-of-the-art synchronous and asynchronous schemes, accelerating the time-to-accuracy by up to 2.0x and 1.9x, respectively.
LG · Oct 13, 2022
Feature Reconstruction Attacks and Countermeasures of DNN training in Vertical Federated Learning
Peng Ye, Zhifeng Jiang, Wei Wang et al.
Federated learning (FL) has increasingly been deployed, in its vertical form, among organizations to facilitate secure collaborative training over siloed data. In vertical FL (VFL), participants hold disjoint features of the same set of sample instances. Among them, only one has labels. This participant, known as the active party, initiates the training and interacts with the other participants, known as the passive parties. Despite the increasing adoption of VFL, it remains largely unknown if and how the active party can extract feature data from the passive party, especially when training deep neural network (DNN) models. This paper makes the first attempt to study the feature security problem of DNN training in VFL. We consider a DNN model partitioned between active and passive parties, where the passive party holds only a subset of the input layer and has some categorical features with binary values. Using a reduction from the Exact Cover problem, we prove that reconstructing those binary features is NP-hard. Through analysis, we demonstrate that, unless the feature dimension is exceedingly large, it remains feasible, both theoretically and practically, to launch a reconstruction attack with an efficient search-based algorithm that prevails over current feature protection techniques. To address this problem, we develop a novel feature protection scheme against the reconstruction attack that effectively misleads the search to some pre-specified random values. With an extensive set of experiments, we show that our protection scheme withstands the feature reconstruction attack in various VFL applications without any loss of accuracy.
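A minimal sketch of why a search over binary features can succeed, under illustrative assumptions that are ours and not the paper's: the passive party's bottom model is a single linear layer whose weights `W` the attacker knows, every passive feature is binary, and the attacker observes the exact output `z = W @ x`. Exhaustive search over all 2^d candidates then recovers `x`; that 2^d cost is why feasibility hinges on the feature dimension. The helper `reconstruct_binary_features` is hypothetical and is not the paper's search algorithm.

```python
import itertools

import numpy as np


def reconstruct_binary_features(W, z, tol=1e-6):
    """Return every binary vector x with W @ x matching z within tol.

    Hypothetical illustration: brute-force search over all 2**d binary
    candidates, assuming the attacker knows the linear weights W.
    """
    d = W.shape[1]
    matches = []
    for bits in itertools.product((0, 1), repeat=d):
        x = np.array(bits, dtype=float)
        if np.allclose(W @ x, z, atol=tol):
            matches.append(x.astype(int))
    return matches


rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))          # 4 hidden units, 8 binary passive features
x_true = rng.integers(0, 2, size=8)  # the passive party's secret features
z = W @ x_true                       # output observed by the active party
candidates = reconstruct_binary_features(W, z)
print(len(candidates), candidates)
```

With 8 features the search visits only 256 candidates; each extra feature doubles the work.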
CV · Aug 18, 2023 · Code
Language-Guided Diffusion Model for Visual Grounding
Sijia Chen, Baochun Li
Visual grounding (VG) tasks involve explicit cross-modal alignment, since the image regions that semantically correspond to the provided language phrases must be located. Existing approaches complete such visual-text reasoning in a single step. Their performance depends heavily on large-scale anchors and over-designed multi-modal fusion modules based on human priors, leading to complicated frameworks that may be difficult to train and may overfit to specific scenarios. Even worse, such once-for-all reasoning mechanisms cannot refine boxes continuously to enhance query-region matching. In contrast, in this paper, we formulate an iterative reasoning process by denoising diffusion modeling. Specifically, we propose a language-guided diffusion framework for visual grounding, LG-DVG, which trains the model to progressively reason about queried object boxes by denoising a set of noisy boxes with language guidance. To achieve this, LG-DVG gradually perturbs query-aligned ground truth boxes into noisy ones and reverses this process step by step, conditioned on query semantics. Extensive experiments on five widely used datasets validate the superior performance of our framework in solving visual grounding, a cross-modal alignment task, in a generative way. The source code is available at https://github.com/iQua/vgbase/tree/main/examples/DiffusionVG.
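The "gradually perturbs ground truth boxes to noisy ones" step is, in standard denoising diffusion, the closed-form forward process q(b_t | b_0) = N(sqrt(ᾱ_t) b_0, (1 - ᾱ_t) I). The sketch below applies that textbook process to boxes; the linear beta schedule, the (cx, cy, w, h) box format, and the rescaling to [-1, 1] are our assumptions, and the language-conditioned reverse process of LG-DVG is not reproduced.

```python
import numpy as np


def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def perturb_boxes(boxes, t, alpha_bar, rng):
    """Sample noisy boxes b_t ~ q(b_t | b_0) from ground-truth boxes b_0.

    boxes: (N, 4) array of normalized (cx, cy, w, h) in [0, 1]; they are
    rescaled to [-1, 1] before noise is added (an assumption).
    """
    b0 = boxes * 2.0 - 1.0
    noise = rng.standard_normal(b0.shape)
    bt = np.sqrt(alpha_bar[t]) * b0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return bt, noise


rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
gt = np.array([[0.5, 0.5, 0.2, 0.3]])  # one query-aligned ground-truth box
noisy_early, _ = perturb_boxes(gt, 0, alpha_bar, rng)   # nearly the original box
noisy_late, _ = perturb_boxes(gt, 999, alpha_bar, rng)  # nearly pure noise
print(noisy_early, noisy_late)
```

A denoiser is then trained to reverse these steps; in LG-DVG that reverse process is conditioned on the query semantics.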
LG · Mar 29, 2022
OrphicX: A Causality-Inspired Latent Variable Model for Interpreting Graph Neural Networks
Wanyu Lin, Hao Lan, Hao Wang et al.
This paper proposes a new eXplanation framework, called OrphicX, for generating causal explanations for any graph neural network (GNN) based on learned latent causal factors. Specifically, we construct a distinct generative model and design an objective function that encourages the generative model to produce causal, compact, and faithful explanations. This is achieved by isolating the causal factors in the latent space of graphs by maximizing the information flow measurements. We theoretically analyze the cause-effect relationships in the proposed causal graph, identify node attributes as confounders between graphs and GNN predictions, and circumvent such confounder effects by leveraging the backdoor adjustment formula. Our framework is compatible with any GNN, and it does not require access to the process by which the target GNN produces its predictions. In addition, it neither relies on the linear-independence assumption of the explained features nor requires prior knowledge of the graph learning tasks. We show a proof-of-concept of OrphicX on canonical classification problems on graph data. In particular, we analyze the explanatory subgraphs obtained from explanations for molecular graphs (i.e., Mutag) and quantitatively evaluate the explanation performance with frequently occurring subgraph patterns. Empirically, we show that OrphicX can effectively identify the causal semantics for generating causal explanations, significantly outperforming its alternatives.
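For reference, the textbook backdoor adjustment formula the abstract invokes, written with our own symbols: G_s is the explanatory subgraph, Z the node attributes acting as the confounder, and Y the GNN prediction. Conditioning on Z and averaging over its marginal blocks the backdoor path through the confounder. OrphicX's exact estimator in the latent space may differ.

```latex
P\big(Y \mid \mathrm{do}(G_s)\big) = \sum_{z} P\big(Y \mid G_s, Z = z\big)\, P(Z = z)
```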
SE · May 7
TSGuard: Automated User-Centric Incident Diagnosis for AI Workloads in the Cloud
Yitao Yang, Yangtao Deng, Yifan Xiong et al.
AI workloads incur frequent failures and incidents from the underlying infrastructure. The current incident management workflow follows a provider-centric paradigm, where users report incidents to the infrastructure provider who then conducts troubleshooting. Due to the large number of incidents and the manual nature of the troubleshooting process, the provider often takes several days to resolve an incident, resulting in operational delays and productivity loss. To address these challenges, we present TSGuard, a user-centric multi-agent system that delivers immediate incident diagnosis to users who deploy the workloads. The core innovation of TSGuard is twofold: (1) constructing domain-specific knowledge bases by mining historical on-call experiences in the offline phase, and (2) mimicking human expert diagnosis via structured reasoning and iterative trial-and-error in the online phase. Evaluation using production incident records from Microsoft Azure demonstrates that TSGuard significantly outperforms state-of-the-art baselines, improving diagnostic accuracy by 19.8%. Furthermore, TSGuard reduces the average verification time by 63.4% compared to the sequential execution baseline.
LG · Feb 2
Dissecting Outlier Dynamics in LLM NVFP4 Pretraining
Peijie Dong, Ruibo Fan, Yuechen Tao et al.
Training large language models using 4-bit arithmetic enhances throughput and memory efficiency. Yet, the limited dynamic range of FP4 increases sensitivity to outliers. While NVFP4 mitigates quantization error via hierarchical microscaling, a persistent loss gap remains compared to BF16. This study conducts a longitudinal analysis of outlier dynamics across architectures during NVFP4 pretraining, focusing on where they localize, why they occur, and how they evolve temporally. We find that, compared with Softmax Attention (SA), Linear Attention (LA) reduces per-tensor heavy tails but still exhibits persistent block-level spikes under block quantization. Our analysis attributes outliers to specific architectural components: Softmax in SA, gating in LA, and SwiGLU in FFN, with "post-QK" operations exhibiting higher sensitivity to quantization. Notably, outliers evolve from transient spikes early in training to a small set of persistent hot channels (i.e., channels with persistently large magnitudes) in later stages. Based on these findings, we introduce Hot-Channel Patch (HCP), an online compensation mechanism that identifies hot channels and reinjects residuals using hardware-efficient kernels. We then develop CHON, an NVFP4 training recipe integrating HCP with post-QK operation protection. On a GLA-1.3B model trained for 60B tokens, CHON reduces the loss gap to BF16 from 0.94% to 0.58% while maintaining downstream accuracy.
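To see why a single outlier hurts block quantization, the sketch below fake-quantizes values in 16-element blocks onto the FP4 E2M1 magnitude grid {0, 0.5, 1, 1.5, 2, 3, 4, 6}, with a shared scale per block. It is a simplification of NVFP4, assumed for illustration: the block scale stays in full precision instead of being rounded to FP8 E4M3, and the per-tensor scale is omitted. HCP and CHON are not reproduced; `fake_quantize_nvfp4` is a hypothetical name.

```python
import numpy as np

# Magnitudes representable by FP4 E2M1 (the sign is stored separately).
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
BLOCK = 16  # micro-block size sharing one scale


def fake_quantize_nvfp4(x):
    """Quantize then dequantize a 1-D array block by block.

    Simplification: each block scale (amax / 6) is kept in full precision
    rather than rounded to FP8 E4M3, and no per-tensor scale is applied.
    """
    x = np.asarray(x, dtype=np.float64)
    assert x.size % BLOCK == 0, "pad the input to a multiple of 16"
    out = np.empty_like(x)
    for start in range(0, x.size, BLOCK):
        blk = x[start:start + BLOCK]
        amax = np.abs(blk).max()
        scale = amax / E2M1_GRID[-1] if amax > 0 else 1.0
        mags = np.abs(blk) / scale
        # Round each scaled magnitude to the nearest E2M1 grid point.
        idx = np.abs(mags[:, None] - E2M1_GRID[None, :]).argmin(axis=1)
        out[start:start + BLOCK] = np.sign(blk) * E2M1_GRID[idx] * scale
    return out


rng = np.random.default_rng(0)
block = rng.normal(scale=0.1, size=BLOCK)
spiky = block.copy()
spiky[3] = 20.0  # one outlier inflates the whole block's scale

q_clean = fake_quantize_nvfp4(block)
q_spiky = fake_quantize_nvfp4(spiky)
others = np.arange(BLOCK) != 3
# Mean error on the 15 ordinary values, without and with the outlier present.
print(np.abs(q_clean - block)[others].mean(), np.abs(q_spiky - spiky)[others].mean())
```

With the outlier present, the block scale grows so large that the other 15 values all round to zero, which is the kind of block-level spike damage the abstract describes.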
LG · Dec 28, 2024 · Code
Calibre: Towards Fair and Accurate Personalized Federated Learning with Self-Supervised Learning
Sijia Chen, Ningxin Su, Baochun Li
In the context of personalized federated learning, existing approaches train a global model to extract transferable representations, based on which any client could train personalized models with a limited number of data samples. Self-supervised learning (SSL) is considered a promising direction, as the global model it produces is generic and facilitates personalization for all clients fairly. However, when data is heterogeneous across clients, the global model trained using SSL cannot yield high-quality personalized models. In this paper, we show that when the global model is trained with SSL without modifications, its produced representations have fuzzy class boundaries. As a result, personalized learning within each client produces models with low accuracy. In order to improve SSL towards better accuracy without sacrificing its advantage in fairness, we propose Calibre, a new personalized federated learning framework designed to calibrate SSL representations by maintaining a suitable balance between more generic and more client-specific representations. Calibre is designed based on theoretically sound properties, and introduces (1) a client-specific prototype loss as an auxiliary training objective; and (2) an aggregation algorithm guided by such prototypes across clients. Our experimental results in an extensive array of non-i.i.d. settings show that Calibre achieves state-of-the-art performance in terms of both mean accuracy and fairness across clients. Code repo: https://github.com/TL-System/plato/tree/main/examples/ssl/calibre.
DC · Dec 21, 2025
Remoe: Towards Efficient and Low-Cost MoE Inference in Serverless Computing
Wentao Liu, Yuhao Hu, Ruiting Zhou et al.
Mixture-of-Experts (MoE) has become a dominant architecture in large language models (LLMs) due to its ability to scale model capacity via sparse expert activation. Meanwhile, serverless computing, with its elasticity and pay-per-use billing, is well-suited for deploying MoEs with bursty workloads. However, the large number of experts in MoE models incurs high inference costs due to memory-intensive parameter caching. These costs are difficult to mitigate via simple model partitioning due to input-dependent expert activation. To address these issues, we propose Remoe, a heterogeneous MoE inference system tailored for serverless computing. Remoe assigns non-expert modules to GPUs and expert modules to CPUs, and further offloads infrequently activated experts to separate serverless functions to reduce memory overhead and enable parallel execution. We incorporate three key techniques: (1) a Similar Prompts Searching (SPS) algorithm to predict expert activation patterns based on semantic similarity of inputs; (2) a Main Model Pre-allocation (MMP) algorithm to ensure service-level objectives (SLOs) via worst-case memory estimation; and (3) a joint memory and replica optimization framework leveraging Lagrangian duality and the Longest Processing Time (LPT) algorithm. We implement Remoe on Kubernetes and evaluate it across multiple LLM benchmarks. Experimental results show that Remoe reduces inference cost by up to 57% and cold start latency by 47% compared to state-of-the-art baselines.
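The Longest Processing Time (LPT) algorithm named above is a classic greedy makespan heuristic: sort jobs by decreasing duration and hand each one to the currently least-loaded worker. The sketch shows LPT alone. Treating experts as jobs and serverless functions as workers, and the durations themselves, are hypothetical; Remoe's Lagrangian-duality step and memory model are not reproduced.

```python
import heapq


def lpt_schedule(durations, num_workers):
    """Longest Processing Time first, a greedy makespan heuristic.

    Returns (assignment, makespan), where assignment[w] lists the job
    indices placed on worker w.
    """
    # Min-heap of (current load, worker id): the top is the least-loaded worker.
    heap = [(0.0, w) for w in range(num_workers)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_workers)]
    # Visit jobs from longest to shortest.
    for job in sorted(range(len(durations)), key=lambda j: -durations[j]):
        load, w = heapq.heappop(heap)
        assignment[w].append(job)
        heapq.heappush(heap, (load + durations[job], w))
    makespan = max(load for load, _ in heap)
    return assignment, makespan


# Hypothetical per-expert compute times (ms) spread over 3 serverless functions.
times = [7, 5, 4, 4, 3, 3, 2]
assignment, makespan = lpt_schedule(times, 3)
print(assignment, makespan)  # [[0, 5], [1, 4, 6], [2, 3]] 10.0
```

Placing long jobs first leaves the short ones to even out the loads at the end, which is why LPT stays within a factor of 4/3 of the optimal makespan.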
AI · Dec 9, 2025
rSIM: Incentivizing Reasoning Capabilities of LLMs via Reinforced Strategy Injection
Sijia Chen, Baochun Li, Di Niu
Large language models (LLMs) are post-trained through reinforcement learning (RL) to evolve into Reasoning Language Models (RLMs), where the hallmark of this advanced reasoning is "aha" moments when they start to perform strategies, such as self-reflection and deep thinking, within chains of thought (CoTs). Motivated by this, this paper proposes a novel reinforced strategy injection mechanism (rSIM) that enables any LLM to become an RLM by employing a small planner to guide the LLM's CoT through the adaptive injection of reasoning strategies. To achieve this, the planner (leader agent) is jointly trained with an LLM (follower agent) using multi-agent RL (MARL), based on a leader-follower framework and straightforward rule-based rewards. Experimental results show that rSIM enables Qwen2.5-0.5B to become an RLM and significantly outperform Qwen2.5-14B. Moreover, the planner is generalizable: it only needs to be trained once and can be applied as a plug-in to substantially improve the reasoning capabilities of existing LLMs. In addition, the planner supports continual learning across various tasks, allowing its planning abilities to gradually improve and generalize to a wider range of problems.
CL · Feb 17, 2024
Boosting of Thoughts: Trial-and-Error Problem Solving with Large Language Models
Sijia Chen, Baochun Li, Di Niu
The reasoning performance of Large Language Models (LLMs) on a wide range of problems critically relies on chain-of-thought prompting, which involves providing a few chain-of-thought demonstrations as exemplars in prompts. Recent work, e.g., Tree of Thoughts, has pointed out the importance of exploration and self-evaluation in reasoning step selection for complex problem solving. In this paper, we present Boosting of Thoughts (BoT), an automated prompting framework for problem solving with LLMs that iteratively explores and self-evaluates many trees of thoughts in order to acquire an ensemble of trial-and-error reasoning experiences, which serve as a new form of prompting to solve the complex problem. Starting from a simple prompt without requiring examples, BoT iteratively explores and evaluates a large collection of reasoning steps, and more importantly, uses error analysis obtained from the LLM on them to explicitly revise prompting, which in turn enhances reasoning step generation, until a final answer is attained. Our experiments with GPT-4 and Llama2 on a wide range of complex mathematical problems demonstrate that BoT consistently achieves higher or comparable problem-solving rates than other advanced prompting approaches.
AI · Dec 27, 2024
Toward Adaptive Reasoning in Large Language Models with Thought Rollback
Sijia Chen, Baochun Li
Large language models (LLMs) have been routinely used to solve various tasks using step-by-step reasoning. However, the structure of intermediate reasoning steps, or thoughts, is rigid and unidirectional, such as chains, trees, or directed acyclic graphs. Consequently, the resulting inflexible and forward-only reasoning may not address challenging tasks and may fail when the LLM frequently gives false responses, i.e., "hallucinations". This paper proposes a new reasoning framework, called Thought Rollback (TR), allowing LLMs to adaptively build thought structure while maintaining effective reasoning toward problem-solving under "hallucinations". The core mechanism of TR is rolling back thoughts, which allows LLMs to perform error analysis on thoughts, and thus roll back to any previously mistaken thought for revision. Subsequently, by including such trial-and-error in the prompt to guide the LLM, each rollback leads to one more reliable reasoning path. Therefore, starting with a simple prompt without human annotations, an LLM with TR adaptively and gradually explores thoughts for a correct solution. Comprehensive experiments on mathematical problems and multi-task reasoning demonstrate the state-of-the-art performance of TR in terms of problem-solving rate and interaction cost. For instance, the solving rate of GPT-4 with TR outperforms the current best by 9% on the MATH dataset.
CL · May 23, 2025
Guided by Gut: Efficient Test-Time Scaling with Reinforced Intrinsic Confidence
Amirhosein Ghasemabadi, Keith G. Mills, Baochun Li et al.
Test-Time Scaling (TTS) methods for enhancing Large Language Model (LLM) reasoning often incur substantial computational costs, primarily due to extensive reliance on external Process Reward Models (PRMs) or sampling methods like Best-of-N (BoN). This paper introduces Guided by Gut (GG), an efficient self-guided TTS framework that achieves PRM-level performance without costly external verifier models. Our method employs a lightweight tree search guided solely by intrinsic LLM signals: token-level confidence and step novelty. One critical innovation is improving the reliability of internal confidence estimates via a targeted reinforcement learning fine-tuning phase. Empirical evaluations on challenging mathematical reasoning benchmarks demonstrate that GG enables smaller models (e.g., 1.5B parameters) to achieve accuracy matching or surpassing significantly larger models (e.g., 32B-70B parameters), while reducing GPU memory usage by up to 10x. Compared to PRM-based methods, GG achieves comparable accuracy with 8x faster inference speeds and 4-5x lower memory usage. Additionally, GG reduces KV cache memory usage by approximately 50% compared to the BoN strategy, facilitating more efficient and practical deployment of TTS techniques.
CR · Mar 9, 2024
TokenMark: A Modality-Agnostic Watermark for Pre-trained Transformers
Hengyuan Xu, Liyao Xiang, Borui Yang et al.
Watermarking is a critical tool for model ownership verification. However, existing watermarking techniques are often designed for specific data modalities and downstream tasks, without considering the inherent architectural properties of the model. This lack of generality and robustness underscores the need for a more versatile watermarking approach. In this work, we investigate the properties of Transformer models and propose TokenMark, a modality-agnostic, robust watermarking system for pre-trained models, leveraging the permutation equivariance property. TokenMark embeds the watermark by fine-tuning the pre-trained model on a set of specifically permuted data samples, resulting in a watermarked model that contains two distinct sets of weights -- one for normal functionality and the other for watermark extraction, the latter triggered only by permuted inputs. Extensive experiments on state-of-the-art pre-trained models demonstrate that TokenMark significantly improves the robustness, efficiency, and universality of model watermarking, highlighting its potential as a unified watermarking solution.
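The permutation equivariance property that TokenMark leverages can be checked numerically. For self-attention without positional encodings, permuting the input tokens permutes the output rows in exactly the same way: f(PX) = P f(X). The sketch below verifies this for a single-head attention layer with random weights. It illustrates only the property; TokenMark's watermark embedding via fine-tuning on permuted data is not reproduced.

```python
import numpy as np


def softmax(a, axis=-1):
    """Numerically stable softmax along the given axis."""
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention, no positional encoding."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    return softmax(scores) @ V


rng = np.random.default_rng(0)
n_tokens, d = 6, 8
X = rng.normal(size=(n_tokens, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
perm = rng.permutation(n_tokens)

out = self_attention(X, Wq, Wk, Wv)
out_perm = self_attention(X[perm], Wq, Wk, Wv)
# Permuting the input tokens permutes the output rows identically.
print(np.allclose(out_perm, out[perm]))  # True
```

Because a fixed permutation leaves the computation unchanged up to reordering, a model can be fine-tuned to behave differently only on inputs carrying a secret permutation, which is the lever the abstract describes.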
LG · Feb 26, 2024
FedReview: A Review Mechanism for Rejecting Poisoned Updates in Federated Learning
Tianhang Zheng, Baochun Li
Federated learning has recently emerged as a decentralized approach to learn a high-performance model without access to user data. Despite its effectiveness, federated learning gives malicious users opportunities to manipulate the model by uploading poisoned model updates to the server. In this paper, we propose a review mechanism called FedReview to identify and reject potentially poisoned updates in federated learning. Under our mechanism, the server randomly assigns a subset of clients as reviewers to evaluate the model updates on their training datasets in each round. The reviewers rank the model updates based on the evaluation results and count the number of updates with relatively low quality as the estimated number of poisoned updates. Based on the review reports, the server employs a majority voting mechanism to integrate the rankings and remove the potentially poisoned updates in the model aggregation process. Extensive evaluation on multiple datasets demonstrates that FedReview can help the server learn a well-performing global model in an adversarial environment.
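A minimal sketch of a review-and-vote step in the spirit described above. The abstract does not define "relatively low quality" or the exact voting rule, so both are assumptions here: a reviewer flags an update whose loss on its local data exceeds `ratio` times that reviewer's median loss (equivalently, it flags its worst-ranked updates up to its estimated count), and the server rejects an update flagged by a strict majority of reviewers. The function name `review_and_filter` and the loss values are hypothetical.

```python
import numpy as np


def review_and_filter(losses, ratio=2.0):
    """Hypothetical FedReview-style filter.

    losses[r][u] is the loss of update u on reviewer r's local data
    (lower is better). Each reviewer flags updates whose loss exceeds
    `ratio` times its own median loss, an assumed stand-in for
    "relatively low quality". An update is rejected if a strict majority
    of reviewers flag it. Returns the indices of accepted updates.
    """
    losses = np.asarray(losses, dtype=float)
    num_reviewers, num_updates = losses.shape
    votes = np.zeros(num_updates, dtype=int)
    for r in range(num_reviewers):
        threshold = ratio * np.median(losses[r])
        votes += losses[r] > threshold
    rejected = votes > num_reviewers / 2
    return [u for u in range(num_updates) if not rejected[u]]


# Three reviewers evaluate five updates; update 3 is suspicious.
losses = [
    [0.50, 0.60, 0.55, 3.0, 0.52],
    [0.40, 0.45, 0.50, 2.5, 1.20],
    [0.60, 0.58, 0.62, 0.7, 0.59],
]
print(review_and_filter(losses))  # [0, 1, 2, 4]
```

Majority voting means a single reviewer's outlying judgment (here, reviewer 1 flagging update 4) is not enough to discard an update.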
LG · Jun 25, 2025
Hear No Evil: Detecting Gradient Leakage by Malicious Servers in Federated Learning
Fei Wang, Baochun Li
Recent work has shown that gradient updates in federated learning (FL) can unintentionally reveal sensitive information about a client's local data. This risk becomes significantly greater when a malicious server manipulates the global model to provoke information-rich updates from clients. In this paper, we adopt a defender's perspective to provide the first comprehensive analysis of malicious gradient leakage attacks and the model manipulation techniques that enable them. Our investigation reveals a core trade-off: these attacks cannot be both highly effective in reconstructing private data and sufficiently stealthy to evade detection -- especially in realistic FL settings that incorporate common normalization techniques and federated averaging. Building on this insight, we argue that malicious gradient leakage attacks, while theoretically concerning, are inherently limited in practice and often detectable through basic monitoring. As a complementary contribution, we propose a simple, lightweight, and broadly applicable client-side detection mechanism that flags suspicious model updates before local training begins, despite the fact that such detection may not be strictly necessary in realistic FL settings. This mechanism further underscores the feasibility of defending against these attacks with minimal overhead, offering a deployable safeguard for privacy-conscious federated learning systems.
LG · Jun 25, 2025
Leaner Training, Lower Leakage: Revisiting Memorization in LLM Fine-Tuning with LoRA
Fei Wang, Baochun Li
Memorization in large language models (LLMs) makes them vulnerable to data extraction attacks. While pre-training memorization has been extensively studied, fewer works have explored its impact in fine-tuning, particularly for LoRA fine-tuning, a widely adopted parameter-efficient method. In this work, we re-examine memorization in fine-tuning and uncover a surprising divergence from prior findings across different fine-tuning strategies. Factors such as model scale and data duplication, which strongly influence memorization in pre-training and full fine-tuning, do not follow the same trend in LoRA fine-tuning. Using a more relaxed similarity-based memorization metric, we demonstrate that LoRA significantly reduces memorization risks compared to full fine-tuning, while still maintaining strong task performance.
LG · Mar 12, 2025
TreeX: Generating Global Graphical GNN Explanations via Critical Subtree Extraction
Shengyao Lu, Jiuding Yang, Baochun Li et al.
The growing demand for transparency and interpretability in critical domains has driven increased interest in understanding the explainability of Message-Passing (MP) Graph Neural Networks (GNNs). Although substantial research efforts have been made to generate explanations for individual graph instances, identifying global explaining concepts for a GNN still poses great challenges, especially when concepts are desired in a graphical form on the dataset level. While most prior works treat GNNs as black boxes, in this paper, we propose to unbox GNNs by analyzing and extracting critical subtrees incurred by the inner workings of message passing, which correspond to critical subgraphs in the datasets. By aggregating subtrees in an embedding space with an efficient algorithm, which does not require complex subgraph matching or search, we can make intuitive graphical explanations for Message-Passing GNNs on local, class and global levels. We empirically show that our proposed approach not only generates clean subgraph concepts on a dataset level, in contrast to existing global explaining methods that generate non-graphical rules (e.g., language or embeddings) as explanations, but is also capable of providing explanations for individual instances with comparable or even superior performance compared to leading local-level GNN explainers.
LG · Jan 23, 2022
Towards Private Learning on Decentralized Graphs with Local Differential Privacy
Wanyu Lin, Baochun Li, Cong Wang
Many real-world networks are inherently decentralized. For example, in social networks, each user maintains a local view of a social graph, such as a list of friends and her profile. It is typical to collect these local views of social graphs and conduct graph learning tasks. However, learning over graphs can raise privacy concerns as these local views often contain sensitive information. In this paper, we seek to ensure private graph learning on a decentralized network graph. Towards this objective, we propose Solitude, a new privacy-preserving learning framework based on graph neural networks (GNNs), with formal privacy guarantees based on edge local differential privacy. The crux of Solitude is a set of new delicate mechanisms that can calibrate the introduced noise in the decentralized graph collected from the users. The principle behind the calibration is the intrinsic properties shared by many real-world graphs, such as sparsity. Unlike existing work on locally private GNNs, our new framework can simultaneously protect node feature privacy and edge privacy, and can be seamlessly incorporated into any GNN with privacy-utility guarantees. Extensive experiments on benchmark datasets show that Solitude can retain the generalization capability of the learned GNN while preserving the users' data privacy under given privacy budgets.
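A common way to obtain edge local differential privacy, assumed here for illustration since the abstract does not name its base mechanism, is randomized response on each user's adjacency bits: keep each bit with probability e^ε / (1 + e^ε) and flip it otherwise, so the likelihood ratio of any output is at most e^ε. The sketch also shows why calibration that exploits sparsity matters: on a sparse friend list, flipped zeros vastly outnumber the true edges. Solitude's own calibration mechanisms are not reproduced; `randomized_response` is a hypothetical name.

```python
import math

import numpy as np


def randomized_response(adj_row, epsilon, rng):
    """Perturb one user's adjacency bits under epsilon-edge LDP.

    Each bit is kept with probability e^eps / (1 + e^eps) and flipped
    otherwise (Warner's randomized response).
    """
    keep_prob = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    keep = rng.random(adj_row.shape) < keep_prob
    return np.where(keep, adj_row, 1 - adj_row)


rng = np.random.default_rng(0)
row = np.zeros(10_000, dtype=int)
row[:100] = 1  # a sparse friend list: 100 edges among 10,000 users
noisy = randomized_response(row, epsilon=2.0, rng=rng)
# Roughly 12% of the 9,900 absent edges are flipped on, swamping the 100 real ones.
print(row.sum(), noisy.sum())
```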
LG · Apr 14, 2021
Generative Causal Explanations for Graph Neural Networks
Wanyu Lin, Hao Lan, Baochun Li
This paper presents Gem, a model-agnostic approach for providing interpretable explanations for any GNN on various graph learning tasks. Specifically, we formulate the problem of providing explanations for the decisions of GNNs as a causal learning task. Then we train a causal explanation model equipped with a loss function based on Granger causality. Unlike existing explainers for GNNs, Gem explains GNNs on graph-structured data from a causal perspective. It has better generalization ability as it requires neither knowledge of the internal structure of the GNNs nor prior knowledge of the graph learning tasks. In addition, Gem, once trained, can be used to explain the target GNN very quickly. Our theoretical analysis shows that several recent explainers fall into a unified framework of additive feature attribution methods. Experimental results on synthetic and real-world datasets show that Gem achieves a relative increase of the explanation accuracy by up to 30% and speeds up the explanation process by up to 110× as compared to its state-of-the-art alternatives.
LG · May 15, 2020
Towards Assessment of Randomized Smoothing Mechanisms for Certifying Adversarial Robustness
Tianhang Zheng, Di Wang, Baochun Li et al.
As a certified defensive technique, randomized smoothing has received considerable attention due to its scalability to large datasets and neural networks. However, several important questions remain unanswered, such as (i) whether the Gaussian mechanism is an appropriate option for certifying $\ell_2$-norm robustness, and (ii) whether there is an appropriate randomized (smoothing) mechanism to certify $\ell_\infty$-norm robustness. To shed light on these questions, we argue that the main difficulty is how to assess the appropriateness of each randomized mechanism. In this paper, we propose a generic framework that connects the existing frameworks in \cite{lecuyer2018certified, li2019certified}, to assess randomized mechanisms. Under our framework, for a randomized mechanism that can certify a certain extent of robustness, we define the magnitude of its required additive noise as the metric for assessing its appropriateness. We also prove lower bounds on this metric for the $\ell_2$-norm and $\ell_\infty$-norm cases as the criteria for assessment. Based on our framework, we assess the Gaussian and Exponential mechanisms by comparing the magnitude of additive noise required by these mechanisms and the lower bounds (criteria). We first conclude that the Gaussian mechanism is indeed an appropriate option to certify $\ell_2$-norm robustness. Surprisingly, we show that the Gaussian mechanism is also an appropriate option for certifying $\ell_\infty$-norm robustness, instead of the Exponential mechanism. Finally, we generalize our framework to $\ell_p$-norm for any $p\geq2$. Our theoretical findings are verified by evaluations on CIFAR10 and ImageNet.
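As a concrete reference point for the Gaussian mechanism and $\ell_2$-norm robustness, the sketch below evaluates the well-known certified radius from Cohen et al. (2019), R = σ/2 · (Φ⁻¹(p_A) - Φ⁻¹(p_B)), where p_A bounds the top-class probability under noise from below, p_B bounds the runner-up probability from above, and Φ⁻¹ is the standard normal quantile. This is not the assessment framework proposed in the paper, which instead compares the required noise magnitude against lower bounds; the function name is ours.

```python
from statistics import NormalDist


def gaussian_l2_radius(p_a, p_b, sigma):
    """Certified l2 radius for Gaussian randomized smoothing.

    p_a: lower bound on the top-class probability under noise, in (0, 1).
    p_b: upper bound on the runner-up class probability, in (0, 1).
    sigma: standard deviation of the additive Gaussian noise.
    """
    if p_a <= p_b:
        return 0.0  # no certificate can be issued
    inv = NormalDist().inv_cdf  # standard normal quantile function
    return 0.5 * sigma * (inv(p_a) - inv(p_b))


print(gaussian_l2_radius(0.9, 0.1, sigma=0.5))
```

With p_B = 1 - p_A the expression reduces to R = σ Φ⁻¹(p_A), so the radius grows linearly with the noise level, which is the trade-off between noise magnitude and certified robustness the paper's metric formalizes.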
CV · May 14, 2020
Towards Understanding the Adversarial Vulnerability of Skeleton-based Action Recognition
Tianhang Zheng, Sheng Liu, Changyou Chen et al.
Skeleton-based action recognition has attracted increasing attention due to its strong adaptability to dynamic circumstances and potential for broad applications such as autonomous and anonymous surveillance. With the help of deep learning techniques, it has also witnessed substantial progress and currently achieves around 90% accuracy in benign environments. On the other hand, research on the vulnerability of skeleton-based action recognition under different adversarial settings remains scant, which may raise security concerns about deploying such techniques into real-world systems. However, filling this research gap is challenging due to the unique physical constraints of skeletons and human actions. In this paper, we attempt to conduct a thorough study towards understanding the adversarial vulnerability of skeleton-based action recognition. We first formulate generation of adversarial skeleton actions as a constrained optimization problem by representing or approximating the physiological and physical constraints with mathematical formulations. Since the primal optimization problem with equality constraints is intractable, we propose to solve it by optimizing its unconstrained dual problem using ADMM. We then specify an efficient plug-in defense, inspired by recent theories and empirical observations, against the adversarial skeleton actions. Extensive evaluations demonstrate the effectiveness of the attack and defense method under different settings.
CV · Oct 28, 2019
Shoestring: Graph-Based Semi-Supervised Learning with Severely Limited Labeled Data
Wanyu Lin, Zhaolin Gao, Baochun Li
Graph-based semi-supervised learning has been shown to be one of the most effective approaches for classification tasks from a wide range of domains, such as image classification and text classification, as it can exploit the connectivity patterns between labeled and unlabeled samples to improve learning performance. In this work, we advance this effective learning paradigm towards a scenario where labeled data are severely limited. More specifically, we address the problem of graph-based semi-supervised learning in the presence of severely limited labeled samples, and propose a new framework, called Shoestring, that improves the learning performance through semantic transfer from these very few labeled samples to large numbers of unlabeled samples. In particular, our framework learns a metric space in which classification can be performed by computing the similarity to the centroid embedding of each class. Shoestring is trained in an end-to-end fashion to learn to leverage the semantic knowledge of limited labeled samples as well as their connectivity patterns with large numbers of unlabeled samples simultaneously. By combining Shoestring with graph convolutional networks, label propagation and their recent label-efficient variations (IGCN and GLP), we are able to achieve state-of-the-art node classification performance in the presence of very few labeled samples. In addition, we demonstrate the effectiveness of our framework on image classification tasks in the few-shot learning regime, with significant gains on miniImageNet (2.57% to 3.59%) and tieredImageNet (1.05% to 2.70%).
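The classification rule described above, assigning a sample to the class whose centroid embedding is most similar, can be sketched as follows. In Shoestring the embeddings are learned end to end together with a graph model; here they are fixed toy vectors, and cosine similarity is our assumption since the abstract does not name the similarity measure. `class_centroids` and `predict_by_centroid` are hypothetical names.

```python
import numpy as np


def class_centroids(embeddings, labels, num_classes):
    """Mean embedding of the labeled samples of each class."""
    return np.stack(
        [embeddings[labels == c].mean(axis=0) for c in range(num_classes)]
    )


def predict_by_centroid(embeddings, centroids):
    """Assign each sample to the class whose centroid is most cosine-similar."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    c = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    return (e @ c.T).argmax(axis=1)


# One labeled node per class (severely limited labels), two unlabeled nodes.
labeled = np.array([[1.0, 0.1], [0.1, 1.0]])
labels = np.array([0, 1])
unlabeled = np.array([[0.9, 0.3], [0.2, 0.8]])
centroids = class_centroids(labeled, labels, num_classes=2)
print(predict_by_centroid(unlabeled, centroids))  # [0 1]
```

Because each class needs only a centroid, the rule still works with a single labeled sample per class; the quality of the learned metric space is what decides accuracy.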
DC · Jun 7, 2018
Semi-Dynamic Load Balancing: Efficient Distributed Learning in Non-Dedicated Environments
Chen Chen, Qizhen Weng, Wei Wang et al.
Machine learning (ML) models are increasingly trained in clusters with non-dedicated workers possessing heterogeneous resources. In such scenarios, model training efficiency can be negatively affected by stragglers -- workers that run much slower than others. Efficient model training requires eliminating such stragglers, yet for modern ML workloads, existing load balancing strategies are inefficient and even infeasible. In this paper, we propose a novel strategy called semi-dynamic load balancing to eliminate stragglers of distributed ML workloads. The key insight is that ML workers shall be load-balanced at iteration boundaries, being non-intrusive to intra-iteration execution. We develop LB-BSP based on such an insight, which is an integrated worker coordination mechanism that adapts workers' load to their instantaneous processing capabilities by right-sizing the sample batches at the synchronization barriers. We have custom-designed the batch sizing algorithm respectively for CPU and GPU clusters based on their own characteristics. LB-BSP has been implemented as a Python module for ML frameworks like TensorFlow and PyTorch. Our EC2 deployment confirms that LB-BSP is practical, effective and light-weight, and is able to accelerate distributed training by up to 54%.
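A minimal sketch of right-sizing batches at a synchronization barrier, assuming the simplest possible rule: each worker's share of a fixed global batch is proportional to its measured throughput, so all workers should reach the barrier at about the same time. LB-BSP's actual CPU- and GPU-specific sizing algorithms are more elaborate and are not reproduced; `size_batches` is a hypothetical name.

```python
def size_batches(global_batch, throughputs):
    """Split a global batch across workers in proportion to their measured
    throughput (samples/s), so that all workers finish an iteration together.

    Uses largest-remainder rounding so the sizes sum exactly to global_batch.
    """
    total = sum(throughputs)
    exact = [global_batch * t / total for t in throughputs]
    sizes = [int(x) for x in exact]
    leftover = global_batch - sum(sizes)
    # Hand the remaining samples to the workers with the largest remainders.
    order = sorted(range(len(exact)), key=lambda i: exact[i] - sizes[i], reverse=True)
    for i in order[:leftover]:
        sizes[i] += 1
    return sizes


# A straggler running at half speed receives roughly half as many samples.
print(size_batches(512, [200.0, 200.0, 100.0]))  # [205, 205, 102]
```

Adjusting only the batch sizes between iterations keeps the intra-iteration execution untouched, matching the non-intrusive design goal stated above.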