Jiamin Li

CL
h-index: 9
5 papers
211 citations
Novelty: 53%
AI Score: 45

5 Papers

CL · Oct 11, 2023
Adaptive Gating in Mixture-of-Experts based Language Models

Jiamin Li, Qiang Su, Yitao Yang et al.

Large language models, such as OpenAI's ChatGPT, have demonstrated exceptional language understanding capabilities across various NLP tasks. Sparsely activated mixture-of-experts (MoE) has emerged as a promising solution for scaling models while keeping the number of computational operations constant. Existing MoE models adopt a fixed gating network in which each token is computed by the same number of experts. However, this approach contradicts the intuition that tokens in a sequence vary in linguistic complexity and, consequently, require different computational costs. Prior research has said little about the trade-off between computation per token and model performance. This paper introduces adaptive gating in MoE, a flexible training strategy that allows tokens to be processed by a variable number of experts based on the expert probability distribution. The proposed framework preserves sparsity while improving training efficiency. Additionally, curriculum learning is leveraged to further reduce training time. Extensive experiments on diverse NLP tasks show that adaptive gating reduces training time by up to 22.5% while maintaining inference quality. Moreover, we conduct a comprehensive analysis of the routing decisions and present insights into routing behavior when adaptive gating is used.
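A minimal Python sketch of variable-expert routing, assuming a hypothetical rule: a token uses only its top-1 expert when the gap between its top-1 and top-2 gate probabilities reaches a threshold, and its top-2 experts otherwise. The function name `adaptive_gate` and the `threshold` value are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def adaptive_gate(logits, threshold=0.5):
    """Route each token to 1 or 2 experts (hypothetical gap-threshold rule).

    logits: (num_tokens, num_experts) gating scores.
    Returns a list of (expert_ids, weights) per token.
    """
    # Numerically stable softmax over experts.
    z = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(z)
    probs /= probs.sum(axis=-1, keepdims=True)

    routes = []
    for p in probs:
        order = np.argsort(p)[::-1]          # experts sorted by probability
        top1, top2 = int(order[0]), int(order[1])
        # A confident token (large top-1/top-2 gap) needs only one expert.
        ids = [top1] if p[top1] - p[top2] >= threshold else [top1, top2]
        weights = p[ids] / p[ids].sum()      # renormalize over chosen experts
        routes.append((ids, weights))
    return routes
```

For example, `adaptive_gate(np.array([[4.0, 0, 0, 0], [1.0, 0.9, 0, 0]]))` sends the first, confidently routed token to one expert and the second, ambiguous token to two, so three expert computations are spent instead of the four a fixed top-2 gate would use.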

87.6 · IT · Apr 23
Spatiotemporal 2-D Polar Codes over Non-Uniform MIMO Channels: A Reliability-Aware Construction Approach

Yaqi Li, Shuohan Zhang, Xiaohu You et al.

With the increasing demand for ultra-reliable and low-latency communication (URLLC), spatiotemporal two-dimensional (2-D) channel coding has received growing interest. By leveraging the spatial degrees of freedom in massive multiple-input multiple-output (MIMO) systems, it shortens the time-domain blocklength, thereby reducing latency and enhancing reliability. However, existing spatiotemporal coding schemes typically assume uniform reliability across spatial streams. This assumption does not hold in practical MIMO channels, where the underlying propagation environment generally leads to unequal spatial-eigenmode gains and reliabilities, making the conventional Gaussian-approximation-based construction for 2-D polar codes less effective. This paper investigates spatiotemporal 2-D polar coding over non-uniform MIMO channels, where the spatial domain exhibits inherently heterogeneous signal-to-noise ratios (SNRs). We propose a reciprocal channel approximation (RCA)-based reliability-aware 2-D polar coding framework that accurately characterizes such heterogeneous SNRs without relying on log-likelihood-ratio distribution assumptions. Simulation results demonstrate that the proposed RCA-based spatiotemporal 2-D polar coding scheme achieves clear performance gains and strong robustness, confirming its effectiveness in jointly exploiting temporal and spatial polarization for URLLC in practical MIMO systems.
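The reciprocal channel approximation can be illustrated on a 1-D polar code whose coded bits see unequal SNRs. This Python sketch assumes unit-amplitude BPSK over AWGN with SNR s = 1/σ² (so the LLR is Gaussian with mean 2s and variance 4s), the generator F^{⊗n} with F = [[1,0],[1,1]] in natural bit order, and the usual RCA combining rules: SNRs add at the better bit-channel, and the worse one is combined in the reciprocal domain. All helper names are hypothetical, and the paper's 2-D spatiotemporal construction is not reproduced here.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Gauss-Hermite nodes/weights for the weight exp(-x^2/2); weights sum to sqrt(2*pi).
_X, _W = hermegauss(64)

def capacity(s):
    """BI-AWGN capacity in bits at SNR s = 1/sigma^2 (unit-amplitude BPSK).
    Given x = +1, the LLR is Gaussian with mean 2s and variance 4s."""
    if s <= 0:
        return 0.0
    llr = 2.0 * s + 2.0 * np.sqrt(s) * _X
    # E[log2(1 + e^{-L})], evaluated stably with logaddexp.
    loss = np.sum(_W * np.logaddexp(0.0, -llr)) / np.sqrt(2 * np.pi) / np.log(2)
    return float(1.0 - loss)

def inv_capacity(c, lo=1e-9, hi=1e4):
    """SNR whose capacity equals c, by bisection on a log-SNR scale."""
    if c <= capacity(lo):
        return lo
    if c >= capacity(hi):
        return hi
    for _ in range(100):
        mid = np.sqrt(lo * hi)
        if capacity(mid) < c:
            lo = mid
        else:
            hi = mid
    return float(np.sqrt(lo * hi))

def reciprocal(s):
    """Reciprocal-channel SNR: the SNR whose capacity is 1 - C(s)."""
    return inv_capacity(1.0 - capacity(s))

def rca_minus(s1, s2):
    """Worse (check-node) bit-channel, combined in the reciprocal domain."""
    return reciprocal(reciprocal(s1) + reciprocal(s2))

def rca_plus(s1, s2):
    """Better (variable-node) bit-channel: Gaussian-LLR SNRs add."""
    return s1 + s2

def polarize(snrs):
    """Bit-channel SNRs for x = u F^{(x)n} in natural order, given one
    (possibly different) channel SNR per coded bit; len(snrs) must be 2^n."""
    n = len(snrs)
    if n == 1:
        return list(snrs)
    h = n // 2
    # Coded bits j and j + h form one 2x2 kernel: x_j = a_j xor b_j, x_{j+h} = b_j.
    minus = [rca_minus(snrs[j], snrs[j + h]) for j in range(h)]
    plus = [rca_plus(snrs[j], snrs[j + h]) for j in range(h)]
    return polarize(minus) + polarize(plus)

def select_info_bits(snrs, k):
    """Indices of the k most reliable bit-channels (highest RCA SNR)."""
    bit_snrs = polarize(snrs)
    return sorted(np.argsort(bit_snrs)[::-1][:k].tolist())
```

Calling `select_info_bits([0.5, 2.0, 1.0, 4.0], 2)` picks information positions for a length-4 code whose coded bits see four different SNRs. The heterogeneity enters only through the initial SNR vector, which is what lets RCA avoid a per-stream LLR distribution assumption.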

IV · Jul 31, 2025
Pixel Embedding Method for Tubular Neurite Segmentation

Huayu Fu, Jiamin Li, Haozhi Qu et al.

Automatic segmentation of neuronal topology is critical for handling large-scale neuroimaging data, as it can greatly accelerate neuron annotation and analysis. However, the intricate morphology of neuronal branches and the occlusions among fibers pose significant challenges for deep-learning-based segmentation. To address these issues, we propose an improved framework. First, we introduce a deep network that outputs pixel-level embedding vectors and design a corresponding loss function, enabling the learned features to effectively distinguish different neuronal connections within occluded regions. Second, building on this model, we develop an end-to-end pipeline that directly maps raw neuronal images to SWC-formatted neuron structure trees. Finally, recognizing that existing evaluation metrics fail to fully capture segmentation accuracy, we propose a novel topological assessment metric to more appropriately quantify the quality of neuron segmentation and reconstruction. Experiments on our fMOST imaging dataset demonstrate that, compared with several classical methods, our approach significantly reduces the error rate in neuronal topology reconstruction.
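The abstract does not spell out its embedding loss. As an illustration only, this Python sketch swaps in the widely used discriminative embedding loss of De Brabandere et al., which pulls each pixel's embedding toward its instance (here, neurite) mean and pushes different instance means apart. The margins `delta_v` and `delta_d` and the function name are assumptions.

```python
import numpy as np

def discriminative_loss(emb, labels, delta_v=0.5, delta_d=1.5):
    """emb: (P, D) pixel embeddings; labels: (P,) instance id per foreground pixel."""
    ids = np.unique(labels)
    centers = np.stack([emb[labels == c].mean(axis=0) for c in ids])

    # Pull term: penalize pixels farther than delta_v from their instance mean.
    l_var = 0.0
    for c, mu in zip(ids, centers):
        d = np.linalg.norm(emb[labels == c] - mu, axis=1)
        l_var += np.mean(np.maximum(d - delta_v, 0.0) ** 2)
    l_var /= len(ids)

    # Push term: penalize pairs of instance means closer than 2 * delta_d.
    l_dist = 0.0
    n_inst = len(ids)
    if n_inst > 1:
        gaps = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
        hinge = np.maximum(2 * delta_d - gaps, 0.0) ** 2
        np.fill_diagonal(hinge, 0.0)          # ignore each mean's distance to itself
        l_dist = hinge.sum() / (n_inst * (n_inst - 1))
    return l_var + l_dist
```

With a loss of this kind, pixels of two crossing fibers can share an image region yet remain separable, because their embeddings are trained toward distinct cluster means.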

DC · Feb 16, 2022
Aryl: An Elastic Cluster Scheduler for Deep Learning

Jiamin Li, Hong Xu, Yibo Zhu et al.

Companies build separate training and inference GPU clusters for deep learning and use separate schedulers to manage them. This leads to problems for both training and inference: inference clusters have low GPU utilization when the traffic load is low, while training jobs often experience long queuing times due to a lack of resources. We introduce Aryl, a new cluster scheduler that addresses these problems. Aryl introduces capacity loaning to loan idle inference GPU servers to training jobs. It further exploits elastic scaling, which scales a training job's GPU allocation to better utilize loaned resources. Capacity loaning and elastic scaling create new challenges for cluster management. When the loaned servers need to be returned, we need to minimize the number of job preemptions; when more GPUs become available, we need to allocate them to elastic jobs and minimize the job completion time (JCT). Aryl addresses these combinatorial problems using principled heuristics. It introduces the notion of server preemption cost, which it greedily reduces during server reclaiming. It further relies on the JCT reduction value defined for each additional worker of an elastic job to solve the scheduling problem as a multiple-choice knapsack problem. Prototype implementation on a 64-GPU testbed and large-scale simulation with 15-day traces of over 50,000 production jobs show that Aryl brings 1.53x and 1.50x reductions in average queuing time and JCT, respectively, and improves cluster usage by up to 26.9% over the cluster scheduler without capacity loaning or elastic scaling.
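The worker-allocation step can be sketched as a multiple-choice knapsack solved by dynamic programming. This Python sketch assumes each extra worker uses one GPU and that, for each elastic job, `gains[i]` is the cumulative JCT reduction from granting i + 1 extra workers (the running sum of the per-worker values the abstract describes). It picks at most one option per job within the free GPU budget. Function and argument names are hypothetical, and Aryl's actual heuristic may differ.

```python
def allocate_workers(jobs, free_gpus):
    """Multiple-choice knapsack via dynamic programming.

    jobs: dict job_id -> list; entry i is the total JCT reduction from
          granting i + 1 extra workers (one GPU each).
    Returns (best_reduction, {job_id: extra_workers}).
    """
    best = [0.0] * (free_gpus + 1)   # best[c]: max reduction using <= c GPUs
    choices = []                     # per job: workers picked at each capacity
    for job_id, gains in jobs.items():
        new_best = best[:]           # option "no extra workers" keeps best[c]
        pick = [0] * (free_gpus + 1)
        for c in range(free_gpus + 1):
            for w, gain in enumerate(gains, start=1):
                if w <= c and best[c - w] + gain > new_best[c]:
                    new_best[c] = best[c - w] + gain
                    pick[c] = w
        best = new_best
        choices.append((job_id, pick))

    # Backtrack from the full budget to recover each job's allocation.
    alloc, c = {}, free_gpus
    for job_id, pick in reversed(choices):
        alloc[job_id] = pick[c]
        c -= pick[c]
    return best[free_gpus], alloc
```

For example, with 4 free GPUs and `{"A": [5, 8, 9], "B": [4, 7]}`, giving two workers to each job (total reduction 15) beats any greedy choice that hands job A all three of its options first.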

CR · Nov 1, 2020
Monitoring-based Differential Privacy Mechanism Against Query-Flooding Parameter Duplication Attack

Haonan Yan, Xiaoguang Li, Hui Li et al.

Public intelligent services enabled by machine learning algorithms are vulnerable to model extraction attacks, which can steal confidential information about the learning models through public queries. Although protection options such as differential privacy (DP) and monitoring are considered promising techniques for mitigating this attack, we find that the vulnerability persists. In this paper, we propose an adaptive query-flooding parameter duplication (QPD) attack, through which the adversary can infer model information with only black-box access and no prior knowledge of any model parameters or training data. We also develop a DP-based defense strategy, called monitoring-based DP (MDP), against this new attack. In MDP, we first propose a novel real-time model extraction status assessment scheme, called Monitor, to evaluate the model's situation. Then, we design an adaptive method, called APBA, to guide differential privacy budget allocation. Finally, all DP-based defenses equipped with MDP can dynamically adjust the amount of noise added to the model response according to the result from Monitor, and thereby effectively defend against the QPD attack. Furthermore, we thoroughly evaluate and compare the performance of the QPD attack and the MDP defense on real-world models with DP and monitoring protection.
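A hypothetical Python sketch of how a monitor-driven budget could modulate response noise. A risk score in [0, 1] stands in for the Monitor's assessment and linearly lowers the privacy budget ε. The response is then perturbed with the standard Laplace mechanism at scale sensitivity/ε. The linear rule, `eps_max`, `eps_min`, and the function names are assumptions; this is not the APBA method itself.

```python
import numpy as np

def adaptive_epsilon(risk, eps_max=1.0, eps_min=0.05):
    """Hypothetical rule: spend less privacy budget (add more noise)
    as the extraction-risk score in [0, 1] rises."""
    risk = min(max(risk, 0.0), 1.0)
    return eps_max - risk * (eps_max - eps_min)

def noisy_response(output, risk, sensitivity=1.0, rng=None):
    """Laplace mechanism: add Laplace(0, sensitivity / epsilon) noise."""
    rng = np.random.default_rng() if rng is None else rng
    eps = adaptive_epsilon(risk)
    return output + rng.laplace(0.0, sensitivity / eps, size=np.shape(output))
```

A low-risk query stream is answered with noise of scale 1, while a stream the monitor flags as high-risk receives noise of scale 20, which degrades the precision an extraction attack can recover from repeated queries.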