LG Jan 30
SPICE: Submodular Penalized Information-Conflict Selection for Efficient Large Language Model Training
Powei Chang, Jinpeng Zhang, Bowen Chen et al.
Information-based data selection for instruction tuning is compelling: maximizing the log-determinant of the Fisher information yields a monotone submodular objective, enabling greedy algorithms to achieve a $(1-1/e)$ approximation under a cardinality budget. In practice, however, we find that alleviating gradient conflicts, i.e., misalignment between per-sample gradients, is a key factor in slowing the decay of marginal log-determinant information gains, thereby preventing significant loss of information. We formalize this via an $\varepsilon$-decomposition that quantifies the deviation from ideal submodularity as a function of conflict statistics, yielding data-dependent approximation factors that tighten as conflicts diminish. Guided by this analysis, we propose SPICE, a conflict-aware selector that maximizes information while penalizing misalignment, and that supports early stopping and proxy models for efficiency. Empirically, SPICE selects subsets with higher log-determinant information than the original criteria, and these informational gains translate into performance improvements: across 8 benchmarks with LLaMA2-7B and Qwen2-7B, SPICE uses only 10% of the data yet matches or exceeds 6 methods, including full-data tuning, at substantially lower training cost.
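To make the idea of conflict-penalized greedy selection concrete, here is a minimal sketch in plain Python. It is not the SPICE algorithm from the paper: the identity regularizer, the penalty form (negative alignment with the sum of already-selected gradients), and the names `greedy_select` and `lam` are all illustrative assumptions. The marginal log-determinant gain is computed with the matrix determinant lemma, and the inverse is maintained with a Sherman-Morrison update.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def greedy_select(grads, budget, lam=0.1):
    """Greedily pick indices maximizing log det(I + sum g g^T) minus a conflict penalty.

    Assumption: the penalty lam * max(0, -<g, sum of selected gradients>) is a
    hypothetical stand-in for a conflict term, not the paper's definition.
    """
    d = len(grads[0])
    # Inverse of the (identity-regularized) information matrix.
    inv = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    total = [0.0] * d  # running sum of selected gradients
    selected, remaining = [], set(range(len(grads)))
    while remaining and len(selected) < budget:
        best, best_score = None, -math.inf
        for i in sorted(remaining):
            g = grads[i]
            # Matrix determinant lemma: log det(A + g g^T) - log det(A) = log(1 + g^T A^-1 g).
            gain = math.log1p(dot(g, matvec(inv, g)))
            conflict = max(0.0, -dot(g, total))
            score = gain - lam * conflict
            if score > best_score:
                best, best_score = i, score
        g = grads[best]
        u = matvec(inv, g)
        denom = 1.0 + dot(g, u)
        # Sherman-Morrison rank-one update of the inverse (inv is symmetric).
        inv = [[inv[r][c] - u[r] * u[c] / denom for c in range(d)] for r in range(d)]
        total = [t + x for t, x in zip(total, g)]
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy gradients: index 1 directly opposes index 0.
toy_grads = [[1.0, 0.0], [-1.0, 0.0], [0.0, 0.5]]
without_penalty = greedy_select(toy_grads, budget=2, lam=0.0)
with_penalty = greedy_select(toy_grads, budget=2, lam=1.0)
```

In this toy case the unpenalized objective picks the opposing gradient second because it has the larger norm, while the penalized one prefers the smaller but non-conflicting sample.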
HC Mar 1
The Evolving Duet of Two Modalities: A Survey on Integrating Text and Visualization for Data Communication
Xingyu Lan, Xi Li, Yixing Zhang et al.
Text plays a fundamental yet understudied role as a narrative device in data visualization. While existing research has extensively explored text as data input and interaction modality, its function in supporting storytelling and interpretation remains fragmented. To address this gap, this work presents a systematic review of 98 publications that provide insights into using text as narrative. We investigate how text can be utilized in visualization, analyze its functions and effects, and explore how it can be designed to facilitate data communication. Our synthesis identifies significant research gaps in this domain and proposes future directions to advance the integration of text and visualization, ultimately aiming to provide guidance for designing text that enhances narrative clarity and fosters engagement.
AR Apr 5
3D-Stacked NMP, LLM Decoding, Systolic Array Microarchitecture, Multi-Core Scheduling
Chenyang Ai, Yixing Zhang, Haoran Wu et al.
Large language model (LLM) decoding is a major inference bottleneck because its low arithmetic intensity makes performance highly sensitive to memory bandwidth. 3D-stacked near-memory processing (NMP) provides substantially higher local memory bandwidth than conventional off-chip interfaces, making it a promising substrate for decode acceleration. However, our analysis shows that this bandwidth advantage also shifts many decode operators on 3D-stacked NMP back into the compute-bound regime. Under the tight area budget of the logic die, the design of the compute substrate itself becomes a first-order challenge. We therefore rethink the compute microarchitecture of prior 3D-stacked NMP designs. First, we replace prior MAC tree-based compute units with a more area-efficient systolic array. We further observe that decode operators exhibit substantial shape diversity, making reconfigurability in both systolic array shape and dataflow essential for sustaining high utilization. Building on this insight, we exploit two key opportunities: the high local memory bandwidth reduces the need for large on-chip buffers, and the existing vector core, originally designed to handle auxiliary tensor computations, already provides much of the control logic and multi-ported buffering that fine-grained systolic array flexibility requires, allowing us to unify the two structures in a highly area-efficient manner. Based on these insights, we present the first compute microarchitecture tailored to 3D-stacked NMP LLM decoding, explicitly designed to satisfy the joint requirements of low area cost, high-bandwidth operation, and fine-grained reconfigurability. We further propose a multi-core scheduling framework. Compared with Stratum, our design achieves an average 2.91x speedup and 2.40x higher energy efficiency across both dense and MoE models.
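The claim that higher bandwidth can push decode operators back into the compute-bound regime follows from a standard roofline argument, sketched below. All hardware numbers and operator shapes here are hypothetical placeholders chosen for illustration; none are taken from the paper, and the byte count assumes each operand is moved exactly once.

```python
def arithmetic_intensity(m, n, k, bytes_per_elem=2):
    """FLOPs per byte for an (m x k) @ (k x n) matmul, moving each operand once."""
    flops = 2 * m * n * k
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved

def is_compute_bound(intensity, peak_flops, bandwidth):
    """Roofline test: compute-bound when intensity reaches peak_flops / bandwidth."""
    return intensity >= peak_flops / bandwidth

# Hypothetical decode-time projection: batch of 8 tokens against a 4096 x 4096 weight.
decode_ai = arithmetic_intensity(m=8, n=4096, k=4096)

PEAK_FLOPS = 100e12      # assumed peak compute, FLOP/s
OFF_CHIP_BW = 1e12       # assumed conventional off-chip bandwidth, bytes/s
STACKED_BW = 20e12       # assumed 3D-stacked local bandwidth, bytes/s

bound_off_chip = is_compute_bound(decode_ai, PEAK_FLOPS, OFF_CHIP_BW)
bound_stacked = is_compute_bound(decode_ai, PEAK_FLOPS, STACKED_BW)
```

With these assumed numbers the same operator is memory-bound on the off-chip interface and compute-bound on the stacked one, because raising bandwidth lowers the roofline ridge point while the operator's intensity stays fixed.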
AI Apr 14, 2021
Identification of mental fatigue in language comprehension tasks based on EEG and deep learning
Chunhua Ye, Zhong Yin, Chenxi Wu et al.
Mental fatigue increases the risk of operator error in language comprehension tasks. To prevent operator performance degradation, we used EEG signals to assess the mental fatigue of operators in human-computer systems. This study presents an experimental design for fatigue detection in language comprehension tasks. We obtained EEG signals from 15 healthy participants using a 14-channel wireless EEG detector. Each participant was given a cognitive test of a language comprehension task, in the form of multiple-choice questions, in which pronoun references were selected between nominal and surrogate sentences. The 2400 EEG fragments collected are divided into three datasets according to different utilization rates: a 1200s dataset with a 50% utilization rate, a 1500s dataset with a 62.5% utilization rate, and an 1800s dataset with a 75% utilization rate. For feature extraction, we extracted time-domain, frequency-domain, and entropy features, and explored the effects of different features and feature combinations on classification accuracy. For classification, we adopted a Convolutional Neural Network (CNN) as the preferred method and compared it with Least Squares Support Vector Machines (LSSVM), Support Vector Machines (SVM), Logistic Regression (LR), Random Forest (RF), Naive Bayes (NB), K-Nearest Neighbor (KNN), and Decision Tree (DT). According to the results, the classification accuracy of the CNN is higher than that of the other classification methods, and the classification accuracy on the 1200s dataset is higher than on the other two datasets. The combination of frequency and entropy features with the CNN yields the highest classification accuracy, 85.34%.
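As a rough illustration of what time-domain and entropy features for one EEG fragment could look like, the sketch below computes a mean, a variance, and a histogram-based Shannon entropy. The abstract does not say which entropy measure or which time-domain statistics were used, so the choice of Shannon entropy, the bin count, and the function names are assumptions.

```python
import math

def shannon_entropy(signal, bins=8):
    """Shannon entropy (bits) of the amplitude histogram of one signal fragment."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        return 0.0  # a constant signal carries no amplitude uncertainty
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in signal:
        # Clamp so the maximum value falls in the last bin.
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    n = len(signal)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)

def fragment_features(signal, bins=8):
    """Hypothetical feature vector: time-domain mean and variance plus entropy."""
    n = len(signal)
    mean = sum(signal) / n
    variance = sum((x - mean) ** 2 for x in signal) / n
    return {"mean": mean, "variance": variance, "entropy": shannon_entropy(signal, bins)}

features = fragment_features([0.0, 1.0, 2.0, 3.0], bins=4)
```

A feature dictionary like this would be computed per channel and per fragment, then concatenated before being passed to a classifier.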
LG Feb 28, 2021
Convergence of Gaussian-smoothed optimal transport distance with sub-gamma distributions and dependent samples
Yixing Zhang, Xiuyuan Cheng, Galen Reeves
The Gaussian-smoothed optimal transport (GOT) framework, recently proposed by Goldfeld et al., scales to high dimensions in estimation and provides an alternative to entropy regularization. This paper provides convergence guarantees for estimating the GOT distance under more general settings. For the Gaussian-smoothed $p$-Wasserstein distance in $d$ dimensions, our results require only the existence of a moment greater than $d + 2p$. For the special case of sub-gamma distributions, we quantify the dependence on the dimension $d$ and establish a phase transition with respect to the scale parameter. We also prove convergence for dependent samples, only requiring a condition on the pairwise dependence of the samples measured by the covariance of the feature map of a kernel space. A key step in our analysis is to show that the GOT distance is dominated by a family of kernel maximum mean discrepancy (MMD) distances with a kernel that depends on the cost function as well as the amount of Gaussian smoothing. This insight provides further interpretability for the GOT framework and also introduces a class of kernel MMD distances with desirable properties. The theoretical results are supported by numerical experiments.
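For intuition about the quantity being estimated, the following sketch gives a crude Monte Carlo approximation of a Gaussian-smoothed $p$-Wasserstein distance in one dimension. It perturbs each sample with a single Gaussian draw and then uses the fact that, in 1-D, the optimal coupling between two equal-size empirical measures matches sorted samples. This is an illustrative simplification, not the estimator analyzed in the paper, and the function name and parameters are assumptions.

```python
import random

def smoothed_wasserstein_1d(xs, ys, sigma, p=1, seed=0):
    """Approximate the Gaussian-smoothed W_p between two equal-size 1-D samples.

    Assumption: one noise draw per sample stands in for convolving each
    empirical measure with N(0, sigma^2).
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch requires equal sample sizes")
    rng = random.Random(seed)
    noisy_x = sorted(x + rng.gauss(0.0, sigma) for x in xs)
    noisy_y = sorted(y + rng.gauss(0.0, sigma) for y in ys)
    # In 1-D the optimal transport plan pairs order statistics.
    cost = sum(abs(a - b) ** p for a, b in zip(noisy_x, noisy_y)) / len(noisy_x)
    return cost ** (1.0 / p)

# With sigma = 0 this reduces to the ordinary empirical W_p.
unsmoothed = smoothed_wasserstein_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0], sigma=0.0)
smoothed = smoothed_wasserstein_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0], sigma=0.5)
```

Averaging the result over several seeds would reduce the variance introduced by the single noise draw.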
AI Jul 8, 2020
NASGEM: Neural Architecture Search via Graph Embedding Method
Hsin-Pai Cheng, Tunhou Zhang, Yixing Zhang et al.
Neural Architecture Search (NAS) automates and advances the design of neural networks. Estimator-based NAS has been proposed recently to model the relationship between architectures and their performance to enable scalable and flexible search. However, existing estimator-based methods encode the architecture into a latent space without considering graph similarity. Ignoring graph similarity in a node-based search space may induce a large inconsistency between similar graphs and their distance in the continuous encoding space, leading to inaccurate encoding representation and/or reduced representation capacity that can yield sub-optimal search results. To preserve graph correlation information in the encoding, we propose NASGEM, which stands for Neural Architecture Search via Graph Embedding Method. NASGEM is driven by a novel graph embedding method equipped with similarity measures to capture graph topology information. By precisely estimating the graph distance and using an auxiliary Weisfeiler-Lehman kernel to guide the encoding, NASGEM can utilize additional structural information to obtain a more accurate graph representation and improve search efficiency. GEMNet, a set of networks discovered by NASGEM, consistently outperforms networks crafted by existing search methods in classification tasks, achieving 0.4%-3.6% higher accuracy with 11%-21% fewer Multiply-Accumulates. We further transfer GEMNet to COCO object detection. In both one-stage and two-stage detectors, GEMNet surpasses its manually crafted and automatically searched counterparts.
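The Weisfeiler-Lehman kernel mentioned above can be illustrated with a small subtree-kernel sketch. This shows the generic WL subtree kernel on labeled graphs, not how NASGEM integrates it into its encoder; the graph representation (an adjacency dict plus a label dict) and the function names are assumptions made for the example.

```python
from collections import Counter

def wl_features(adj, labels, iterations=2):
    """Count WL labels over all refinement rounds.

    adj: dict mapping node -> list of neighbor nodes.
    labels: dict mapping node -> initial label (e.g. an operation type).
    """
    current = {v: str(label) for v, label in labels.items()}
    feats = Counter(current.values())
    for _ in range(iterations):
        refined = {}
        for v in adj:
            # New label = own label plus the sorted multiset of neighbor labels.
            neighbor_labels = sorted(current[u] for u in adj[v])
            refined[v] = current[v] + "|" + ",".join(neighbor_labels)
        current = refined
        feats.update(current.values())
    return feats

def wl_kernel(g1, g2, iterations=2):
    """WL subtree kernel: dot product of the two label-count vectors."""
    f1 = wl_features(g1[0], g1[1], iterations)
    f2 = wl_features(g2[0], g2[1], iterations)
    return sum(f1[key] * f2[key] for key in f1.keys() & f2.keys())

# Two toy graphs with identical node labels: a 3-node path and a triangle.
path = ({0: [1], 1: [0, 2], 2: [1]}, {0: "x", 1: "x", 2: "x"})
triangle = ({0: [1, 2], 1: [0, 2], 2: [0, 1]}, {0: "x", 1: "x", 2: "x"})

k_path_path = wl_kernel(path, path, iterations=1)
k_path_triangle = wl_kernel(path, triangle, iterations=1)
```

The kernel value is larger for the path compared with itself than for the path compared with the triangle, reflecting that the refined labels capture differences in local topology even when the initial node labels are identical.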