3 Papers

62.2 · CV · Apr 14 · Code
Chain-of-Models Pre-Training: Rethinking Training Acceleration of Vision Foundation Models

Jiawei Fan, Shigeng Wang, Chao Li et al.

In this paper, we present Chain-of-Models Pre-Training (CoM-PT), a novel performance-lossless training acceleration method for vision foundation models (VFMs). Its core motivation differs fundamentally from that of existing acceleration methods: rather than optimizing each model individually, CoM-PT accelerates the training pipeline at the level of the model family and scales efficiently as the family expands. Specifically, CoM-PT establishes a pre-training sequence for the model family, arranged in ascending order of model size, called a model chain. In this chain, only the smallest model undergoes standard individual pre-training, while each larger model is trained efficiently through sequential inverse knowledge transfer from its smaller predecessors, jointly reusing knowledge in both the parameter space and the feature space. As a result, CoM-PT enables all models to achieve performance that mostly surpasses standard individual training while significantly reducing training cost, as validated extensively on 45 datasets spanning zero-shot and fine-tuning tasks. Notably, its efficient scaling property yields a remarkable phenomenon: training more models actually results in higher efficiency. For instance, when pre-training on CC3M: i) with ViT-L as the largest model, progressively prepending smaller models to the model chain reduces computational complexity by up to 72%; ii) within a fixed model size range, as the VFM family scales across 3, 4, and 7 models, the acceleration ratio of CoM-PT rises sharply from 4.13× to 5.68× and 7.09×. Since CoM-PT is naturally agnostic to specific pre-training paradigms, we open-source the code to spur further extensions to more computationally intensive scenarios, such as large language model pre-training.
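To make the model-chain ordering concrete, here is a minimal Python sketch under stated assumptions. `ModelSpec`, `build_model_chain`, and `inherit_weights` are hypothetical names, not taken from the released code; the parameter counts are approximate ViT sizes used only for ordering; and the zero-padded weight copy is a simple stand-in for parameter-space reuse, not CoM-PT's actual inverse knowledge transfer, which the abstract does not specify. Feature-space transfer is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ModelSpec:
    name: str
    num_params: int  # used only to order the chain by size


def build_model_chain(models: List[ModelSpec]) -> List[Tuple[ModelSpec, Optional[ModelSpec]]]:
    """Sort a model family by ascending size and pair each model with its
    predecessor. The first model has no predecessor (pre-trained from scratch);
    each later model would start from the model immediately before it."""
    chain = sorted(models, key=lambda m: m.num_params)
    return [(m, chain[i - 1] if i > 0 else None) for i, m in enumerate(chain)]


def inherit_weights(small: List[List[float]], out_dim: int, in_dim: int) -> List[List[float]]:
    """Hypothetical parameter-space reuse (not CoM-PT's operator): copy a smaller
    weight matrix into the top-left block of a larger one, zero-fill the rest."""
    rows, cols = len(small), len(small[0])
    if rows > out_dim or cols > in_dim:
        raise ValueError("target matrix must be at least as large as the source")
    large = [[0.0] * in_dim for _ in range(out_dim)]
    for r in range(rows):
        for c in range(cols):
            large[r][c] = small[r][c]
    return large


# Approximate parameter counts, listed out of order on purpose.
family = [
    ModelSpec("ViT-L", 304_000_000),
    ModelSpec("ViT-S", 22_000_000),
    ModelSpec("ViT-B", 86_000_000),
]
for model, predecessor in build_model_chain(family):
    source = predecessor.name if predecessor else "scratch"
    print(f"{model.name} <- {source}")
```

Running it prints the chain from ViT-S (trained from scratch) through ViT-B to ViT-L, each larger model paired with the one immediately before it.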

79.2 · AI · Mar 26
SliderQuant: Accurate Post-Training Quantization for LLMs

Shigeng Wang, Chao Li, Yangyuxuan Kang et al.

In this paper, we address post-training quantization (PTQ) for large language models (LLMs) from an overlooked perspective: given a pre-trained high-precision LLM, the predominant sequential quantization framework treats all layers equally, which may not be optimal in challenging bit-width settings. We empirically study the impact of quantizing different layers on model accuracy and observe that: (1) shallow and deep layers are usually more sensitive to quantization than intermediate layers; (2) among the shallow and deep layers, the most sensitive are the first and last layers, respectively, which exhibit significantly larger quantization errors than the others. These observations imply that the quantization design for LLMs should operate at multiple levels rather than at a single level shared by all layers. Motivated by this, we propose a new PTQ framework termed Sliding-layer Quantization (SliderQuant), built on a simple adaptive sliding quantization concept facilitated by a small number of learnable parameters. The base component of SliderQuant, inter-layer sliding quantization, incorporates three types of novel sliding window designs tailored to the varying quantization sensitivity of shallow, intermediate, and deep layers. The other component, intra-layer sliding quantization, leverages an incremental strategy to quantize each window. As a result, SliderQuant can effectively reduce quantization errors across layers. Extensive experiments on basic language generation, zero-shot commonsense reasoning, and challenging math and code tasks with various LLMs, including the Llama/Llama2/Llama3/Qwen2.5 model families, DeepSeek-R1 distilled models, and large MoE models, show that our method outperforms existing PTQ methods (including the latest PTQ methods using rotation transformations) for both weight-only and weight-activation quantization.
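The per-layer sensitivity study and the idea of grouping layers into windows can be illustrated with a small Python sketch. It assumes symmetric per-tensor round-to-nearest quantization as a stand-in quantizer and mean squared error as the error measure; `fake_quantize`, `layer_errors`, and `sliding_windows` are hypothetical helpers, the layer weights are made-up values, and the fixed-size overlapping windows are a generic illustration, not SliderQuant's three window designs or its learnable parameters.

```python
from typing import Dict, List


def fake_quantize(weights: List[float], bits: int) -> List[float]:
    """Symmetric per-tensor round-to-nearest quantization (Python's round
    breaks ties to even), returned in dequantized form to measure error."""
    if bits < 2:
        raise ValueError("need at least 2 bits for a signed symmetric grid")
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)
    scale = max_abs / qmax
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


def layer_errors(layers: Dict[str, List[float]], bits: int) -> Dict[str, float]:
    """Mean squared quantization error per layer, a simple sensitivity proxy."""
    errors = {}
    for name, w in layers.items():
        wq = fake_quantize(w, bits)
        errors[name] = sum((a - b) ** 2 for a, b in zip(w, wq)) / len(w)
    return errors


def sliding_windows(num_layers: int, size: int, stride: int) -> List[List[int]]:
    """Generic overlapping windows over layer indices (hypothetical, not
    SliderQuant's designs); the last window is clipped to the final layer."""
    if num_layers < 1 or size < 1 or stride < 1:
        raise ValueError("num_layers, size, and stride must be positive")
    windows, start = [], 0
    while True:
        end = min(start + size, num_layers)
        windows.append(list(range(start, end)))
        if end == num_layers:
            return windows
        start += stride


# Hypothetical weights for two layers.
layers = {"layer0": [0.1, -0.37, 0.82, -0.5], "layer1": [0.02, -0.03, 0.05, 0.01]}
print(layer_errors(layers, bits=3))
print(sliding_windows(6, size=3, stride=2))  # [[0, 1, 2], [2, 3, 4], [4, 5]]
```

A stride smaller than the window size makes consecutive windows share layers, which is one simple way to let neighbouring layers be quantized together rather than strictly one at a time.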

61.1 · OC · Apr 19
Decentralized Stability-Constrained Optimal Power Flow for Inverter-Based Power Systems

Shigeng Wang, Sijia Geng

Future inverter-dominated power systems will feature higher variability and more stressed operating conditions, which motivates considering stability in operational settings. Existing approaches to stability-constrained optimal power flow (OPF) often rely on eigenvalue calculations, global model information, or dynamic evaluation inside the optimization formulation, all of which are computationally intensive and difficult to scale. This paper proposes the first decentralized stability-constrained OPF framework for inverter-based power systems. The key novelty lies in incorporating a class of algebraic, decentralized small-signal stability criteria that admit tractable representations in steady-state variables and are therefore suitable for optimization. The decentralized stability condition is based on local voltage differences and enables a clear theoretical and practical economic interpretation of each inverter's contribution to stability. We define a Nodal Stability Shadow Price (NSSP) for each inverter and characterize the role of the stability constraints through their associated shadow prices, enabling a nodal interpretation of their economic impacts. We prove that under active-power-only objectives in lossless networks, stability constraints may be binding but admit zero shadow prices if all other operational constraints are inactive. Most importantly, we reveal the importance of considering the opportunity cost of reactive power for inverter-based resources (IBRs) with limited capacity. When reactive power costs are considered, stability constraints can carry strictly positive shadow prices and have meaningful economic impacts.
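The shadow-price reasoning and the reactive-power opportunity cost can be illustrated with a toy Python sketch. It does not implement the paper's OPF or its stability criterion: it assumes a two-unit lossless dispatch with linear costs, estimates a shadow price as the cost reduction per unit of constraint relaxation, and uses the standard inverter capability circle P^2 + Q^2 <= S^2. All function names and numbers are hypothetical.

```python
import math


def dispatch_cost(demand: float, cap1: float, c1: float, c2: float) -> float:
    """Toy lossless dispatch: a cheap unit (marginal cost c1, limit cap1) is
    used first; an unlimited expensive unit (cost c2 > c1) covers the rest."""
    p1 = min(demand, cap1)
    p2 = demand - p1
    return c1 * p1 + c2 * p2


def shadow_price(demand: float, cap1: float, c1: float, c2: float, eps: float = 1e-6) -> float:
    """Cost reduction per unit of relaxing the cap1 limit (forward difference)."""
    base = dispatch_cost(demand, cap1, c1, c2)
    relaxed = dispatch_cost(demand, cap1 + eps, c1, c2)
    return (base - relaxed) / eps


def available_active_power(s_rating: float, q: float) -> float:
    """Active-power headroom on the capability circle P^2 + Q^2 <= S^2."""
    if abs(q) > s_rating:
        raise ValueError("reactive output exceeds the apparent-power rating")
    return math.sqrt(s_rating ** 2 - q ** 2)


def reactive_opportunity_cost(s_rating: float, q: float, energy_price: float) -> float:
    """Revenue forgone when producing q displaces active power, assuming
    (hypothetically) the inverter would otherwise run at its full rating."""
    return energy_price * (s_rating - available_active_power(s_rating, q))


print(shadow_price(100.0, 80.0, 10.0, 30.0))  # binding limit: about 20
print(shadow_price(50.0, 80.0, 10.0, 30.0))   # slack limit: 0.0
print(available_active_power(1.0, 0.6))       # about 0.8
```

With demand exactly equal to the cheap unit's limit (for example `shadow_price(80.0, 80.0, 10.0, 30.0)`), the limit is binding yet the forward-difference estimate is zero, which is only a loose toy analogue of a binding constraint that carries no shadow price.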