7.1LGJul 9, 2025Code
CCQ: Convolutional Code for Extreme Low-bit Quantization in LLMsZhaojing Zhou, Xunchao Li, Minghao Li et al.
The rapid scaling of Large Language Models (LLMs) elevates inference costs and compounds substantial deployment barriers. While quantization to 8 or 4 bits mitigates this, sub-3-bit methods face severe accuracy, scalability, and efficiency degradation. We propose Convolutional Code Quantization (CCQ), an inference-optimized quantization approach compressing LLMs to 2.0-2.75 bits with minimal accuracy loss. Departing from error-prone scalar quantization or slow vector quantization, CCQ integrates a hardware-aware bit-shift encoding and decoding solution with Convolutional Code, Hybrid Encoding, and Code Cluster, jointly overcoming accuracy-speed bottlenecks. We construct a lookup-free encoding space, enabling a linear mapping between the codebook and weight vectors, thereby optimizing inference performance. Meanwhile, by drawing on the concept of data mapping from vector quantization, we minimize the performance degradation of the model under extremely low-bit conditions. Experiments demonstrate that CCQ achieves outstanding performance on LLMs across various benchmarks. We compress DeepSeek-V3 (671B total parameters) to 184GB and ERNIE-4.5-300B-A47B to 89GB, enabling single-GPU deployment of ERNIE 4.5 and eliminating inter-card communication. The 2-bit ERNIE-4.5-300B-A47B model and inference engine have been open-sourced.
SelfGait: A Spatiotemporal Representation Learning Method for Self-supervised Gait RecognitionYiqun Liu, Yi Zeng, Jian Pu et al.
Gait recognition plays a vital role in human identification since gait is a unique biometric feature that can be perceived at a distance. Although existing gait recognition methods can learn gait features from gait sequences in different ways, the performance of gait recognition suffers from insufficient labeled data, especially in some practical scenarios associated with short gait sequences or various clothing styles. It is unpractical to label the numerous gait data. In this work, we propose a self-supervised gait recognition method, termed SelfGait, which takes advantage of the massive, diverse, unlabeled gait data as a pre-training process to improve the representation abilities of spatiotemporal backbones. Specifically, we employ the horizontal pyramid mapping (HPM) and micro-motion template builder (MTB) as our spatiotemporal backbones to capture the multi-scale spatiotemporal representations. Experiments on CASIA-B and OU-MVLP benchmark gait datasets demonstrate the effectiveness of the proposed SelfGait compared with four state-of-the-art gait recognition methods. The source code has been released at https://github.com/EchoItLiu/SelfGait.
11.5CLMar 27, 2024
BLADE: Enhancing Black-box Large Language Models with Small Domain-Specific ModelsHaitao Li, Qingyao Ai, Jia Chen et al. · tsinghua
Large Language Models (LLMs) like ChatGPT and GPT-4 are versatile and capable of addressing a diverse range of tasks. However, general LLMs, which are developed on open-domain data, may lack the domain-specific knowledge essential for tasks in vertical domains, such as legal, medical, etc. To address this issue, previous approaches either conduct continuous pre-training with domain-specific data or employ retrieval augmentation to support general LLMs. Unfortunately, these strategies are either cost-intensive or unreliable in practical applications. To this end, we present a novel framework named BLADE, which enhances Black-box LArge language models with small Domain-spEcific models. BLADE consists of a black-box LLM and a small domain-specific LM. The small LM preserves domain-specific knowledge and offers specialized insights, while the general LLM contributes robust language comprehension and reasoning capabilities. Specifically, our method involves three steps: 1) pre-training the small LM with domain-specific data, 2) fine-tuning this model using knowledge instruction data, and 3) joint Bayesian optimization of the general LLM and the small LM. Extensive experiments conducted on public legal and medical benchmarks reveal that BLADE significantly outperforms existing approaches. This shows the potential of BLADE as an effective and cost-efficient solution in adapting general LLMs for vertical domains.
19.8IRFeb 20, 2025
External Large Foundation Model: How to Efficiently Serve Trillions of Parameters for Online Ads RecommendationMingfu Liang, Xi Liu, Rong Jin et al.
Ads recommendation is a prominent service of online advertising systems and has been actively studied. Recent studies indicate that scaling-up and advanced design of the recommendation model can bring significant performance improvement. However, with a larger model scale, such prior studies have a significantly increasing gap from industry as they often neglect two fundamental challenges in industrial-scale applications. First, training and inference budgets are restricted for the model to be served, exceeding which may incur latency and impair user experience. Second, large-volume data arrive in a streaming mode with data distributions dynamically shifting, as new users/ads join and existing users/ads leave the system. We propose the External Large Foundation Model (ExFM) framework to address the overlooked challenges. Specifically, we develop external distillation and a data augmentation system (DAS) to control the computational cost of training/inference while maintaining high performance. We design the teacher in a way like a foundation model (FM) that can serve multiple students as vertical models (VMs) to amortize its building cost. We propose Auxiliary Head and Student Adapter to mitigate the data distribution gap between FM and VMs caused by the streaming data issue. Comprehensive experiments on internal industrial-scale applications and public datasets demonstrate significant performance gain by ExFM.
CrossPT-EEG: A Benchmark for Cross-Participant and Cross-Time Generalization of EEG-based Visual DecodingShuqi Zhu, Ziyi Ye, Qingyao Ai et al.
Exploring brain activity in relation to visual perception provides insights into the biological representation of the world. While functional magnetic resonance imaging (fMRI) and magnetoencephalography (MEG) have enabled effective image classification and reconstruction, their high cost and bulk limit practical use. Electroencephalography (EEG), by contrast, offers low cost and excellent temporal resolution, but its potential has been limited by the scarcity of large, high-quality datasets and by block-design experiments that introduce temporal confounds. To fill this gap, we present CrossPT-EEG, a benchmark for cross-participant and cross-time generalization of visual decoding from EEG. We collected EEG data from 16 participants while they viewed 4,000 images sampled from ImageNet, with image stimuli annotated at multiple levels of granularity. Our design includes two stages separated in time to allow cross-time generalization and avoid block-design artifacts. We also introduce benchmarks tailored to non-block design classification, as well as pre-training experiments to assess cross-time and cross-participant generalization. These findings highlight the dataset's potential to enhance EEG-based visual brain-computer interfaces, deepen our understanding of visual perception in biological systems, and suggest promising applications for improving machine vision models.
RepBERT: Contextualized Text Embeddings for First-Stage RetrievalJingtao Zhan, Jiaxin Mao, Yiqun Liu et al.
Although exact term match between queries and documents is the dominant method to perform first-stage retrieval, we propose a different approach, called RepBERT, to represent documents and queries with fixed-length contextualized embeddings. The inner products of query and document embeddings are regarded as relevance scores. On MS MARCO Passage Ranking task, RepBERT achieves state-of-the-art results among all initial retrieval techniques. And its efficiency is comparable to bag-of-words methods.
1.2LGApr 13, 2020
STAS: Adaptive Selecting Spatio-Temporal Deep Features for Improving Bias Correction on PrecipitationYiqun Liu, Shouzhen Chen, Lei Chen et al.
Numerical Weather Prediction (NWP) can reduce human suffering by predicting disastrous precipitation in time. A commonly-used NWP in the world is the European Centre for medium-range weather forecasts (EC). However, it is necessary to correct EC forecast through Bias Correcting on Precipitation (BCoP) since we still have not fully understood the mechanism of precipitation, making EC often have some biases. The existing BCoPs suffers from limited prior data and the fixed Spatio-Temporal (ST) scale. We thus propose an end-to-end deep-learning BCoP model named Spatio-Temporal feature Auto-Selective (STAS) model to select optimal ST regularity from EC via the ST Feature-selective Mechanisms (SFM/TFM). Given different input features, these two mechanisms can automatically adjust the spatial and temporal scales for correcting. Experiments on an EC public dataset indicate that compared with 8 published BCoP methods, STAS shows state-of-the-art performance on several criteria of BCoP, named threat scores (TS). Further, ablation studies justify that the SFM/TFM indeed work well in boosting the performance of BCoP, especially on the heavy precipitation.
Temporal Relational Ranking for Stock PredictionFuli Feng, Xiangnan He, Xiang Wang et al.
Stock prediction aims to predict the future trends of a stock in order to help investors to make good investment decisions. Traditional solutions for stock prediction are based on time-series models. With the recent success of deep neural networks in modeling sequential data, deep learning has become a promising choice for stock prediction. However, most existing deep learning solutions are not optimized towards the target of investment, i.e., selecting the best stock with the highest expected revenue. Specifically, they typically formulate stock prediction as a classification (to predict stock trend) or a regression problem (to predict stock price). More importantly, they largely treat the stocks as independent of each other. The valuable signal in the rich relations between stocks (or companies), such as two stocks are in the same sector and two companies have a supplier-customer relation, is not considered. In this work, we contribute a new deep learning solution, named Relational Stock Ranking (RSR), for stock prediction. Our RSR method advances existing solutions in two major aspects: 1) tailoring the deep learning models for stock ranking, and 2) capturing the stock relations in a time-sensitive manner. The key novelty of our work is the proposal of a new component in neural network modeling, named Temporal Graph Convolution, which jointly models the temporal evolution and relation network of stocks. To validate our method, we perform back-testing on the historical data of two stock markets, NYSE and NASDAQ. Extensive experiments demonstrate the superiority of our RSR method. It outperforms state-of-the-art stock prediction solutions achieving an average return ratio of 98% and 71% on NYSE and NASDAQ, respectively.
3.1CLFeb 11, 2015
Boost Phrase-level Polarity Labelling with Review-level Sentiment ClassificationYongfeng Zhang, Min Zhang, Yiqun Liu et al.
Sentiment analysis on user reviews helps to keep track of user reactions towards products, and make advices to users about what to buy. State-of-the-art review-level sentiment classification techniques could give pretty good precisions of above 90%. However, current phrase-level sentiment analysis approaches might only give sentiment polarity labelling precisions of around 70%~80%, which is far from satisfaction and restricts its application in many practical tasks. In this paper, we focus on the problem of phrase-level sentiment polarity labelling and attempt to bridge the gap between phrase-level and review-level sentiment analysis. We investigate the inconsistency between the numerical star ratings and the sentiment orientation of textual user reviews. Although they have long been treated as identical, which serves as a basic assumption in previous work, we find that this assumption is not necessarily true. We further propose to leverage the results of review-level sentiment classification to boost the performance of phrase-level polarity labelling using a novel constrained convex optimization framework. Besides, the framework is capable of integrating various kinds of information sources and heuristics, while giving the global optimal solution due to its convexity. Experimental results on both English and Chinese reviews show that our framework achieves high labelling precisions of up to 89%, which is a significant improvement from current approaches.