CLDec 13, 2022
TencentPretrain: A Scalable and Flexible Toolkit for Pre-training Models of Different ModalitiesZhe Zhao, Yudong Li, Cheng Hou et al.
Recently, the success of pre-training in text domain has been fully extended to vision, audio, and cross-modal scenarios. The proposed pre-training models of different modalities are showing a rising trend of homogeneity in their model structures, which brings the opportunity to implement different pre-training models within a uniform framework. In this paper, we present TencentPretrain, a toolkit supporting pre-training models of different modalities. The core feature of TencentPretrain is the modular design. The toolkit uniformly divides pre-training models into 5 components: embedding, encoder, target embedding, decoder, and target. As almost all of common modules are provided in each component, users can choose the desired modules from different components to build a complete pre-training model. The modular design enables users to efficiently reproduce existing pre-training models or build brand-new one. We test the toolkit on text, vision, and audio benchmarks and show that it can match the performance of the original implementations.
CLOct 18, 2022
A Simple and Effective Method to Improve Zero-Shot Cross-Lingual Transfer LearningKunbo Ding, Weijie Liu, Yuejian Fang et al.
Existing zero-shot cross-lingual transfer methods rely on parallel corpora or bilingual dictionaries, which are expensive and impractical for low-resource languages. To disengage from these dependencies, researchers have explored training multilingual models on English-only resources and transferring them to low-resource languages. However, its effect is limited by the gap between embedding clusters of different languages. To address this issue, we propose Embedding-Push, Attention-Pull, and Robust targets to transfer English embeddings to virtual multilingual embeddings without semantic loss, thereby improving cross-lingual transferability. Experimental results on mBERT and XLM-R demonstrate that our method significantly outperforms previous works on the zero-shot cross-lingual text classification task and can obtain a better multilingual alignment.
LGSep 19, 2022
SAMP: A Model Inference Toolkit of Post-Training Quantization for Text Processing via Self-Adaptive Mixed-PrecisionRong Tian, Zijing Zhao, Weijie Liu et al.
The latest industrial inference engines, such as FasterTransformer and TurboTransformers, have verified that half-precision floating point (FP16) and 8-bit integer (INT8) quantization can greatly improve model inference speed. However, the existing INT8 quantization methods are too complicated, and improper usage will lead to model performance damage greatly. In this paper, we develop a toolkit for users to easily quantize their models for inference, in which Self-Adaptive Mixed-Precision (SAMP) is proposed to automatically control quantization rate by a mixed-precision architecture to balance model accuracy and efficiency. Experimental results show that our SAMP toolkit has a higher speedup than PyTorch and FasterTransformer while ensuring the required accuracy. In addition, SAMP is based on a modular design, decoupling the tokenizer, embedding, encoder and target layers, which allows users to handle various downstream tasks and can be seamlessly integrated into PyTorch.
CLFeb 10, 2025Code
Optimizing Knowledge Integration in Retrieval-Augmented Generation with Self-SelectionYan Weng, Fengbin Zhu, Tong Ye et al.
Retrieval-Augmented Generation (RAG), which integrates external knowledge into Large Language Models (LLMs), has proven effective in enabling LLMs to produce more accurate and reliable responses. However, it remains a significant challenge how to effectively integrate external retrieved knowledge with internal parametric knowledge in LLMs. In this work, we propose a novel Self-Selection RAG framework, where the LLM is made to select from pairwise responses generated with internal parametric knowledge solely and with external retrieved knowledge together to achieve enhanced accuracy. To this end, we devise a Self-Selection-RGP method to enhance the capabilities of the LLM in both generating and selecting the correct answer, by training the LLM with Direct Preference Optimization (DPO) over a curated Retrieval Generation Preference (RGP) dataset. Experimental results with two open-source LLMs (i.e., Llama2-13B-Chat and Mistral-7B) well demonstrate the superiority of our approach over other baseline methods on Natural Questions (NQ) and TrivialQA datasets.
IRApr 30
Position-Aware Drafting for Inference Acceleration in LLM-Based Generative List-Wise RecommendationJiaju Chen, Chongming Gao, Chenxiao Fan et al.
Large language model (LLM)-based generative list-wise recommendation has advanced rapidly, but decoding remains sequential and thus latency-prone. To accelerate inference without changing the target distribution, speculative decoding (SD) uses a small draft model to propose several next tokens at once and a target LLM to verify and accept the longest prefix, skipping multiple steps per round. In generative recommendation, however, each item is represented by multiple semantic-ID tokens, often with separators, and current drafts typically treat these tokens uniformly. This overlooks two practical facts: (i) a token's semantics depend on its within-item slot, and (ii) uncertainty tends to increase with speculation depth. Without modeling these effects, SD's speedups can be limited. We introduce PAD-Rec, Position-Aware Drafting for generative Recommendation, a lightweight module that augments the draft model with two complementary signals. Item position embeddings explicitly encode the within-item slot of each token, strengthening structural awareness. Step position embeddings encode the draft step, allowing the model to adapt to depth-dependent uncertainty and improve proposal quality. To harmonize these signals with base features, we add simple gates: a learnable coefficient for item slots and a context-driven gate for draft steps. The module is trainable, easy to integrate with standard draft models, and adds negligible inference overhead. Extensive experiments on four real-world datasets show up to 3.1x wall-clock speedup and about 5% average wall-clock speedup gain over strong SD baselines, while largely preserving recommendation quality.
CLSep 19, 2019
A Split-and-Recombine Approach for Follow-up Query AnalysisQian Liu, Bei Chen, Haoyan Liu et al.
Context-dependent semantic parsing has proven to be an important yet challenging task. To leverage the advances in context-independent semantic parsing, we propose to perform follow-up query analysis, aiming to restate context-dependent natural language queries with contextual information. To accomplish the task, we propose STAR, a novel approach with a well-designed two-phase process. It is parser-independent and able to handle multifarious follow-up scenarios in different domains. Experiments on the FollowUp dataset show that STAR outperforms the state-of-the-art baseline by a large margin of nearly 8%. The superiority on parsing results verifies the feasibility of follow-up query analysis. We also explore the extensibility of STAR on the SQA dataset, which is very promising.
SPMay 27, 2019
A Novel Demodulation and Estimation Algorithm for Blackout Communication: Extract Principal Components with Deep LearningHaoyan Liu, Yanming Liu, Ming Yang et al.
For reentry or near space communication, owing to the influence of the time-varying plasma sheath channel environment, the received IQ baseband signals are severely rotated on the constellation. Researches have shown that the frequency of electron density varies from 20kHz to 100 kHz which is on the same order as the symbol rate of most TT\&C communication systems and a mass of bandwidth will be consumed to track the time-varying channel with traditional estimation. In this paper, motivated by principal curve analysis, we propose a deep learning (DL) algorithm which called symmetric manifold network (SMN) to extract the curves on the constellation and classify the signals based on the curves. The key advantage is that SMN can achieve joint optimization of demodulation and channel estimation. From our simulation results, the new algorithm significantly reduces the symbol error rate (SER) compared to existing algorithms and enables accurate estimation of fading with extremely high bandwith utilization rate.
SIMay 28, 2018
r-Instance Learning for Missing People Tweets IdentificationYang Yang, Haoyan Liu, Xia Hu et al.
The number of missing people (i.e., people who get lost) greatly increases in recent years. It is a serious worldwide problem, and finding the missing people consumes a large amount of social resources. In tracking and finding these missing people, timely data gathering and analysis actually play an important role. With the development of social media, information about missing people can get propagated through the web very quickly, which provides a promising way to solve the problem. The information in online social media is usually of heterogeneous categories, involving both complex social interactions and textual data of diverse structures. Effective fusion of these different types of information for addressing the missing people identification problem can be a great challenge. Motivated by the multi-instance learning problem and existing social science theory of "homophily", in this paper, we propose a novel r-instance (RI) learning model.