Linqi Song

CL
h-index26
50papers
5,361citations
Novelty52%
AI Score45

50 Papers

23.2CLOct 5, 2023Code
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning

Ke Wang, Houxing Ren, Aojun Zhou et al.

The recently released GPT-4 Code Interpreter has demonstrated remarkable proficiency in solving challenging math problems, primarily attributed to its ability to seamlessly reason with natural language, generate code, execute code, and continue reasoning based on the execution output. In this paper, we present a method to fine-tune open-source language models, enabling them to use code for modeling and deriving math equations and, consequently, enhancing their mathematical reasoning abilities. We propose a method of generating novel and high-quality datasets with math problems and their code-based solutions, referred to as MathCodeInstruct. Each solution interleaves natural language, code, and execution results. We also introduce a customized supervised fine-tuning and inference approach. This approach yields the MathCoder models, a family of models capable of generating code-based solutions for solving challenging math problems. Impressively, the MathCoder models achieve state-of-the-art scores among open-source LLMs on the MATH (45.2%) and GSM8K (83.9%) datasets, substantially outperforming other open-source alternatives. Notably, the MathCoder model not only surpasses ChatGPT-3.5 and PaLM-2 on GSM8K and MATH but also outperforms GPT-4 on the competition-level MATH dataset. The dataset and models will be released at https://github.com/mathllm/MathCoder.

7.6CVFeb 22, 2023Code
Gap-closing Matters: Perceptual Quality Evaluation and Optimization of Low-Light Image Enhancement

Baoliang Chen, Lingyu Zhu, Hanwei Zhu et al.

There is a growing consensus in the research community that the optimization of low-light image enhancement approaches should be guided by the visual quality perceived by end users. Despite the substantial efforts invested in the design of low-light enhancement algorithms, there has been comparatively limited focus on assessing subjective and objective quality systematically. To mitigate this gap and provide a clear path towards optimizing low-light image enhancement for better visual quality, we propose a gap-closing framework. In particular, our gap-closing framework starts with the creation of a large-scale dataset for Subjective QUality Assessment of REconstructed LOw-Light Images (SQUARE-LOL). This database serves as the foundation for studying the quality of enhanced images and conducting a comprehensive subjective user study. Subsequently, we propose an objective quality assessment measure that plays a critical role in bridging the gap between visual quality and enhancement. Finally, we demonstrate that our proposed objective quality measure can be incorporated into the process of optimizing the learning of the enhancement model toward perceptual optimality. We validate the effectiveness of our proposed framework through both the accuracy of quality prediction and the perceptual quality of image enhancement. Our database and codes are publicly available at https://github.com/Baoliang93/IACA_For_Lowlight_IQA.

10.8NEApr 17, 2023Code
A Scalable Test Problem Generator for Sequential Transfer Optimization

Xiaoming Xue, Cuie Yang, Liang Feng et al.

Sequential transfer optimization (STO), which aims to improve the optimization performance on a task of interest by exploiting the knowledge captured from several previously-solved optimization tasks stored in a database, has been gaining increasing research attention over the years. However, despite the remarkable advances in algorithm design, the development of a systematic benchmark suite for comprehensive comparisons of STO algorithms received far less attention. Existing test problems are either simply generated by assembling other benchmark functions or extended from specific practical problems with limited scalability. The relationships between the optimal solutions of the source and target tasks in these problems are also often manually configured, limiting their ability to model different similarity relationships presented in real-world problems. Consequently, the good performance achieved by an algorithm on these problems might be biased and hard to be generalized to other problems. In light of the above, in this study, we first introduce four concepts for characterizing STO problems and present an important problem feature, namely similarity distribution, which quantitatively delineates the relationship between the optima of the source and target tasks. Then, we present the general design guidelines of STO problems and a particular STO problem generator with good scalability. Specifically, the similarity distribution of a problem can be easily customized, enabling a continuous spectrum of representation of the diverse similarity relationships of real-world problems. Lastly, a benchmark suite with 12 STO problems featured by a variety of customized similarity relationships is developed using the proposed generator. The source code of the problem generator is available at https://github.com/XmingHsueh/STOP-G.

11.3CROct 5, 2022
Hiding Images in Deep Probabilistic Models

Haoyu Chen, Linqi Song, Zhenxing Qian et al.

Data hiding with deep neural networks (DNNs) has experienced impressive successes in recent years. A prevailing scheme is to train an autoencoder, consisting of an encoding network to embed (or transform) secret messages in (or into) a carrier, and a decoding network to extract the hidden messages. This scheme may suffer from several limitations regarding practicability, security, and embedding capacity. In this work, we describe a different computational framework to hide images in deep probabilistic models. Specifically, we use a DNN to model the probability density of cover images, and hide a secret image in one particular location of the learned distribution. As an instantiation, we adopt a SinGAN, a pyramid of generative adversarial networks (GANs), to learn the patch distribution of one cover image. We hide the secret image by fitting a deterministic mapping from a fixed set of noise maps (generated by an embedding key) to the secret image during patch distribution learning. The stego SinGAN, behaving as the original SinGAN, is publicly communicated; only the receiver with the embedding key is able to extract the secret image. We demonstrate the feasibility of our SinGAN approach in terms of extraction accuracy and model security. Moreover, we show the flexibility of the proposed method in terms of hiding multiple images for different receivers and obfuscating the secret image.

23.7CLAug 15, 2023
Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification

Aojun Zhou, Ke Wang, Zimu Lu et al.

Recent progress in large language models (LLMs) like GPT-4 and PaLM-2 has brought significant advancements in addressing math reasoning problems. In particular, OpenAI's latest version of GPT-4, known as GPT-4 Code Interpreter, shows remarkable performance on challenging math datasets. In this paper, we explore the effect of code on enhancing LLMs' reasoning capability by introducing different constraints on the \textit{Code Usage Frequency} of GPT-4 Code Interpreter. We found that its success can be largely attributed to its powerful skills in generating and executing code, evaluating the output of code execution, and rectifying its solution when receiving unreasonable outputs. Based on this insight, we propose a novel and effective prompting method, explicit \uline{c}ode-based \uline{s}elf-\uline{v}erification~(CSV), to further boost the mathematical reasoning potential of GPT-4 Code Interpreter. This method employs a zero-shot prompt on GPT-4 Code Interpreter to encourage it to use code to self-verify its answers. In instances where the verification state registers as ``False'', the model shall automatically amend its solution, analogous to our approach of rectifying errors during a mathematics examination. Furthermore, we recognize that the states of the verification result indicate the confidence of a solution, which can improve the effectiveness of majority voting. With GPT-4 Code Interpreter and CSV, we achieve an impressive zero-shot accuracy on MATH dataset \textbf{(53.9\% $\to$ 84.3\%)}.

31.7CLApr 11, 2022Code
Zero-shot Cross-lingual Conversational Semantic Role Labeling

Han Wu, Haochen Tan, Kun Xu et al. · tencent-ai

While conversational semantic role labeling (CSRL) has shown its usefulness on Chinese conversational tasks, it is still under-explored in non-Chinese languages due to the lack of multilingual CSRL annotations for the parser training. To avoid expensive data collection and error-propagation of translation-based methods, we present a simple but effective approach to perform zero-shot cross-lingual CSRL. Our model implicitly learns language-agnostic, conversational structure-aware and semantically rich representations with the hierarchical encoders and elaborately designed pre-training objectives. Experimental results show that our model outperforms all baselines by large margins on two newly collected English CSRL test sets. More importantly, we confirm the usefulness of CSRL to non-Chinese conversational tasks such as the question-in-context rewriting task in English and the multi-turn dialogue response generation tasks in English, German and Japanese by incorporating the CSRL information into the downstream conversation-based models. We believe this finding is significant and will facilitate the research of non-Chinese dialogue tasks which suffer the problems of ellipsis and anaphora.

32.0CLMar 11, 2022Code
A Sentence is Worth 128 Pseudo Tokens: A Semantic-Aware Contrastive Learning Framework for Sentence Embeddings

Haochen Tan, Wei Shao, Han Wu et al.

Contrastive learning has shown great potential in unsupervised sentence embedding tasks, e.g., SimCSE. However, We find that these existing solutions are heavily affected by superficial features like the length of sentences or syntactic structures. In this paper, we propose a semantics-aware contrastive learning framework for sentence embeddings, termed Pseudo-Token BERT (PT-BERT), which is able to exploit the pseudo-token space (i.e., latent semantic space) representation of a sentence while eliminating the impact of superficial features such as sentence length and syntax. Specifically, we introduce an additional pseudo token embedding layer independent of the BERT encoder to map each sentence into a sequence of pseudo tokens in a fixed length. Leveraging these pseudo sequences, we are able to construct same-length positive and negative pairs based on the attention mechanism to perform contrastive learning. In addition, we utilize both the gradient-updating and momentum-updating encoders to encode instances while dynamically maintaining an additional queue to store the representation of sentence embeddings, enhancing the encoder's learning performance for negative examples. Experiments show that our model outperforms the state-of-the-art baselines on six standard semantic textual similarity (STS) tasks. Furthermore, experiments on alignments and uniformity losses, as well as hard examples with different sentence lengths and syntax, consistently verify the effectiveness of our method.

7.7LGApr 26, 2023
Killing Two Birds with One Stone: Quantization Achieves Privacy in Distributed Learning

Guangfeng Yan, Tan Li, Kui Wu et al.

Communication efficiency and privacy protection are two critical issues in distributed machine learning. Existing methods tackle these two issues separately and may have a high implementation complexity that constrains their application in a resource-limited environment. We propose a comprehensive quantization-based solution that could simultaneously achieve communication efficiency and privacy protection, providing new insights into the correlated nature of communication and privacy. Specifically, we demonstrate the effectiveness of our proposed solutions in the distributed stochastic gradient descent (SGD) framework by adding binomial noise to the uniformly quantized gradients to reach the desired differential privacy level but with a minor sacrifice in communication efficiency. We theoretically capture the new trade-offs between communication, privacy, and learning performance.

0.6CLMay 29, 2022Code
Learning Locality and Isotropy in Dialogue Modeling

Han Wu, Haochen Tan, Mingjie Zhan et al.

Existing dialogue modeling methods have achieved promising performance on various dialogue tasks with the aid of Transformer and the large-scale pre-trained language models. However, some recent studies revealed that the context representations produced by these methods suffer the problem of anisotropy. In this paper, we find that the generated representations are also not conversational, losing the conversation structure information during the context modeling stage. To this end, we identify two properties in dialogue modeling, i.e., locality and isotropy, and present a simple method for dialogue representation calibration, namely SimDRC, to build isotropic and conversational feature spaces. Experimental results show that our approach significantly outperforms the current state-of-the-art models on three dialogue tasks across the automatic and human evaluation metrics. More in-depth analyses further confirm the effectiveness of our proposed approach.

1.8LGOct 24, 2022
Adaptive Top-K in SGD for Communication-Efficient Distributed Learning

Mengzhe Ruan, Guangfeng Yan, Yuanzhang Xiao et al.

Distributed stochastic gradient descent (SGD) with gradient compression has become a popular communication-efficient solution for accelerating distributed learning. One commonly used method for gradient compression is Top-K sparsification, which sparsifies the gradients by a fixed degree during model training. However, there has been a lack of an adaptive approach to adjust the sparsification degree to maximize the potential of the model's performance or training speed. This paper proposes a novel adaptive Top-K in SGD framework that enables an adaptive degree of sparsification for each gradient descent step to optimize the convergence performance by balancing the trade-off between communication cost and convergence error. Firstly, an upper bound of convergence error is derived for the adaptive sparsification scheme and the loss function. Secondly, an algorithm is designed to minimize the convergence error under the communication cost constraints. Finally, numerical results on the MNIST and CIFAR-10 datasets demonstrate that the proposed adaptive Top-K algorithm in SGD achieves a significantly better convergence rate compared to state-of-the-art methods, even after considering error compensation.

3.7CVJul 19, 2024
OCTrack: Benchmarking the Open-Corpus Multi-Object Tracking

Zekun Qian, Ruize Han, Wei Feng et al.

We study a novel yet practical problem of open-corpus multi-object tracking (OCMOT), which extends the MOT into localizing, associating, and recognizing generic-category objects of both seen (base) and unseen (novel) classes, but without the category text list as prompt. To study this problem, the top priority is to build a benchmark. In this work, we build OCTrackB, a large-scale and comprehensive benchmark, to provide a standard evaluation platform for the OCMOT problem. Compared to previous datasets, OCTrackB has more abundant and balanced base/novel classes and the corresponding samples for evaluation with less bias. We also propose a new multi-granularity recognition metric to better evaluate the generative object recognition in OCMOT. By conducting the extensive benchmark evaluation, we report and analyze the results of various state-of-the-art methods, which demonstrate the rationale of OCMOT, as well as the usefulness and advantages of OCTrackB.

0.6CLJun 16, 2022
Towards Better Understanding with Uniformity and Explicit Regularization of Embeddings in Embedding-based Neural Topic Models

Wei Shao, Lei Huang, Shuqi Liu et al.

Embedding-based neural topic models could explicitly represent words and topics by embedding them to a homogeneous feature space, which shows higher interpretability. However, there are no explicit constraints for the training of embeddings, leading to a larger optimization space. Also, a clear description of the changes in embeddings and the impact on model performance is still lacking. In this paper, we propose an embedding regularized neural topic model, which applies the specially designed training constraints on word embedding and topic embedding to reduce the optimization space of parameters. To reveal the changes and roles of embeddings, we introduce \textbf{uniformity} into the embedding-based neural topic model as the evaluation metric of embedding space. On this basis, we describe how embeddings tend to change during training via the changes in the uniformity of embeddings. Furthermore, we demonstrate the impact of changes in embeddings in embedding-based neural topic models through ablation studies. The results of experiments on two mainstream datasets indicate that our model significantly outperforms baseline models in terms of the harmony between topic quality and document modeling. This work is the first attempt to exploit uniformity to explore changes in embeddings of embedding-based neural topic models and their impact on model performance to the best of our knowledge.

4.2CLMar 10, 2024Code
Can LLM Substitute Human Labeling? A Case Study of Fine-grained Chinese Address Entity Recognition Dataset for UAV Delivery

Yuxuan Yao, Sichun Luo, Haohan Zhao et al.

We present CNER-UAV, a fine-grained \textbf{C}hinese \textbf{N}ame \textbf{E}ntity \textbf{R}ecognition dataset specifically designed for the task of address resolution in \textbf{U}nmanned \textbf{A}erial \textbf{V}ehicle delivery systems. The dataset encompasses a diverse range of five categories, enabling comprehensive training and evaluation of NER models. To construct this dataset, we sourced the data from a real-world UAV delivery system and conducted a rigorous data cleaning and desensitization process to ensure privacy and data integrity. The resulting dataset, consisting of around 12,000 annotated samples, underwent human experts and \textbf{L}arge \textbf{L}anguage \textbf{M}odel annotation. We evaluated classical NER models on our dataset and provided in-depth analysis. The dataset and models are publicly available at \url{https://github.com/zhhvvv/CNER-UAV}.

22.6LGFeb 16, 2025Code
Unlocking the Power of Function Vectors for Characterizing and Mitigating Catastrophic Forgetting in Continual Instruction Tuning

Gangwei Jiang, Caigao Jiang, Zhaoyi Li et al.

Catastrophic forgetting (CF) poses a significant challenge in machine learning, where a model forgets previously learned information upon learning new tasks. Despite the advanced capabilities of Large Language Models (LLMs), they continue to face challenges with CF during continual learning. The majority of existing research focuses on analyzing forgetting patterns through a singular training sequence, thereby overlooking the intricate effects that diverse tasks have on model behavior. Our study explores CF across various settings, discovering that model forgetting is influenced by both the specific training tasks and the models themselves. To this end, we interpret forgetting by examining the function vector (FV), a compact representation of functions in LLMs, offering a model-dependent indicator for the occurrence of CF. Through theoretical and empirical analyses, we demonstrated that CF in LLMs primarily stems from biases in function activation rather than the overwriting of task processing functions. Leveraging these insights, we propose a novel function vector guided training methodology, incorporating a regularization technique to stabilize the FV and mitigate forgetting. Empirical tests on four benchmarks confirm the effectiveness of our proposed training method, substantiating our theoretical framework concerning CF and model function dynamics. We plan to make our code publicly accessible in the near future.

21.2CLOct 12, 2023Code
Fine-grained Conversational Decoding via Isotropic and Proximal Search

Yuxuan Yao, Han Wu, Qiling Xu et al.

General-purpose text decoding approaches are usually adopted for dialogue response generation. Although the quality of the generated responses can be improved with dialogue-specific encoding methods, conversational decoding methods are still under-explored. Inspired by \citet{wu2023learning} that a good dialogue feature space should follow the rules of locality and isotropy, we present a fine-grained conversational decoding method, termed \textit{isotropic and proximal search (IPS)}. Our method is designed to generate the semantic-concentrated response, while still maintaining informativeness and discrimination against the context. Experiments show that our approach outperforms existing decoding strategies in the dialogue field across both automatic and human evaluation metrics. More in-depth analyses further confirm the effectiveness of our approach.

11.8IRFeb 10, 2025Code
RALLRec: Improving Retrieval Augmented Large Language Model Recommendation with Representation Learning

Jian Xu, Sichun Luo, Xiangyu Chen et al.

Large Language Models (LLMs) have been integrated into recommendation systems to enhance user behavior comprehension. The Retrieval Augmented Generation (RAG) technique is further incorporated into these systems to retrieve more relevant items and improve system performance. However, existing RAG methods rely primarily on textual semantics and often fail to incorporate the most relevant items, limiting the effectiveness of the systems. In this paper, we propose Representation learning for retrieval-Augmented Large Language model Recommendation (RALLRec). Specifically, we enhance textual semantics by prompting LLMs to generate more detailed item descriptions, followed by joint representation learning of textual and collaborative semantics, which are extracted by the LLM and recommendation models, respectively. Considering the potential time-varying characteristics of user interest, a simple yet effective reranking method is further introduced to capture the dynamics of user preference. We conducted extensive experiments on three real-world datasets, and the evaluation results validated the effectiveness of our method. Code is made public at https://github.com/JianXu95/RALLRec.

29.1CLJun 3, 2024Code
DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs

Haokun Lin, Haobo Xu, Yichen Wu et al.

Quantization of large language models (LLMs) faces significant challenges, particularly due to the presence of outlier activations that impede efficient low-bit representation. Traditional approaches predominantly address Normal Outliers, which are activations across all tokens with relatively large magnitudes. However, these methods struggle with smoothing Massive Outliers that display significantly larger values, which leads to significant performance degradation in low-bit quantization. In this paper, we introduce DuQuant, a novel approach that utilizes rotation and permutation transformations to more effectively mitigate both massive and normal outliers. First, DuQuant starts by constructing the rotation matrix, using specific outlier dimensions as prior knowledge, to redistribute outliers to adjacent channels by block-wise rotation. Second, We further employ a zigzag permutation to balance the distribution of outliers across blocks, thereby reducing block-wise variance. A subsequent rotation further smooths the activation landscape, enhancing model performance. DuQuant simplifies the quantization process and excels in managing outliers, outperforming the state-of-the-art baselines across various sizes and types of LLMs on multiple tasks, even with 4-bit weight-activation quantization. Our code is available at https://github.com/Hsu1023/DuQuant.

27.0CLMay 9, 2023Code
VCSUM: A Versatile Chinese Meeting Summarization Dataset

Han Wu, Mingjie Zhan, Haochen Tan et al.

Compared to news and chat summarization, the development of meeting summarization is hugely decelerated by the limited data. To this end, we introduce a versatile Chinese meeting summarization dataset, dubbed VCSum, consisting of 239 real-life meetings, with a total duration of over 230 hours. We claim our dataset is versatile because we provide the annotations of topic segmentation, headlines, segmentation summaries, overall meeting summaries, and salient sentences for each meeting transcript. As such, the dataset can adapt to various summarization tasks or methods, including segmentation-based summarization, multi-granularity summarization and retrieval-then-generate summarization. Our analysis confirms the effectiveness and robustness of VCSum. We also provide a set of benchmark models regarding different downstream summarization tasks on VCSum to facilitate further research. The dataset and code will be released at https://github.com/hahahawu/VCSum.

18.2LGSep 13, 2021Code
Byzantine-robust Federated Learning through Collaborative Malicious Gradient Filtering

Jian Xu, Shao-Lun Huang, Linqi Song et al.

Gradient-based training in federated learning is known to be vulnerable to faulty/malicious clients, which are often modeled as Byzantine clients. To this end, previous work either makes use of auxiliary data at parameter server to verify the received gradients (e.g., by computing validation error rate) or leverages statistic-based methods (e.g. median and Krum) to identify and remove malicious gradients from Byzantine clients. In this paper, we remark that auxiliary data may not always be available in practice and focus on the statistic-based approach. However, recent work on model poisoning attacks has shown that well-crafted attacks can circumvent most of median- and distance-based statistical defense methods, making malicious gradients indistinguishable from honest ones. To tackle this challenge, we show that the element-wise sign of gradient vector can provide valuable insight in detecting model poisoning attacks. Based on our theoretical analysis of the \textit{Little is Enough} attack, we propose a novel approach called \textit{SignGuard} to enable Byzantine-robust federated learning through collaborative malicious gradient filtering. More precisely, the received gradients are first processed to generate relevant magnitude, sign, and similarity statistics, which are then collaboratively utilized by multiple filters to eliminate malicious gradients before final aggregation. Finally, extensive experiments of image and text classification tasks are conducted under recently proposed attacks and defense strategies. The numerical results demonstrate the effectiveness and superiority of our proposed approach. The code is available at \textit{\url{https://github.com/JianXu95/SignGuard}}

19.4CVMar 12, 2024
MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric

Haokun Lin, Haoli Bai, Zhili Liu et al.

Vision-language pre-trained models have achieved impressive performance on various downstream tasks. However, their large model sizes hinder their utilization on platforms with limited computational resources. We find that directly using smaller pre-trained models and applying magnitude-based pruning on CLIP models leads to inflexibility and inferior performance. Recent efforts for VLP compression either adopt uni-modal compression metrics resulting in limited performance or involve costly mask-search processes with learnable masks. In this paper, we first propose the Module-wise Pruning Error (MoPE) metric, accurately assessing CLIP module importance by performance decline on cross-modal tasks. Using the MoPE metric, we introduce a unified pruning framework applicable to both pre-training and task-specific fine-tuning compression stages. For pre-training, MoPE-CLIP effectively leverages knowledge from the teacher model, significantly reducing pre-training costs while maintaining strong zero-shot capabilities. For fine-tuning, consecutive pruning from width to depth yields highly competitive task-specific models. Extensive experiments in two stages demonstrate the effectiveness of the MoPE metric, and MoPE-CLIP outperforms previous state-of-the-art VLP compression methods.

10.0CLMar 28, 2024Code
Learning From Correctness Without Prompting Makes LLM Efficient Reasoner

Yuxuan Yao, Han Wu, Zhijiang Guo et al.

Large language models (LLMs) have demonstrated outstanding performance across various tasks, yet they still exhibit limitations such as hallucination, unfaithful reasoning, and toxic content. One potential approach to mitigate these issues is learning from human or external feedback (e.g. tools). In this paper, we introduce an intrinsic self-correct reasoning framework for LLMs that eliminates the need for human feedback, external tools, and handcraft prompts. The proposed framework, based on a multi-step reasoning paradigm \textbf{Le}arning from \textbf{Co}rrectness (\textsc{LeCo}), improves reasoning performance without needing to learn from errors. This paradigm prioritizes learning from correct reasoning steps, and a unique method to measure confidence for each reasoning step based on generation logits. Experimental results across various multi-step reasoning tasks demonstrate the effectiveness of the framework in improving reasoning performance with reduced token consumption.

25.9AIMay 28, 2025
What Makes a Good Reasoning Chain? Uncovering Structural Patterns in Long Chain-of-Thought Reasoning

Gangwei Jiang, Yahui Liu, Zhaoyi Li et al.

Recent advances in reasoning with large language models (LLMs) have popularized Long Chain-of-Thought (LCoT), a strategy that encourages deliberate and step-by-step reasoning before producing a final answer. While LCoTs have enabled expert-level performance in complex tasks, how the internal structures of their reasoning chains drive, or even predict, the correctness of final answers remains a critical yet underexplored question. In this work, we present LCoT2Tree, an automated framework that converts sequential LCoTs into hierarchical tree structures and thus enables deeper structural analysis of LLM reasoning. Using graph neural networks (GNNs), we reveal that structural patterns extracted by LCoT2Tree, including exploration, backtracking, and verification, serve as stronger predictors of final performance across a wide range of tasks and models. Leveraging an explainability technique, we further identify critical thought patterns such as over-branching that account for failures. Beyond diagnostic insights, the structural patterns by LCoT2Tree support practical applications, including improving Best-of-N decoding effectiveness. Overall, our results underscore the critical role of internal structures of reasoning chains, positioning LCoT2Tree as a powerful tool for diagnosing, interpreting, and improving reasoning in LLMs.

14.6CLApr 5, 2024Code
Benchmarking and Improving Compositional Generalization of Multi-aspect Controllable Text Generation

Tianqi Zhong, Zhaoyi Li, Quan Wang et al.

Compositional generalization, representing the model's ability to generate text with new attribute combinations obtained by recombining single attributes from the training data, is a crucial property for multi-aspect controllable text generation (MCTG) methods. Nonetheless, a comprehensive compositional generalization evaluation benchmark of MCTG is still lacking. We propose CompMCTG, a benchmark encompassing diverse multi-aspect labeled datasets and a crafted three-dimensional evaluation protocol, to holistically evaluate the compositional generalization of MCTG approaches. We observe that existing MCTG works generally confront a noticeable performance drop in compositional testing. To mitigate this issue, we introduce Meta-MCTG, a training framework incorporating meta-learning, where we enable models to learn how to generalize by simulating compositional generalization scenarios in the training phase. We demonstrate the effectiveness of Meta-MCTG through achieving obvious improvement (by at most 3.64%) for compositional testing performance in 94.4% cases.

7.6CVOct 11, 2024
VOVTrack: Exploring the Potentiality in Videos for Open-Vocabulary Object Tracking

Zekun Qian, Ruize Han, Junhui Hou et al.

Open-vocabulary multi-object tracking (OVMOT) represents a critical new challenge involving the detection and tracking of diverse object categories in videos, encompassing both seen categories (base classes) and unseen categories (novel classes). This issue amalgamates the complexities of open-vocabulary object detection (OVD) and multi-object tracking (MOT). Existing approaches to OVMOT often merge OVD and MOT methodologies as separate modules, predominantly focusing on the problem through an image-centric lens. In this paper, we propose VOVTrack, a novel method that integrates object states relevant to MOT and video-centric training to address this challenge from a video object tracking standpoint. First, we consider the tracking-related state of the objects during tracking and propose a new prompt-guided attention mechanism for more accurate localization and classification (detection) of the time-varying objects. Subsequently, we leverage raw video data without annotations for training by formulating a self-supervised object similarity learning technique to facilitate temporal object association (tracking). Experimental results underscore that VOVTrack outperforms existing methods, establishing itself as a state-of-the-art solution for open-vocabulary tracking task.

6.7CLFeb 15, 2025
Beyond One-Size-Fits-All Pruning via Evolutionary Metric Search for Large Language Models

Shuqi Liu, Bowei He, Han Wu et al.

Post-training pruning has emerged as a crucial optimization technique as large language models (LLMs) continue to grow rapidly. However, the significant variations in weight distributions across different LLMs make fixed pruning strategies inadequate for multiple models. In this paper, we introduce \textbf{\textsc{OptiShear}}, an efficient evolutionary optimization framework for adaptive LLM pruning. Our framework features two key innovations: an effective search space built on our Meta pruning metric to handle diverse weight distributions, and a model-wise reconstruction error for rapid evaluation during search trials. We employ Non-dominated Sorting Genetic Algorithm III (NSGA-III) to optimize both pruning metrics and layerwise sparsity ratios. Through extensive evaluation on LLaMA-1/2/3 and Mistral models (7B-70B) across multiple benchmarks, we demonstrate that our adaptive pruning metrics consistently outperform existing methods. Additionally, our discovered layerwise sparsity ratios enhance the effectiveness of other pruning metrics. The framework exhibits strong cross-task and cross-model generalizability, providing a cost-effective solution for model compression.

7.8AIJul 10, 2025
DrugMCTS: a drug repurposing framework combining multi-agent, RAG and Monte Carlo Tree Search

Zerui Yang, Yuwei Wan, Siyu Yan et al.

Recent advances in large language models have demonstrated considerable potential in scientific domains such as drug repositioning. However, their effectiveness remains constrained when reasoning extends beyond the knowledge acquired during pretraining. Conventional approaches, such as fine-tuning or retrieval-augmented generation, face limitations in either imposing high computational overhead or failing to fully exploit structured scientific data. To overcome these challenges, we propose DrugMCTS, a novel framework that synergistically integrates RAG, multi-agent collaboration, and Monte Carlo Tree Search for drug repositioning. The framework employs five specialized agents tasked with retrieving and analyzing molecular and protein information, thereby enabling structured and iterative reasoning. Extensive experiments on the DrugBank and KIBA datasets demonstrate that DrugMCTS achieves substantially higher recall and robustness compared to both general-purpose LLMs and deep learning baselines. Our results highlight the importance of structured reasoning, agent-based collaboration, and feedback-driven search mechanisms in advancing LLM applications for drug repositioning.

4.6LGFeb 2, 2024
Improved Quantization Strategies for Managing Heavy-tailed Gradients in Distributed Learning

Guangfeng Yan, Tan Li, Yuanzhang Xiao et al.

Gradient compression has surfaced as a key technique to address the challenge of communication efficiency in distributed learning. In distributed deep learning, however, it is observed that gradient distributions are heavy-tailed, with outliers significantly influencing the design of compression strategies. Existing parameter quantization methods experience performance degradation when this heavy-tailed feature is ignored. In this paper, we introduce a novel compression scheme specifically engineered for heavy-tailed gradients, which effectively combines gradient truncation with quantization. This scheme is adeptly implemented within a communication-limited distributed Stochastic Gradient Descent (SGD) framework. We consider a general family of heavy-tail gradients that follow a power-law distribution, we aim to minimize the error resulting from quantization, thereby determining optimal values for two critical parameters: the truncation threshold and the quantization density. We provide a theoretical analysis on the convergence error bound under both uniform and non-uniform quantization scenarios. Comparative experiments with other benchmarks demonstrate the effectiveness of our proposed method in managing the heavy-tailed gradients in a distributed learning environment.

2.6LGFeb 2, 2024
Truncated Non-Uniform Quantization for Distributed SGD

Guangfeng Yan, Tan Li, Yuanzhang Xiao et al.

To address the communication bottleneck challenge in distributed learning, our work introduces a novel two-stage quantization strategy designed to enhance the communication efficiency of distributed Stochastic Gradient Descent (SGD). The proposed method initially employs truncation to mitigate the impact of long-tail noise, followed by a non-uniform quantization of the post-truncation gradients based on their statistical characteristics. We provide a comprehensive convergence analysis of the quantized distributed SGD, establishing theoretical guarantees for its performance. Furthermore, by minimizing the convergence error, we derive optimal closed-form solutions for the truncation threshold and non-uniform quantization levels under given communication constraints. Both theoretical insights and extensive experimental evaluations demonstrate that our proposed algorithm outperforms existing quantization schemes, striking a superior balance between communication efficiency and convergence performance.

4.9CLJan 1, 2025
DiffETM: Diffusion Process Enhanced Embedded Topic Model

Wei Shao, Mingyang Liu, Linqi Song

The embedded topic model (ETM) is a widely used approach that assumes the sampled document-topic distribution conforms to the logistic normal distribution for easier optimization. However, this assumption oversimplifies the real document-topic distribution, limiting the model's performance. In response, we propose a novel method that introduces the diffusion process into the sampling process of document-topic distribution to overcome this limitation and maintain an easy optimization process. We validate our method through extensive experiments on two mainstream datasets, proving its effectiveness in improving topic modeling performance.

13.0CLMay 20, 2025
Activation-Guided Consensus Merging for Large Language Models

Yuxuan Yao, Shuqi Liu, Zehua Liu et al.

Recent research has increasingly focused on reconciling the reasoning capabilities of System 2 with the efficiency of System 1. While existing training-based and prompt-based approaches face significant challenges in terms of efficiency and stability, model merging emerges as a promising strategy to integrate the diverse capabilities of different Large Language Models (LLMs) into a unified model. However, conventional model merging methods often assume uniform importance across layers, overlooking the functional heterogeneity inherent in neural components. To address this limitation, we propose \textbf{A}ctivation-Guided \textbf{C}onsensus \textbf{M}erging (\textbf{ACM}), a plug-and-play merging framework that determines layer-specific merging coefficients based on mutual information between activations of pre-trained and fine-tuned models. ACM effectively preserves task-specific capabilities without requiring gradient computations or additional training. Extensive experiments on Long-to-Short (L2S) and general merging tasks demonstrate that ACM consistently outperforms all baseline methods. For instance, in the case of Qwen-7B models, TIES-Merging equipped with ACM achieves a \textbf{55.3\%} reduction in response length while simultaneously improving reasoning accuracy by \textbf{1.3} points.

1.2DCNov 3, 2024
Flexible Coded Distributed Convolution Computing for Enhanced Straggler Resilience and Numerical Stability in Distributed CNNs

Shuo Tan, Rui Liu, Xuesong Han et al.

Deploying Convolutional Neural Networks (CNNs) on resource-constrained devices necessitates efficient management of computational resources, often via distributed environments susceptible to latency from straggler nodes. This paper introduces the Flexible Coded Distributed Convolution Computing (FCDCC) framework to enhance straggler resilience and numerical stability in distributed CNNs. We extend Coded Distributed Computing (CDC) with Circulant and Rotation Matrix Embedding (CRME) which was originally proposed for matrix multiplication to high-dimensional tensor convolution. For the proposed scheme, referred to as the Numerically Stable Coded Tensor Convolution (NSCTC) scheme, we also propose two new coded partitioning schemes: Adaptive-Padding Coded Partitioning (APCP) for the input tensor and Kernel-Channel Coded Partitioning (KCCP) for the filter tensor. These strategies enable linear decomposition of tensor convolutions and encoding them into CDC subtasks, combining model parallelism with coded redundancy for robust and efficient execution. Theoretical analysis identifies an optimal trade-off between communication and storage costs. Empirical results validate the framework's effectiveness in computational efficiency, straggler resilience, and scalability across various CNN architectures.

14.1CLJun 5, 2024
Bi-Chainer: Automated Large Language Models Reasoning with Bidirectional Chaining

Shuqi Liu, Bowei He, Linqi Song

Large Language Models (LLMs) have shown human-like reasoning abilities but still face challenges in solving complex logical problems. Existing unidirectional chaining methods, such as forward chaining and backward chaining, suffer from issues like low prediction accuracy and efficiency. To address these, we propose a bidirectional chaining method, Bi-Chainer, which dynamically switches to depth-first reasoning in the opposite reasoning direction when it encounters multiple branching options within the current direction. Thus, the intermediate reasoning results can be utilized as guidance to facilitate the reasoning process. We show that Bi-Chainer achieves sizable accuracy boots over unidirectional chaining frameworks on four challenging logical reasoning datasets. Moreover, Bi-Chainer enhances the accuracy of intermediate proof steps and reduces the average number of inference calls, resulting in more efficient and accurate reasoning.

1.9CLJun 3, 2024
Privacy in LLM-based Recommendation: Recent Advances and Future Directions

Sichun Luo, Wei Shao, Yuxuan Yao et al.

Nowadays, large language models (LLMs) have been integrated with conventional recommendation models to improve recommendation performance. However, while most of the existing works have focused on improving the model performance, the privacy issue has only received comparatively less attention. In this paper, we review recent advancements in privacy within LLM-based recommendation, categorizing them into privacy attacks and protection mechanisms. Additionally, we highlight several challenges and propose future directions for the community to address these critical problems.

19.1CLJan 26, 2024Code
PROXYQA: An Alternative Framework for Evaluating Long-Form Text Generation with Large Language Models

Haochen Tan, Zhijiang Guo, Zhan Shi et al.

Large Language Models (LLMs) have succeeded remarkably in understanding long-form contents. However, exploring their capability for generating long-form contents, such as reports and articles, has been relatively unexplored and inadequately assessed by existing benchmarks. The prevalent evaluation methods, which predominantly rely on crowdsourcing, are recognized for their labor-intensive nature and lack of efficiency, whereas automated metrics, such as the ROUGE score, demonstrate discordance with human judgment criteria. In this paper, we propose ProxyQA, an innovative framework dedicated to assessing long-text generation. ProxyQA comprises in-depth human-curated meta-questions spanning various domains, each accompanied by specific proxy-questions with pre-annotated answers. LLMs are tasked to generate extensive content in response to these meta-questions, by engaging an evaluator and incorporating the generated texts as contextual background, ProxyQA assesses the generated content's quality through the evaluator's accuracy in addressing the proxy-questions. We examine multiple LLMs, emphasizing ProxyQA's demanding nature as a high-quality assessment tool. Human evaluation demonstrates that the proxy-question method is notably self-consistent and aligns closely with human evaluative standards. The dataset and leaderboard is available at \url{https://proxy-qa.com}.

9.6CVJan 19, 2024Code
3D Shape Completion on Unseen Categories:A Weakly-supervised Approach

Lintai Wu, Junhui Hou, Linqi Song et al.

3D shapes captured by scanning devices are often incomplete due to occlusion. 3D shape completion methods have been explored to tackle this limitation. However, most of these methods are only trained and tested on a subset of categories, resulting in poor generalization to unseen categories. In this paper, we introduce a novel weakly-supervised framework to reconstruct the complete shapes from unseen categories. We first propose an end-to-end prior-assisted shape learning network that leverages data from the seen categories to infer a coarse shape. Specifically, we construct a prior bank consisting of representative shapes from the seen categories. Then, we design a multi-scale pattern correlation module for learning the complete shape of the input by analyzing the correlation between local patterns within the input and the priors at various scales. In addition, we propose a self-supervised shape refinement model to further refine the coarse shape. Considering the shape variability of 3D objects across categories, we construct a category-specific prior bank to facilitate shape refinement. Then, we devise a voxel-based partial matching loss and leverage the partial scans to drive the refinement process. Extensive experimental results show that our approach is superior to state-of-the-art methods by a large margin.

21.2CLMay 13, 2023
Reconstruct Before Summarize: An Efficient Two-Step Framework for Condensing and Summarizing Meeting Transcripts

Haochen Tan, Han Wu, Wei Shao et al.

Meetings typically involve multiple participants and lengthy conversations, resulting in redundant and trivial content. To overcome these challenges, we propose a two-step framework, Reconstruct before Summarize (RbS), for effective and efficient meeting summarization. RbS first leverages a self-supervised paradigm to annotate essential contents by reconstructing the meeting transcripts. Secondly, we propose a relative positional bucketing (RPB) algorithm to equip (conventional) summarization models to generate the summary. Despite the additional reconstruction process, our proposed RPB significantly compressed the input, leading to faster processing and reduced memory consumption compared to traditional summarization methods. We validate the effectiveness and efficiency of our method through extensive evaluations and analysis. On two meeting summarization datasets, AMI and ICSI, our approach outperforms previous state-of-the-art approaches without relying on large-scale pre-training or expert-grade annotating tools.

9.9LGNov 2, 2021
Privacy-Preserving Communication-Efficient Federated Multi-Armed Bandits

Tan Li, Linqi Song

Communication bottleneck and data privacy are two critical concerns in federated multi-armed bandit (MAB) problems, such as situations in decision-making and recommendations of connected vehicles via wireless. In this paper, we design the privacy-preserving communication-efficient algorithm in such problems and study the interactions among privacy, communication and learning performance in terms of the regret. To be specific, we design privacy-preserving learning algorithms and communication protocols and derive the learning regret when networked private agents are performing online bandit learning in a master-worker, a decentralized and a hybrid structure. Our bandit learning algorithms are based on epoch-wise sub-optimal arm eliminations at each agent and agents exchange learning knowledge with the server/each other at the end of each epoch. Furthermore, we adopt the differential privacy (DP) approach to protect the data privacy at each agent when exchanging information; and we curtail communication costs by making less frequent communications with fewer agents participation. By analyzing the regret of our proposed algorithmic framework in the master-worker, decentralized and hybrid structures, we theoretically show tradeoffs between regret and communication costs/privacy. Finally, we empirically show these trade-offs which are consistent with our theoretical analysis.

30.8CLSep 23, 2021Code
CSAGN: Conversational Structure Aware Graph Network for Conversational Semantic Role Labeling

Han Wu, Kun Xu, Linqi Song

Conversational semantic role labeling (CSRL) is believed to be a crucial step towards dialogue understanding. However, it remains a major challenge for existing CSRL parser to handle conversational structural information. In this paper, we present a simple and effective architecture for CSRL which aims to address this problem. Our model is based on a conversational structure-aware graph network which explicitly encodes the speaker dependent information. We also propose a multi-task learning method to further improve the model. Experimental results on benchmark datasets show that our model with our proposed training objectives significantly outperforms previous baselines.

1.6LGJul 30, 2021
DQ-SGD: Dynamic Quantization in SGD for Communication-Efficient Distributed Learning

Guangfeng Yan, Shao-Lun Huang, Tian Lan et al.

Gradient quantization is an emerging technique in reducing communication costs in distributed learning. Existing gradient quantization algorithms often rely on engineering heuristics or empirical observations, lacking a systematic approach to dynamically quantize gradients. This paper addresses this issue by proposing a novel dynamically quantized SGD (DQ-SGD) framework, enabling us to dynamically adjust the quantization scheme for each gradient descent step by exploring the trade-off between communication cost and convergence error. We derive an upper bound, tight in some cases, of the convergence error for a restricted family of quantization schemes and loss functions. We design our DQ-SGD algorithm via minimizing the communication cost under the convergence error constraints. Finally, through extensive experiments on large-scale natural language processing and computer vision tasks on AG-News, CIFAR-10, and CIFAR-100 datasets, we demonstrate that our quantization scheme achieves better tradeoffs between the communication cost and learning performance than other state-of-the-art gradient quantization methods.

31.6CLMay 28, 2021
Domain-Adaptive Pretraining Methods for Dialogue Understanding

Han Wu, Kun Xu, Linfeng Song et al.

Language models like BERT and SpanBERT pretrained on open-domain data have obtained impressive gains on various NLP tasks. In this paper, we probe the effectiveness of domain-adaptive pretraining objectives on downstream tasks. In particular, three objectives, including a novel objective focusing on modeling predicate-argument relations, are evaluated on two challenging dialogue understanding tasks. Experimental results demonstrate that domain-adaptive pretraining with proper objectives can significantly improve the performance of a strong baseline on these tasks, achieving the new state-of-the-art performances.

1.4CLApr 11, 2021
Conversational Semantic Role Labeling

Kun Xu, Han Wu, Linfeng Song et al.

Semantic role labeling (SRL) aims to extract the arguments for each predicate in an input sentence. Traditional SRL can fail to analyze dialogues because it only works on every single sentence, while ellipsis and anaphora frequently occur in dialogues. To address this problem, we propose the conversational SRL task, where an argument can be the dialogue participants, a phrase in the dialogue history or the current sentence. As the existing SRL datasets are in the sentence level, we manually annotate semantic roles for 3,000 chit-chat dialogues (27,198 sentences) to boost the research in this direction. Experiments show that while traditional SRL systems (even with the help of coreference resolution or rewriting) perform poorly for analyzing dialogues, modeling dialogue histories and participants greatly helps the performance, indicating that adapting SRL to conversations is very promising for universal dialogue understanding. Our initial study by applying CSRL to two mainstream conversational tasks, dialogue response generation and dialogue context rewriting, also confirms the usefulness of CSRL.

0.5CLJan 25, 2021
EGFI: Drug-Drug Interaction Extraction and Generation with Fusion of Enriched Entity and Sentence Information

Lei Huang, Jiecong Lin, Xiangtao Li et al.

The rapid growth in literature accumulates diverse and yet comprehensive biomedical knowledge hidden to be mined such as drug interactions. However, it is difficult to extract the heterogeneous knowledge to retrieve or even discover the latest and novel knowledge in an efficient manner. To address such a problem, we propose EGFI for extracting and consolidating drug interactions from large-scale medical literature text data. Specifically, EGFI consists of two parts: classification and generation. In the classification part, EGFI encompasses the language model BioBERT which has been comprehensively pre-trained on biomedical corpus. In particular, we propose the multi-head attention mechanism and pack BiGRU to fuse multiple semantic information for rigorous context modeling. In the generation part, EGFI utilizes another pre-trained language model BioGPT-2 where the generation sentences are selected based on filtering rules. We evaluated the classification part on "DDIs 2013" dataset and "DTIs" dataset, achieving the FI score of 0.842 and 0.720 respectively. Moreover, we applied the classification part to distinguish high-quality generated sentences and verified with the exiting growth truth to confirm the filtered sentences. The generated sentences that are not recorded in DrugBank and DDIs 2013 dataset also demonstrate the potential of EGFI to identify novel drug relationships.

2.3AIDec 3, 2020
Distributed Thompson Sampling

Jing Dong, Tan Li, Shaolei Ren et al.

We study a cooperative multi-agent multi-armed bandits with M agents and K arms. The goal of the agents is to minimized the cumulative regret. We adapt a traditional Thompson Sampling algoirthm under the distributed setting. However, with agent's ability to communicate, we note that communication may further reduce the upper bound of the regret for a distributed Thompson Sampling approach. To further improve the performance of distributed Thompson Sampling, we propose a distributed Elimination based Thompson Sampling algorithm that allow the agents to learn collaboratively. We analyse the algorithm under Bernoulli reward and derived a problem dependent upper bound on the cumulative regret.

31.1CLOct 3, 2020
Semantic Role Labeling Guided Multi-turn Dialogue ReWriter

Kun Xu, Haochen Tan, Linfeng Song et al.

For multi-turn dialogue rewriting, the capacity of effectively modeling the linguistic knowledge in dialog context and getting rid of the noises is essential to improve its performance. Existing attentive models attend to all words without prior focus, which results in inaccurate concentration on some dispensable words. In this paper, we propose to use semantic role labeling (SRL), which highlights the core semantic information of who did what to whom, to provide additional guidance for the rewriter model. Experiments show that this information significantly improves a RoBERTa-based model that already outperforms previous state-of-the-art systems.

16.8LGMay 14, 2020
Federated Recommendation System via Differential Privacy

Tan Li, Linqi Song, Christina Fragouli

In this paper, we are interested in what we term the federated private bandits framework, that combines differential privacy with multi-agent bandit learning. We explore how differential privacy based Upper Confidence Bound (UCB) methods can be applied to multi-agent environments, and in particular to federated learning environments both in `master-worker' and `fully decentralized' settings. We provide a theoretical analysis on the privacy and regret performance of the proposed methods and explore the tradeoffs between these two.

10.3LGJul 16, 2019Code
The Tradeoff Between Privacy and Accuracy in Anomaly Detection Using Federated XGBoost

Mengwei Yang, Linqi Song, Jie Xu et al.

Privacy has raised considerable concerns recently, especially with the advent of information explosion and numerous data mining techniques to explore the information inside large volumes of data. In this context, a new distributed learning paradigm termed federated learning becomes prominent recently to tackle the privacy issues in distributed learning, where only learning models will be transmitted from the distributed nodes to servers without revealing users' own data and hence protecting the privacy of users. In this paper, we propose a horizontal federated XGBoost algorithm to solve the federated anomaly detection problem, where the anomaly detection aims to identify abnormalities from extremely unbalanced datasets and can be considered as a special classification problem. Our proposed federated XGBoost algorithm incorporates data aggregation and sparse federated update processes to balance the tradeoff between privacy and learning performance. In particular, we introduce the virtual data sample by aggregating a group of users' data together at a single distributed node. We compute parameters based on these virtual data samples in the local nodes and aggregate the learning model in the central server. In the learning model upgrading process, we focus more on the wrongly classified data before in the virtual sample and hence to generate sparse learning model parameters. By carefully controlling the size of these groups of samples, we can achieve a tradeoff between privacy and learning performance. Our experimental results show the effectiveness of our proposed scheme by comparing with existing state-of-the-arts.

13.8DCJul 5, 2019
Data Encoding for Byzantine-Resilient Distributed Optimization

Deepesh Data, Linqi Song, Suhas Diggavi

We study distributed optimization in the presence of Byzantine adversaries, where both data and computation are distributed among $m$ worker machines, $t$ of which may be corrupt. The compromised nodes may collaboratively and arbitrarily deviate from their pre-specified programs, and a designated (master) node iteratively computes the model/parameter vector for generalized linear models. In this work, we primarily focus on two iterative algorithms: Proximal Gradient Descent (PGD) and Coordinate Descent (CD). Gradient descent (GD) is a special case of these algorithms. PGD is typically used in the data-parallel setting, where data is partitioned across different samples, whereas, CD is used in the model-parallelism setting, where data is partitioned across the parameter space. In this paper, we propose a method based on data encoding and error correction over real numbers to combat adversarial attacks. We can tolerate up to $t\leq \lfloor\frac{m-1}{2}\rfloor$ corrupt worker nodes, which is information-theoretically optimal. We give deterministic guarantees, and our method does not assume any probability distribution on the data. We develop a {\em sparse} encoding scheme which enables computationally efficient data encoding and decoding. We demonstrate a trade-off between the corruption threshold and the resource requirements (storage, computational, and communication complexity). As an example, for $t\leq\frac{m}{3}$, our scheme incurs only a {\em constant} overhead on these resources, over that required by the plain distributed PGD/CD algorithms which provide no adversarial protection. To the best of our knowledge, ours is the first paper that makes CD secure against adversarial attacks. Our encoding scheme extends efficiently to the data streaming model and for stochastic gradient descent (SGD). We also give experimental results to show the efficacy of our proposed schemes.

1.7IROct 15, 2018
Regret vs. Bandwidth Trade-off for Recommendation Systems

Linqi Song, Christina Fragouli, Devavrat Shah

We consider recommendation systems that need to operate under wireless bandwidth constraints, measured as number of broadcast transmissions, and demonstrate a (tight for some instances) tradeoff between regret and bandwidth for two scenarios: the case of multi-armed bandit with context, and the case where there is a latent structure in the message space that we can exploit to reduce the learning phase.

1.4LGJan 24, 2017
A Contextual Bandit Approach for Stream-Based Active Learning

Linqi Song, Jie Xu

Contextual bandit algorithms -- a class of multi-armed bandit algorithms that exploit the contextual information -- have been shown to be effective in solving sequential decision making problems under uncertainty. A common assumption adopted in the literature is that the realized (ground truth) reward by taking the selected action is observed by the learner at no cost, which, however, is not realistic in many practical scenarios. When observing the ground truth reward is costly, a key challenge for the learner is how to judiciously acquire the ground truth by assessing the benefits and costs in order to balance learning efficiency and learning cost. From the information theoretic perspective, a perhaps even more interesting question is how much efficiency might be lost due to this cost. In this paper, we design a novel contextual bandit-based learning algorithm and endow it with the active learning capability. The key feature of our algorithm is that in addition to sending a query to an annotator for the ground truth, prior information about the ground truth learned by the learner is sent together, thereby reducing the query cost. We prove that by carefully choosing the algorithm parameters, the learning regret of the proposed algorithm achieves the same order as that of conventional contextual bandit algorithms in cost-free scenarios, implying that, surprisingly, cost due to acquiring the ground truth does not increase the learning regret in the long-run. Our analysis shows that prior information about the ground truth plays a critical role in improving the system performance in scenarios where active learning is necessary.

1.0LGJul 11, 2016
Stream-based Online Active Learning in a Contextual Multi-Armed Bandit Framework

Linqi Song

We study the stream-based online active learning in a contextual multi-armed bandit framework. In this framework, the reward depends on both the arm and the context. In a stream-based active learning setting, obtaining the ground truth of the reward is costly, and the conventional contextual multi-armed bandit algorithm fails to achieve a sublinear regret due to this cost. Hence, the algorithm needs to determine whether or not to request the ground truth of the reward at current time slot. In our framework, we consider a stream-based active learning setting in which a query request for the ground truth is sent to the annotator, together with some prior information of the ground truth. Depending on the accuracy of the prior information, the query cost varies. Our algorithm mainly carries out two operations: the refinement of the context and arm spaces and the selection of actions. In our algorithm, the partitions of the context space and the arm space are maintained for a certain time slots, and then become finer as more information about the rewards accumulates. We use a strategic way to select the arms and to request the ground truth of the reward, aiming to maximize the total reward. We analytically show that the regret is sublinear and in the same order with that of the conventional contextual multi-armed bandit algorithms, where no query cost