CV | Oct 10, 2022
What the DAAM: Interpreting Stable Diffusion Using Cross Attention
Raphael Tang, Linqing Liu, Akshat Pandey et al.
Large-scale diffusion neural networks represent a substantial milestone in text-to-image generation, but they remain poorly understood, lacking interpretability analyses. In this paper, we perform a text-image attribution analysis on Stable Diffusion, a recently open-sourced model. To produce pixel-level attribution maps, we upscale and aggregate cross-attention word-pixel scores in the denoising subnetwork, naming our method DAAM. We evaluate its correctness by testing its semantic segmentation ability on nouns, as well as its generalized attribution quality on all parts of speech, rated by humans. We then apply DAAM to study the role of syntax in the pixel space, characterizing head-dependent heat map interaction patterns for ten common dependency relations. Finally, we study several semantic phenomena using DAAM, with a focus on feature entanglement, where we find that cohyponyms worsen generation quality and descriptive adjectives attend too broadly. To our knowledge, we are the first to interpret large diffusion models from a visuolinguistic perspective, which enables future lines of research. Our code is at https://github.com/castorini/daam.
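The "upscale and aggregate" step in the abstract above can be illustrated with a minimal sketch. It assumes square attention grids stored as nested lists, nearest-neighbour upscaling, and plain summation; the helper names and the toy scores are hypothetical, and the released DAAM code may differ in all of these choices.

```python
def upscale_nearest(grid, size):
    """Upscale a square 2-D grid (list of rows) to size x size by nearest neighbour."""
    n = len(grid)
    return [[grid[r * n // size][c * n // size] for c in range(size)]
            for r in range(size)]


def aggregate_maps(maps, size):
    """Sum word-pixel score maps of different resolutions into one size x size heat map."""
    total = [[0.0] * size for _ in range(size)]
    for grid in maps:
        up = upscale_nearest(grid, size)
        for r in range(size):
            for c in range(size):
                total[r][c] += up[r][c]
    return total


# Hypothetical attention scores for a single word at two resolutions.
low_res = [[1.0]]
mid_res = [[0.0, 1.0],
           [0.0, 0.0]]
heat = aggregate_maps([low_res, mid_res], size=4)
```

In this toy case the top-right quadrant of `heat` receives contributions from both maps, so it ends up with the highest attribution for the word.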
CL | Oct 13, 2022
Query Expansion Using Contextual Clue Sampling with Language Models
Linqing Liu, Minghan Li, Jimmy Lin et al.
Query expansion is an effective approach for mitigating vocabulary mismatch between queries and documents in information retrieval. One recent line of research uses language models to generate query-related contexts for expansion. Along this line, we argue that expansion terms from these contexts should balance two key aspects: diversity and relevance. The obvious way to increase diversity is to sample multiple contexts from the language model. However, this comes at the cost of relevance, because there is a well-known tendency of models to hallucinate incorrect or irrelevant contexts. To balance these two considerations, we propose a combination of an effective filtering strategy and fusion of the retrieved documents based on the generation probability of each context. Our lexical matching based approach achieves a similar top-5/top-20 retrieval accuracy and higher top-100 accuracy compared with the well-established dense retrieval model DPR, while reducing the index size by more than 96%. For end-to-end QA, the reader model also benefits from our method and achieves the highest Exact-Match score against several competitive baselines.
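As a rough illustration of filtering sampled contexts and fusing their retrieval results by generation probability, consider the sketch below. The abstract does not give the exact filtering rule or fusion formula, so the probability threshold, the weighted-sum scoring, and every name and number here are assumptions.

```python
from collections import defaultdict


def filter_contexts(runs, min_prob):
    """Drop contexts whose generation probability is below min_prob (assumed filter)."""
    return [(prob, scores) for prob, scores in runs if prob >= min_prob]


def fuse_rankings(runs):
    """Fuse per-context retrieval scores, weighting each run by its context probability.

    runs: list of (context_probability, {doc_id: retrieval_score}).
    """
    fused = defaultdict(float)
    for prob, scores in runs:
        for doc_id, score in scores.items():
            fused[doc_id] += prob * score
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical retrieval results for three sampled contexts.
runs = [
    (0.60, {"d1": 2.0, "d2": 1.0}),
    (0.30, {"d2": 3.0, "d3": 1.0}),
    (0.01, {"d4": 9.0}),  # unlikely context, probably a hallucination
]
ranking = fuse_rankings(filter_contexts(runs, min_prob=0.05))
```

Here the low-probability context is filtered out before fusion, so its document `d4` never enters the ranking despite its high raw retrieval score.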
LG | Feb 1, 2022
When Do Flat Minima Optimizers Work?
Jean Kaddour, Linqing Liu, Ricardo Silva et al.
Recently, flat-minima optimizers, which seek to find parameters in low-loss neighborhoods, have been shown to improve a neural network's generalization performance over stochastic and adaptive gradient-based optimizers. Two methods have received significant attention due to their scalability: 1. Stochastic Weight Averaging (SWA), and 2. Sharpness-Aware Minimization (SAM). However, there has been limited investigation into their properties and no systematic benchmarking of them across different domains. We fill this gap here by comparing the loss surfaces of the models trained with each method and through broad benchmarking across computer vision, natural language processing, and graph representation learning tasks. We discover several surprising findings from these results, which we hope will help researchers further improve deep learning optimizers, and practitioners identify the right optimizer for their problem.
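For readers unfamiliar with the two methods, the sketch below shows their commonly described update rules on a toy quadratic loss. It is not the paper's experimental setup: the loss, learning rate, `rho`, and function names are illustrative assumptions.

```python
import math


def grad(w):
    """Gradient of the toy loss f(w) = sum(w_i ** 2)."""
    return [2.0 * x for x in w]


def sam_step(w, lr=0.1, rho=0.05):
    """One SAM update: take the gradient at an adversarially perturbed point."""
    g = grad(w)
    norm = math.sqrt(sum(x * x for x in g)) or 1.0  # avoid division by zero
    w_adv = [x + rho * gx / norm for x, gx in zip(w, g)]
    g_adv = grad(w_adv)
    return [x - lr * gx for x, gx in zip(w, g_adv)]


def swa_update(avg, w, n_averaged):
    """Running mean of weights: fold a new iterate into an average of n_averaged points."""
    return [a + (x - a) / (n_averaged + 1) for a, x in zip(avg, w)]


w = [1.0, -2.0]
avg, n = list(w), 1
for _ in range(20):
    w = sam_step(w)
    avg = swa_update(avg, w, n)
    n += 1
```

SAM changes where the gradient is evaluated at each step, while SWA leaves the optimizer untouched and only averages the iterates it produces, which is why the two can be combined as in the loop above.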
CL | Sep 2, 2021
Challenges in Generalization in Open Domain Question Answering
Linqing Liu, Patrick Lewis, Sebastian Riedel et al.
Recent work on Open Domain Question Answering has shown that there is a large discrepancy in model performance between novel test questions and those that largely overlap with training questions. However, it is unclear which aspects of novel questions make them challenging. Drawing upon studies on systematic generalization, we introduce and annotate questions according to three categories that measure different levels and kinds of generalization: training set overlap, compositional generalization (comp-gen), and novel-entity generalization (novel-entity). When evaluating six popular parametric and non-parametric models, we find that for the established Natural Questions and TriviaQA datasets, even the strongest model performance for comp-gen/novel-entity is 13.1/5.4% and 9.6/1.5% lower compared to that for the full test set, indicating the challenge posed by these types of questions. Furthermore, we show that whilst non-parametric models can handle questions containing novel entities relatively well, they struggle with those requiring compositional generalization. Lastly, we find that key question difficulty factors are: cascading errors from the retrieval component, frequency of question pattern, and frequency of the entity.
CL | May 28, 2021
Controllable Abstractive Dialogue Summarization with Sketch Supervision
Chien-Sheng Wu, Linqing Liu, Wenhao Liu et al.
In this paper, we aim to improve abstractive dialogue summarization quality and, at the same time, enable granularity control. Our model has two primary components and stages: 1) a two-stage generation strategy that generates a preliminary summary sketch serving as the basis for the final summary. This summary sketch provides a weakly supervised signal in the form of pseudo-labeled interrogative pronoun categories and key phrases extracted using a constituency parser. 2) A simple strategy to control the granularity of the final summary, in that our model can automatically determine or control the number of generated summary sentences for a given dialogue by predicting and highlighting different text spans from the source text. Our model achieves state-of-the-art performance on the largest dialogue summarization corpus, SAMSum, with a ROUGE-L score as high as 50.79. In addition, we conduct a case study and show competitive human evaluation results and controllability to human-annotated summaries.
CL | Feb 13, 2021
PAQ: 65 Million Probably-Asked Questions and What You Can Do With Them
Patrick Lewis, Yuxiang Wu, Linqing Liu et al.
Open-domain Question Answering models which directly leverage question-answer (QA) pairs, such as closed-book QA (CBQA) models and QA-pair retrievers, show promise in terms of speed and memory compared to conventional models which retrieve and read from text corpora. QA-pair retrievers also offer interpretable answers, a high degree of control, and are trivial to update at test time with new knowledge. However, these models lack the accuracy of retrieve-and-read systems, as substantially less knowledge is covered by the available QA-pairs relative to text corpora like Wikipedia. To facilitate improved QA-pair models, we introduce Probably Asked Questions (PAQ), a very large resource of 65M automatically-generated QA-pairs. We introduce a new QA-pair retriever, RePAQ, to complement PAQ. We find that PAQ preempts and caches test questions, enabling RePAQ to match the accuracy of recent retrieve-and-read models, whilst being significantly faster. Using PAQ, we train CBQA models which outperform comparable baselines by 5%, but trail RePAQ by over 15%, indicating the effectiveness of explicit retrieval. RePAQ can be configured for size (under 500MB) or speed (over 1K questions per second) whilst retaining high accuracy. Lastly, we demonstrate RePAQ's strength at selective QA, abstaining from answering when it is likely to be incorrect. This enables RePAQ to "back-off" to a more expensive state-of-the-art model, leading to a combined system which is both more accurate and 2x faster than the state-of-the-art model alone.
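The selective-QA back-off idea can be sketched as a simple confidence gate. This is only an illustration: the toy models, the cached entries, and the threshold value are hypothetical stand-ins, not RePAQ or its actual confidence measure.

```python
def answer_with_backoff(question, fast_model, slow_model, threshold=0.5):
    """Use the fast model when it is confident; otherwise defer to the slow model."""
    answer, confidence = fast_model(question)
    if confidence >= threshold:
        return answer, "fast"
    return slow_model(question), "slow"


# Hypothetical stand-ins: a cached QA-pair lookup and a more expensive fallback.
CACHED_QA = {
    "capital of france?": ("Paris", 0.93),
    "who wrote hamlet?": ("Marlowe", 0.20),  # low confidence, likely wrong
}


def toy_fast_model(question):
    return CACHED_QA.get(question.lower(), (None, 0.0))


def toy_slow_model(question):
    return {"who wrote hamlet?": "Shakespeare"}.get(question.lower(), "unknown")


first = answer_with_backoff("Capital of France?", toy_fast_model, toy_slow_model)
second = answer_with_backoff("Who wrote Hamlet?", toy_fast_model, toy_slow_model)
```

Because most questions are expected to clear the threshold, the expensive model runs only on the remainder, which is how a combined system can be faster than the expensive model alone.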
CL | Jan 1, 2021
NeurIPS 2020 EfficientQA Competition: Systems, Analyses and Lessons Learned
Sewon Min, Jordan Boyd-Graber, Chris Alberti et al.
We review the EfficientQA competition from NeurIPS 2020. The competition focused on open-domain question answering (QA), where systems take natural language questions as input and return natural language answers. The aim of the competition was to build systems that can predict correct answers while also satisfying strict on-disk memory budgets. These memory budgets were designed to encourage contestants to explore the trade-off between storing retrieval corpora or the parameters of learned models. In this report, we describe the motivation and organization of the competition, review the best submissions, and analyze system predictions to inform a discussion of evaluation for open-domain QA.
CL | Nov 9, 2019
MKD: a Multi-Task Knowledge Distillation Approach for Pretrained Language Models
Linqing Liu, Huan Wang, Jimmy Lin et al.
Pretrained language models have led to significant performance gains in many NLP tasks. However, the intensive computing resources needed to train such models remain an issue. Knowledge distillation alleviates this problem by learning a light-weight student model. So far, distillation approaches have all been task-specific. In this paper, we explore knowledge distillation under the multi-task learning setting. The student is jointly distilled across different tasks. It acquires more general representation capacity through multi-task distillation and can be further fine-tuned to improve the model in the target domain. Unlike other BERT distillation methods, which are specifically designed for Transformer-based architectures, we provide a general learning framework. Our approach is model agnostic and can be easily applied to different future teacher model architectures. We evaluate our approach on Transformer-based and LSTM-based student models. Compared to a strong, similarly LSTM-based approach, we achieve better quality under the same computational constraints. Compared to the present state of the art, we reach comparable results with much faster inference speed.
CL | Mar 28, 2019
Distilling Task-Specific Knowledge from BERT into Simple Neural Networks
Raphael Tang, Yao Lu, Linqing Liu et al.
In the natural language processing literature, neural networks are becoming increasingly deep and complex. The recent poster child of this trend is the deep language representation model, which includes BERT, ELMo, and GPT. These developments have led to the conviction that previous-generation, shallower neural networks for language understanding are obsolete. In this paper, however, we demonstrate that rudimentary, lightweight neural networks can still be made competitive without architecture changes, external training data, or additional input features. We propose to distill knowledge from BERT, a state-of-the-art language representation model, into a single-layer BiLSTM, as well as its siamese counterpart for sentence-pair tasks. Across multiple datasets in paraphrasing, natural language inference, and sentiment classification, we achieve comparable results with ELMo, while using roughly 100 times fewer parameters and 15 times less inference time.
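To make the notion of distilling a teacher into a student concrete, the sketch below computes one common distillation objective: the KL divergence between temperature-softened teacher and student output distributions. The abstract does not state which objective the paper uses, so this formulation, the temperature, and the example logits are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical logits for a three-class task.
teacher = [4.0, 1.0, -2.0]
matching_student = [4.0, 1.0, -2.0]
poor_student = [-2.0, 1.0, 4.0]

loss_match = distillation_loss(matching_student, teacher)
loss_poor = distillation_loss(poor_student, teacher)
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge, which gives the small model a richer training signal than hard labels alone.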
CL | Nov 26, 2017
Generative Adversarial Network for Abstractive Text Summarization
Linqing Liu, Yao Lu, Min Yang et al.
In this paper, we propose an adversarial process for abstractive text summarization, in which we simultaneously train a generative model G and a discriminative model D. In particular, we build the generator G as a reinforcement learning agent, which takes the raw text as input and predicts the abstractive summary. We also build a discriminator which attempts to distinguish the generated summary from the ground truth summary. Extensive experiments demonstrate that our model achieves ROUGE scores competitive with state-of-the-art methods on the CNN/Daily Mail dataset. Qualitatively, we show that our model is able to generate more abstractive, readable and diverse summaries.
CL | Apr 28, 2016
Detecting "Smart" Spammers On Social Network: A Topic Model Approach
Linqing Liu, Yao Lu, Ye Luo et al.
Spammer detection on social networks is a challenging problem. Rigid anti-spam rules have resulted in the emergence of "smart" spammers, who resemble legitimate users and are difficult to identify. In this paper, we present a novel spammer classification approach based on Latent Dirichlet Allocation (LDA), a topic model. Our approach extracts both the local and the global information of topic distribution patterns, which capture the essence of spamming. Tested on one benchmark dataset and one self-collected dataset, our proposed method outperforms other state-of-the-art methods in terms of averaged F1-score.