Mind the Quote: Enabling Quotation-Aware Dialogue in LLMs via Plug-and-Play Modules
This work improves human-AI conversation by enabling models to locate and exploit quoted text. The contribution is incremental in that it builds on existing LLM architectures.
The paper formalizes quotation-aware dialogue for large language models as span-conditioned generation and introduces a data pipeline that synthesizes dialogues, together with a benchmark covering five scenarios. It proposes QuAda, a lightweight method that updates less than 2.8% of backbone weights, applies to all five scenarios, and generalizes to unseen topics.
Human-AI conversation frequently relies on quoting earlier text (for example, "check it with the formula I just highlighted"), yet today's large language models (LLMs) lack an explicit mechanism for locating and exploiting such spans. We formalise the challenge as span-conditioned generation, decomposing each turn into the dialogue history, a set of token-offset quotation spans, and an intent utterance. Building on this abstraction, we introduce a quotation-centric data pipeline that automatically synthesises task-specific dialogues, verifies answer correctness through multi-stage consistency checks, and yields both a heterogeneous training corpus and the first benchmark covering five representative scenarios. To meet the benchmark's zero-overhead and parameter-efficiency requirements, we propose QuAda, a lightweight training-based method that attaches two bottleneck projections to every attention head, dynamically amplifying or suppressing attention to quoted spans at inference time while leaving the prompt unchanged and updating < 2.8% of backbone weights. Experiments across models show that QuAda is suitable for all scenarios and generalises to unseen topics, offering an effective, plug-and-play solution for quotation-aware dialogue.
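To make the span-conditioned formulation concrete, the following is a minimal Python sketch of one turn (dialogue history, token-offset quotation spans, intent utterance) and of what "amplifying or suppressing attention to quoted spans" could look like on a single row of attention scores. It is an illustration under stated assumptions, not QuAda itself: the names `QuotedTurn`, `quote_mask`, and `bias_scores` are hypothetical, spans are assumed to be half-open `[start, end)` offsets, and a fixed scalar `delta` stands in for whatever the learned bottleneck projections would produce.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class QuotedTurn:
    """One dialogue turn in the span-conditioned view (hypothetical container)."""
    history_tokens: List[str]            # tokenised dialogue history
    quote_spans: List[Tuple[int, int]]   # assumed half-open [start, end) token offsets
    intent: str                          # the user's intent utterance


def quote_mask(turn: QuotedTurn) -> List[int]:
    """Return 1 for every history token inside a quoted span, else 0."""
    mask = [0] * len(turn.history_tokens)
    for start, end in turn.quote_spans:
        if not (0 <= start < end <= len(mask)):
            raise ValueError(f"span ({start}, {end}) is outside the history")
        for i in range(start, end):
            mask[i] = 1
    return mask


def bias_scores(scores: List[float], mask: List[int], delta: float) -> List[float]:
    """Add delta to pre-softmax scores at quoted positions.

    Assumption: a fixed scalar is used here purely for illustration; a positive
    delta amplifies quoted tokens and a negative delta suppresses them.
    """
    return [s + delta * m for s, m in zip(scores, mask)]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    turn = QuotedTurn(
        history_tokens=["check", "the", "formula", "E=mc^2", "please"],
        quote_spans=[(2, 4)],
        intent="check it with the formula I just highlighted",
    )
    mask = quote_mask(turn)
    print(mask)  # [0, 0, 1, 1, 0]
    print(softmax(bias_scores([0.0] * 5, mask, delta=2.0)))
```

Because the bias is applied to attention scores rather than to the input text, the prompt itself stays unchanged, which matches the zero-overhead requirement described above. How the bias is actually computed per head is specific to the method and is not reproduced here.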