CL, AI, LG · Sep 13, 2019

Say What I Want: Towards the Dark Side of Neural Dialogue Models

arXiv:1909.06044v3 · 17 citations
Originality: Incremental advance
AI Analysis

This work addresses a security problem for chatbot services by revealing a weakness that could lead to misuse, though it is incremental in exploring a known vulnerability.

The paper examines a security vulnerability of neural dialogue models: users can manipulate them into generating targeted outputs. It demonstrates that a reinforcement learning-based approach can craft inputs that elicit those outputs in a considerable portion of cases.

Neural dialogue models have been widely adopted in various chatbot applications because of their good performance in simulating and generalizing human conversations. However, these models have a dark side: because neural networks are vulnerable to manipulation, users can steer a neural dialogue model into saying what they want, which raises concerns about the security of practical chatbot services. In this work, we investigate whether we can craft inputs that lead a well-trained black-box neural dialogue model to generate targeted outputs. We formulate this as a reinforcement learning (RL) problem and train a Reverse Dialogue Generator which efficiently finds such inputs for targeted outputs. Experiments conducted on a representative neural dialogue model show that our proposed model is able to discover such desired inputs in a considerable portion of cases. Overall, our work reveals this weakness of neural dialogue models and may prompt further research on developing corresponding solutions to avoid it.
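To make the RL formulation in the abstract concrete, here is a minimal toy sketch. It is not the authors' Reverse Dialogue Generator: the summary does not describe their architecture or reward. The sketch assumes a tiny vocabulary, a hypothetical stand-in `black_box_reply` that is only queried and never inspected, a token-overlap reward, and a plain REINFORCE update on a per-position softmax policy. All names here are illustrative.

```python
import math
import random

VOCAB = ["hi", "bye", "yes", "no", "ok"]

def black_box_reply(tokens):
    # Hypothetical stand-in for the dialogue model: a fixed token-wise mapping.
    # The attacker only queries it and never looks inside.
    table = {"hi": "bye", "bye": "ok", "yes": "no", "no": "hi", "ok": "yes"}
    return [table[t] for t in tokens]

def reward(output, target):
    # Assumed reward: fraction of positions where the reply matches the target.
    return sum(o == t for o, t in zip(output, target)) / len(target)

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train_reverse_generator(target, steps=2000, lr=0.5, seed=0):
    # Policy: one independent categorical distribution per input position.
    rng = random.Random(seed)
    logits = [[0.0] * len(VOCAB) for _ in target]
    baseline = 0.0  # running average of rewards, used to reduce variance
    for _ in range(steps):
        probs = [softmax(row) for row in logits]
        # Sample a candidate input from the current policy.
        idx = [rng.choices(range(len(VOCAB)), weights=p)[0] for p in probs]
        tokens = [VOCAB[i] for i in idx]
        # Query the black box and score its reply against the target.
        r = reward(black_box_reply(tokens), target)
        advantage = r - baseline
        baseline += 0.05 * (r - baseline)
        # REINFORCE: the gradient of log-softmax is one_hot(sampled) - probs.
        for pos, i in enumerate(idx):
            for k in range(len(VOCAB)):
                grad = (1.0 if k == i else 0.0) - probs[pos][k]
                logits[pos][k] += lr * advantage * grad
    # Return the most likely input under the learned policy.
    return [VOCAB[max(range(len(VOCAB)), key=row.__getitem__)] for row in logits]

if __name__ == "__main__":
    target = ["hi", "bye", "ok"]
    crafted = train_reverse_generator(target)
    print(crafted, black_box_reply(crafted), reward(black_box_reply(crafted), target))
```

The key property the sketch shares with the black-box setting in the abstract is that learning uses only the model's replies, never its parameters or gradients. A realistic attack would replace the tabular policy with a sequence model and the overlap reward with a similarity measure suited to free-form text.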

