Do not let the history haunt you -- Mitigating Compounding Errors in Conversational Question Answering
This work addresses a practical limitation of CoQA systems for real-world deployment, though it is incremental, building on existing CoQA frameworks.
The paper tackles the problem of compounding errors in Conversational Question Answering (CoQA) systems. At test time, models must rely on their own previously predicted answers rather than ground-truth answers, so mistakes in earlier turns propagate to later ones and cause significant performance drops. It proposes a training-time sampling strategy to mitigate this issue and analyzes how the severity of the problem varies with question type, conversation length, and domain.
The Conversational Question Answering (CoQA) task involves answering a sequence of inter-related conversational questions about a contextual paragraph. Although existing approaches use human-written ground-truth answers for previous questions at test time, in a realistic scenario the CoQA model will not have access to these ground-truth answers, and must instead rely on its own previously predicted answers when answering subsequent questions. In this paper, we find that compounding errors occur when using previously predicted answers at test time, significantly lowering the performance of CoQA systems. To solve this problem, we propose a sampling strategy that dynamically selects between target answers and model predictions during training, thereby closely simulating the situation at test time. Further, we analyse the severity of this phenomenon as a function of question type, conversation length and domain type.
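The training-time sampling idea can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a scheduled-sampling-style setup: for each previous turn, the gold answer is kept with probability `p_gold`, and otherwise it is replaced by the model's own prediction. The decay schedule is the inverse-sigmoid decay used in scheduled sampling, which is also an assumption. The names `build_history`, `gold_probability` and `predict`, and the constant `k`, are hypothetical.

```python
import math
import random
from typing import Callable, List, Tuple

# Hypothetical turn record: (question, gold_answer).
Turn = Tuple[str, str]
# History as seen by the model: (question, answer actually used).
History = List[Tuple[str, str]]


def gold_probability(step: int, k: float = 1000.0) -> float:
    """Assumed inverse-sigmoid decay borrowed from scheduled sampling:
    close to 1 early in training (mostly gold answers), decaying toward 0."""
    return k / (k + math.exp(step / k))


def build_history(
    turns: List[Turn],
    predict: Callable[[History, str], str],
    p_gold: float,
    rng: random.Random,
) -> History:
    """Build the history used to answer the last question in `turns`.

    Each previous turn keeps its gold answer with probability `p_gold`;
    otherwise the model's own prediction is substituted. At test time
    this corresponds to p_gold = 0 (no gold answers available)."""
    history: History = []
    for question, gold in turns[:-1]:
        if rng.random() < p_gold:
            answer = gold
        else:
            # The prediction is conditioned on the history built so far,
            # so an early wrong answer can influence later turns.
            answer = predict(history, question)
        history.append((question, answer))
    return history


if __name__ == "__main__":
    rng = random.Random(0)
    turns = [("Who wrote it?", "Tolstoy"), ("When?", "1869"), ("Where?", "Russia")]
    dummy_model = lambda hist, q: "unknown"  # stand-in for a real CoQA model
    # p_gold = 0.0 mimics test time: every previous answer is a prediction.
    print(build_history(turns, dummy_model, p_gold=0.0, rng=rng))
    # Probability of keeping gold answers at a few training steps.
    print([round(gold_probability(s), 3) for s in (0, 5000, 10000)])
```

Setting `p_gold = 1` reproduces the usual history built from ground-truth answers, while `p_gold = 0` mirrors the test-time condition the abstract describes. Annealing between the two exposes the model to its own errors during training.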