Reference-based Weak Supervision for Answer Sentence Selection using Web Data
This work addresses the data annotation bottleneck in AS2 modeling by automating data collection, an incremental improvement that strengthens existing methods.
The paper tackles the reliance of answer sentence selection (AS2) on hand-labeled data by introducing Reference-based Weak Supervision (RWS), a fully automatic pipeline that harvests weakly-supervised answers from web data. It reports state-of-the-art results on WikiQA: 90.1% P@1 and 92.9% MAP.
Answer sentence selection (AS2) modeling requires annotated data, i.e., hand-labeled question-answer pairs. We present a strategy to collect weakly supervised answers for a question based on its reference to improve AS2 modeling. Specifically, we introduce Reference-based Weak Supervision (RWS), a fully automatic large-scale data pipeline that harvests high-quality weakly-supervised answers from abundant Web data requiring only a question-reference pair as input. We study the efficacy and robustness of RWS in the setting of TANDA, a recent state-of-the-art fine-tuning approach specialized for AS2. Our experiments indicate that the produced data consistently bolsters TANDA. We achieve the state of the art in terms of P@1, 90.1%, and MAP, 92.9%, on WikiQA.
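For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how P@1 and MAP are commonly computed for a ranked list of answer candidates. It is not the authors' evaluation code. The function names and the toy data are hypothetical, and the sketch assumes each question has at least one correct candidate and that candidates are already sorted by model score, best first.

```python
from typing import Dict, List


def precision_at_1(ranked_labels: Dict[str, List[int]]) -> float:
    """Fraction of questions whose top-ranked candidate is labeled correct (1)."""
    top_labels = [labels[0] for labels in ranked_labels.values() if labels]
    return sum(top_labels) / len(top_labels)


def average_precision(labels: List[int]) -> float:
    """Average of precision values taken at the rank of each correct candidate."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(labels, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    # Assumption: questions with no correct candidate contribute 0.0.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(ranked_labels: Dict[str, List[int]]) -> float:
    """Mean of per-question average precision."""
    scores = [average_precision(labels) for labels in ranked_labels.values()]
    return sum(scores) / len(scores)


# Hypothetical toy data: question id -> correctness labels in ranked order.
toy = {
    "q1": [1, 0, 0],  # correct answer ranked first
    "q2": [0, 1, 1],  # correct answers ranked second and third
}

p_at_1 = precision_at_1(toy)
map_score = mean_average_precision(toy)
print(f"P@1 = {p_at_1:.3f}, MAP = {map_score:.3f}")
```

In the toy data, only `q1` has a correct top candidate, so P@1 is 0.5. The average precision is 1.0 for `q1` and (1/2 + 2/3)/2 for `q2`, giving a MAP of roughly 0.792.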