CL

Feb 11, 2020

Training with Streaming Annotation

Tongtao Zhang, Heng Ji, Shih-Fu Chang, Marjorie Freedman

arXiv:2002.04165v2

0.2

Originality: Incremental advance

AI Analysis

This paper addresses a practical problem for machine learning practitioners who work with streaming annotation, particularly in event extraction. The contribution appears incremental, since it builds on existing transformer methods.

The paper tackles the problem of training models on streaming data in which early annotations are of lower quality than later ones. It proposes a framework that uses a pre-trained transformer to preserve the most salient document information from earlier batches while focusing on the higher-quality annotation in the current batch. In event extraction experiments, the framework achieves absolute F-score gains of 3.6 to 14.9% over conventional approaches and takes 19.1% less time than the best conventional method.

In this paper, we address a practical scenario where training data is released in a sequence of small-scale batches and annotation in earlier phases is of lower quality than in later ones. To handle this situation, we utilize a pre-trained transformer network to preserve and integrate the most salient document information from the earlier batches while focusing on the annotation (presumably of higher quality) from the current batch. Using event extraction as a case study, we demonstrate in experiments that our proposed framework outperforms conventional approaches (the improvement ranges from 3.6 to 14.9% absolute F-score gain), especially when there is more noise in the early annotation, and that our approach takes 19.1% less time than the best conventional method.
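The abstract reports its improvement as an absolute F-score gain, which is easy to confuse with a relative gain. The sketch below illustrates the difference, assuming the F-score is the usual F1 (the harmonic mean of precision and recall). The function names and all counts are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def f_score(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts: harmonic mean of precision and recall."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def absolute_gain(baseline_f: float, new_f: float) -> float:
    """Difference between two F-scores, in percentage points."""
    return (new_f - baseline_f) * 100


def relative_gain(baseline_f: float, new_f: float) -> float:
    """Improvement expressed as a percentage of the baseline F-score."""
    return (new_f - baseline_f) / baseline_f * 100


# Hypothetical counts (true positives, false positives, false negatives),
# not taken from the paper.
baseline = f_score(tp=50, fp=50, fn=50)
improved = f_score(tp=60, fp=40, fn=40)

print(round(absolute_gain(baseline, improved), 1))
print(round(relative_gain(baseline, improved), 1))
```

With these made-up counts the F-score rises from 0.5 to 0.6, which is an absolute gain of 10 points but a relative gain of 20%. The 3.6 to 14.9% range quoted in the abstract is stated as the former kind.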
