CL | Jun 16, 2022

DIALOG-22 RuATD Generated Text Detection

arXiv:2206.08029v1 | 11 citations | h-index: 5
AI Analysis

This work addresses the need to prevent abuse of text generation models, though its contribution is incremental because it builds on existing pre-trained models.

The paper tackles the problem of distinguishing model-generated text from human-written text, taking first place in the binary classification task with 0.82995 accuracy and fourth place in the multiclass task with 0.62856 accuracy.

Text Generation Models (TGMs) succeed in creating text that matches human language style reasonably well. Detectors that can distinguish TGM-generated text from human-written text play an important role in preventing abuse of TGMs. In this paper, we describe our pipeline for the two DIALOG-22 RuATD tasks: detecting generated text (binary task) and classifying which model was used to generate a text (multiclass task). We achieved 1st place on the binary classification task with an accuracy score of 0.82995 on the private test set and 4th place on the multiclass classification task with an accuracy score of 0.62856 on the private test set. We propose an ensemble method that combines different pre-trained models using the attention mechanism.
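
The abstract does not spell out how the attention-based ensemble is built, so the following is only a minimal sketch of one plausible reading: each base model emits a class-probability vector, a query vector scores each model, and a softmax over those scores gives the weights used to mix the predictions. The names attention_ensemble, predict, and query are hypothetical and not taken from the paper; in a real system the query would be learned, and the attention might operate on hidden representations rather than output probabilities.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_ensemble(model_probs, query):
    """Mix per-model class probabilities using attention weights.

    model_probs: list of probability vectors, one per base model
                 (assumed to come from fine-tuned pre-trained models).
    query:       hypothetical vector with one entry per class; assumed
                 to be learned in practice, fixed here for illustration.
    Returns (combined_probs, attention_weights).
    """
    # Dot-product score between the query and each model's output.
    scores = [sum(p * q for p, q in zip(probs, query)) for probs in model_probs]
    weights = softmax(scores)
    n_classes = len(model_probs[0])
    # Weighted average of the per-model probabilities for every class.
    combined = [
        sum(w * probs[c] for w, probs in zip(weights, model_probs))
        for c in range(n_classes)
    ]
    return combined, weights


def predict(model_probs, query):
    """Return the index of the class with the highest combined probability."""
    combined, _ = attention_ensemble(model_probs, query)
    return max(range(len(combined)), key=combined.__getitem__)
```

With an all-zero query every model receives the same weight, so the sketch reduces to plain probability averaging; a non-zero query shifts weight toward the models whose outputs align with it. Because the weights sum to one, the combined vector remains a valid probability distribution.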

Code Implementations: 1 repo
