Deep Bayes Factor Scoring for Authorship Verification
This work addresses authorship verification for fanfiction texts and offers an incremental improvement on a domain-specific task.
The authors tackled the PAN 2020 authorship verification challenge by developing a hierarchical fusion of deep metric learning and probabilistic Bayes factor scoring to handle cross-topic fanfiction texts, achieving competitive results on the benchmark.
The PAN 2020 authorship verification (AV) challenge focuses on a cross-topic, closed-set AV task over a collection of fanfiction texts. Fanfiction is a fan-written extension of a storyline in which a so-called fandom topic describes the principal subject of the document. The data provided in the PAN 2020 AV task is challenging because it includes authors whose texts span multiple, different fandom topics. In this work, we present a hierarchical fusion of two well-known approaches into a single end-to-end learning procedure: a deep metric learning framework at the bottom aims to learn a pseudo-metric that maps a document of variable length onto a fixed-size feature vector. At the top, we incorporate a probabilistic layer to perform Bayes factor scoring in the learned metric space. We also provide text preprocessing strategies to deal with the cross-topic issue.
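To make the idea of Bayes factor scoring in a learned metric space concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each document has already been mapped to a fixed-size vector, and that each vector is the sum of a zero-mean Gaussian author component and zero-mean Gaussian within-author noise, both isotropic. The text above does not specify the probabilistic model, so this simplified two-covariance model is an assumption. The function names, the variance values, and the hand-written vectors are hypothetical stand-ins chosen for illustration.

```python
import math


def log_bayes_factor(x1, x2, between_var=1.0, within_var=0.5):
    """Return log p(x1, x2 | same author) - log p(x1, x2 | different authors).

    Assumed model (not taken from the source): every dimension is independent,
    with an author mean ~ N(0, between_var) plus noise ~ N(0, within_var).
    """
    if len(x1) != len(x2):
        raise ValueError("embeddings must have the same dimension")
    b, w = between_var, within_var
    total_var = b + w              # marginal variance of one embedding dimension
    det_same = w * (2.0 * b + w)   # determinant of [[b + w, b], [b, b + w]]
    log_bf = 0.0
    for a, c in zip(x1, x2):
        sq = a * a + c * c
        # Same author: both values share one author mean, so they are correlated.
        ll_same = (-0.5 * math.log(det_same)
                   - 0.5 * (total_var * sq - 2.0 * b * a * c) / det_same)
        # Different authors: the two values are independent.
        ll_diff = -math.log(total_var) - 0.5 * sq / total_var
        # The shared -log(2 * pi) normalisation term cancels in the difference.
        log_bf += ll_same - ll_diff
    return log_bf


def same_author_probability(log_bf, prior_same=0.5):
    """Turn a log Bayes factor and a prior into a posterior probability."""
    log_odds = log_bf + math.log(prior_same / (1.0 - prior_same))
    # Numerically stable logistic function.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


# Hypothetical 2-dimensional embeddings standing in for the learned features.
doc_a = [1.0, -2.0]
doc_b = [1.1, -1.9]    # close to doc_a
doc_c = [-1.0, 2.0]    # far from doc_a

score_close = log_bayes_factor(doc_a, doc_b)
score_far = log_bayes_factor(doc_a, doc_c)
```

Under these assumptions, a positive log Bayes factor favours the same-author hypothesis and a negative one favours different authors. In an end-to-end system the variances would be learned jointly with the embedding rather than fixed by hand.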