CL LG | Aug 23, 2020

Deep Bayes Factor Scoring for Authorship Verification

arXiv:2008.10105v1 | 27 citations
Originality: Incremental advance
AI Analysis

This work addresses authorship verification for fanfiction texts and represents an incremental improvement on a domain-specific task.

The authors tackled the PAN 2020 authorship verification challenge by developing a hierarchical fusion of deep metric learning and probabilistic Bayes factor scoring to handle cross-topic fanfiction texts, achieving competitive results on the benchmark.

The PAN 2020 authorship verification (AV) challenge focuses on a cross-topic, closed-set AV task over a collection of fanfiction texts. Fanfiction is a fan-written extension of a storyline in which a so-called fandom topic describes the principal subject of the document. The data provided in the PAN 2020 AV task is challenging because it includes authors whose texts span multiple, different fandom topics. In this work, we present a hierarchical fusion of two well-known approaches into a single end-to-end learning procedure: a deep metric learning framework at the bottom aims to learn a pseudo-metric that maps a document of variable length onto a fixed-size feature vector. At the top, we incorporate a probabilistic layer that performs Bayes factor scoring in the learned metric space. We also provide text preprocessing strategies to deal with the cross-topic issue.
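To make the idea of Bayes factor scoring in a metric space concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes that each document has already been mapped to a fixed-size vector, and it assumes a simple two-covariance Gaussian model (an author-specific mean with between-author covariance `B`, plus within-author noise with covariance `W`), which the abstract does not specify. The function names and the toy values of `B` and `W` are hypothetical. The Bayes factor compares the likelihood of the vector pair under the same-author hypothesis with its likelihood under the different-author hypothesis.

```python
import numpy as np


def gaussian_logpdf(x, cov):
    """Log density of a zero-mean multivariate Gaussian evaluated at x."""
    d = x.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    quad = x @ np.linalg.solve(cov, x)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + quad)


def log_bayes_factor(y1, y2, B, W):
    """Log Bayes factor for 'same author' vs. 'different authors'.

    Assumed model (illustrative, not taken from the paper):
        y = author_mean + noise,
        author_mean ~ N(0, B), noise ~ N(0, W).
    Same author: y1 and y2 share author_mean, so their cross-covariance is B.
    Different authors: y1 and y2 are independent.
    """
    d = y1.shape[0]
    total = B + W
    zeros = np.zeros((d, d))
    cov_same = np.block([[total, B], [B, total]])
    cov_diff = np.block([[total, zeros], [zeros, total]])
    pair = np.concatenate([y1, y2])
    return gaussian_logpdf(pair, cov_same) - gaussian_logpdf(pair, cov_diff)


def same_author_probability(log_bf, prior_same=0.5):
    """Turn a log Bayes factor into a posterior probability via Bayes' rule."""
    log_odds = log_bf + np.log(prior_same) - np.log1p(-prior_same)
    return 1.0 / (1.0 + np.exp(-log_odds))


# Toy covariances: authors differ a lot (B), documents by one author only a little (W).
B = np.eye(2)
W = 0.1 * np.eye(2)

close_pair = log_bayes_factor(np.array([1.0, 1.0]), np.array([1.0, 1.0]), B, W)
far_pair = log_bayes_factor(np.array([1.0, 1.0]), np.array([-1.0, -1.0]), B, W)
```

In this toy setting, `close_pair` is positive (evidence for the same author) and `far_pair` is negative (evidence for different authors). In an end-to-end system of the kind the abstract describes, the document vectors and the parameters of the probabilistic layer would be learned jointly rather than fixed by hand as they are here.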
