IR CL LG ML Dec 31, 2018

Unary and Binary Classification Approaches and their Implications for Authorship Verification

Oren Halvani, Christian Winter, Lukas Graner

arXiv:1901.00399v1 8.46 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses a foundational issue in digital text forensics and information retrieval by improving the characterization and evaluation of authorship verification methods, though it is incremental as it builds on existing literature.

The paper tackles the problem of authorship verification (AV) by clarifying misunderstandings between unary and binary classification approaches, proposing new criteria and properties to characterize them, and evaluating eleven existing AV methods and four generic unary algorithms on two self-compiled corpora.

Retrieving indexed documents not by their topical content but by their writing style opens the door to a number of applications in information retrieval (IR). One application is to retrieve textual content of a certain author X, where the queried IR system is provided beforehand with a set of reference texts of X. Authorship verification (AV), which is a research subject in the field of digital text forensics, is suitable for this purpose. The task of AV is to determine whether two documents (i.e., an indexed document and a reference document) have been written by the same author X. Even though AV represents a unary classification problem, a number of existing approaches treat it as a binary classification task. However, the underlying classification model of an AV method has a number of serious implications regarding its prerequisites, evaluability, and applicability. In our comprehensive literature review, we observed several misunderstandings regarding the differentiation of unary and binary AV approaches that require consideration. The objective of this paper is, therefore, to clarify these by proposing clear criteria and new properties that aim to improve the characterization of existing and future AV approaches. Based on both, we investigate the applicability of eleven existing unary and binary AV methods as well as four generic unary classification algorithms on two self-compiled corpora. Furthermore, we highlight an important issue concerning the evaluation of AV methods based on fixed decision criteria, which has not received attention in previous AV studies.
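As a rough illustration of the task described above, deciding whether two documents share an author can be sketched as a style-similarity score compared against a decision threshold. The sketch below is not one of the methods evaluated in the paper. The character trigram features, the function names, and the threshold value of 0.5 are all assumptions chosen for illustration.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams (an assumed, simple style feature)."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # at least one text is too short to yield any n-gram
    return dot / (norm_a * norm_b)


def verify(reference: str, unknown: str, threshold: float = 0.5) -> bool:
    """Accept 'same author' if the style similarity reaches the threshold.

    The threshold is a fixed, hypothetical value; it is not taken from the paper.
    """
    score = cosine_similarity(char_ngrams(reference), char_ngrams(unknown))
    return score >= threshold


if __name__ == "__main__":
    reference_text = "The task is to determine whether two documents share an author."
    unknown_text = "The task is to decide whether both documents share one author."
    print(verify(reference_text, unknown_text))
```

In this sketch the accept/reject outcome depends entirely on the chosen threshold, so any accuracy figure computed from such decisions is tied to that one fixed value. This is the kind of dependence that makes evaluation based on fixed decision criteria worth examining.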
