CL | Dec 17, 2019

Open Set Authorship Attribution toward Demystifying Victorian Periodicals

Sarkhan Badirli, Mary Borgo Ton, Abdulmecit Gungor, Murat Dundar

arXiv:1912.08259v1 | 0.57 citations

Originality: Synthesis-oriented

AI Analysis

This paper addresses the challenge of realistic authorship attribution in Victorian periodicals, a problem of interest to historians and computational linguists. Its contribution is incremental: it highlights the limitations of existing methods without proposing a new solution.

The paper tackles authorship attribution in historical texts by moving from a closed-set to an open-set configuration. It finds that linear classifiers achieve near-perfect accuracy in the closed-set setting, but that more robust approaches are needed once a large pool of candidate authors must be considered.

Existing research in computational authorship attribution (AA) has primarily focused on attribution tasks with a limited number of authors in a closed-set configuration. This restricted setup is far from realistic for highly entangled real-world AA tasks that involve a large number of candidate authors at test time. In this paper, we study AA in historical texts using a new data set compiled from Victorian literature. We investigate the predictive capacity of the most common English words in distinguishing the writings of the most prominent Victorian novelists. We challenge the closed-set classification assumption and discuss the limitations of standard machine learning techniques in dealing with the open-set AA task. Our experiments suggest that a linear classifier can achieve near-perfect attribution accuracy under the closed-set assumption, yet the need for more robust approaches becomes evident once a large candidate pool has to be considered in the open-set classification setting.
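
The difference between the closed-set and open-set settings can be illustrated with a minimal sketch. It is not the paper's method: the ten-word vocabulary, the nearest-centroid classifier, the `reject_distance` threshold, and the toy texts are all assumptions chosen for illustration. Each text is represented by the relative frequencies of a few common English words. In the closed-set case the nearest known author is always returned, while the open-set variant may answer "unknown" when no known author is close enough.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny vocabulary of common English words (an assumption;
# the paper's actual word list is not reproduced here).
COMMON_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it", "was", "i"]


def features(text):
    """Relative frequency of each common word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = max(len(tokens), 1)  # avoid division by zero on empty input
    return [counts[w] / total for w in COMMON_WORDS]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def train(docs_by_author):
    """Map each known author to the centroid of their documents' features."""
    return {
        author: centroid([features(doc) for doc in docs])
        for author, docs in docs_by_author.items()
    }


def predict(model, text, reject_distance=None):
    """Return the nearest author, or "unknown" if reject_distance is set
    and even the nearest centroid is farther away than that threshold."""
    x = features(text)
    author, dist = min(
        ((a, math.dist(x, c)) for a, c in model.items()),
        key=lambda pair: pair[1],
    )
    if reject_distance is not None and dist > reject_distance:
        return "unknown"
    return author


# Toy training data (invented for illustration only).
model = train({
    "A": ["the the the of and"],
    "B": ["i was in it i was"],
})

closed = predict(model, "to to to a a that")                      # forced choice
opened = predict(model, "to to to a a that", reject_distance=0.5)  # may reject
```

In the closed-set call, a text by neither author is still assigned to one of the known authors. The open-set call rejects it only because a threshold was supplied, and choosing that threshold reliably for a large candidate pool is exactly the kind of difficulty the abstract points to.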
