CL · DL · IR · Feb 14, 2018

Authorship Attribution Using the Chaos Game Representation

arXiv:1802.06007v1 · 3 citations
AI Analysis

This work addresses authorship attribution, taking an incremental approach that applies an existing method from bioinformatics to a new domain.

The authors tackled authorship attribution by adapting the Chaos Game Representation to convert text chunks into images and applying machine learning classifiers, achieving competitive results on benchmark datasets like the Federalist Papers.

The Chaos Game Representation, a method for creating images from nucleotide sequences, is modified to make images from chunks of text documents. Machine learning methods are then applied to train classifiers based on authorship. Experiments are conducted on several benchmark data sets in English, including the widely used Federalist Papers, and one in Portuguese. Validation results for the trained classifiers are competitive with the best methods in prior literature. The methodology is also successfully applied for text categorization with encouraging results. One classifier method is moreover seen to hold promise for the task of digital fingerprinting.
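To make the image-building step concrete, here is a minimal sketch of a chaos game walk over text. The abstract does not say how the representation is modified for text, so the encoding here is an assumption made only for illustration: each UTF-8 byte is split into four 2-bit symbols, and each symbol selects one corner of the unit square, in the same way each nucleotide selects a corner in the original method. The names `text_to_symbols`, `cgr_image`, and the `size` parameter are hypothetical and are not taken from the paper.

```python
# Corners of the unit square, one per 2-bit symbol.
# ASSUMPTION: this byte-to-corner encoding is illustrative, not the paper's scheme.
CORNERS = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]


def text_to_symbols(text):
    """Split each UTF-8 byte into four 2-bit symbols (0..3), high bits first."""
    symbols = []
    for byte in text.encode("utf-8"):
        for shift in (6, 4, 2, 0):
            symbols.append((byte >> shift) & 0b11)
    return symbols


def cgr_image(text, size=32):
    """Return a size x size grid of visit counts from the chaos game walk."""
    image = [[0] * size for _ in range(size)]
    x, y = 0.5, 0.5  # start at the centre of the square
    for symbol in text_to_symbols(text):
        cx, cy = CORNERS[symbol]
        # Move halfway from the current point toward the selected corner.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        # Bin the point into a pixel; clamp in case rounding reaches 1.0.
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        image[row][col] += 1
    return image


if __name__ == "__main__":
    chunk = "To the People of the State of New York"
    grid = cgr_image(chunk, size=16)
    print(sum(map(sum, grid)), "points plotted")
```

Each text chunk yields one grid of counts, which could then be flattened or passed as an image to whichever classifier is being trained. The choice of classifier is left open here because the abstract does not name one.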

Code Implementations: 1 repo
