CL, Jun 10, 2021

DT-grams: Structured Dependency Grammar Stylometry for Cross-Language Authorship Attribution

arXiv:2106.05677v1, 8 citations
Originality: Incremental advance
AI Analysis

This paper addresses authorship attribution for bilingual authors without translation and provides a baseline for a previously undocumented task. The advance is incremental because it builds on existing dependency graph methods.

The paper tackles cross-language authorship attribution by introducing DT-grams, a language-independent feature based on dependency graphs and universal POS tags. On average, DT-grams achieve a macro-averaged F1 score 0.081 higher than previous methods across five language pairs.
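For readers unfamiliar with the metric quoted above, macro-averaged F1 computes one F1 score per class (here, per author) and takes the unweighted mean, so prolific and rare authors count equally. The following is a minimal sketch of that standard definition; the function name `macro_f1` and the toy labels are illustrative and are not taken from the paper.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (standard definition)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # A class with no true or predicted instances scores 0 here.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with two hypothetical authors "a" and "b".
score = macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"])
print(round(score, 4))
```

In this toy case author "a" has F1 = 2/3 and author "b" has F1 = 0.8, so the macro average is their plain mean.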

Cross-language authorship attribution problems rely on either translation to enable the use of single-language features, or language-independent feature extraction methods. Until recently, the lack of datasets for this problem hindered the development of the latter, and single-language solutions were applied to machine-translated corpora. In this paper, we present a novel language-independent feature for authorship analysis based on dependency graphs and universal part-of-speech tags, called DT-grams (dependency tree grams), which are constructed by selecting specific sub-parts of the dependency graph of sentences. We evaluate DT-grams by performing cross-language authorship attribution on untranslated datasets of bilingual authors, showing that, on average, they achieve a macro-averaged F1 score 0.081 higher than previous methods across five different language pairs. Additionally, by providing results for a diverse set of features for comparison, we provide a baseline on the previously undocumented task of untranslated cross-language authorship attribution.
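To make the idea of selecting sub-parts of a dependency graph concrete, here is a minimal sketch that extracts one plausible kind of sub-part: chains of universal POS tags along ancestor paths in the tree. The abstract does not specify which sub-parts DT-grams select, so the chain shape, the function name `ancestor_chains`, and the hand-written parse below are all assumptions for illustration, not the authors' implementation.

```python
from collections import Counter

# Hypothetical hand-written parse of "She reads old books".
# Each entry: (token index, universal POS tag, head index); head 0 is the root.
SENTENCE = [
    (1, "PRON", 2),
    (2, "VERB", 0),
    (3, "ADJ", 4),
    (4, "NOUN", 2),
]


def ancestor_chains(sentence, n):
    """Count POS-tag chains of length n, ordered from ancestor down to token.

    Assumed feature shape for illustration only; the paper may select
    different sub-parts of the dependency graph.
    """
    upos = {idx: tag for idx, tag, _ in sentence}
    head = {idx: h for idx, _, h in sentence}
    grams = Counter()
    for idx in upos:
        chain, node = [], idx
        # Walk upward from the token until the chain is full or the root is passed.
        while node != 0 and len(chain) < n:
            chain.append(upos[node])
            node = head[node]
        if len(chain) == n:
            grams[tuple(reversed(chain))] += 1
    return grams


for gram, count in sorted(ancestor_chains(SENTENCE, 2).items()):
    print(gram, count)
```

Because the features consist only of universal POS tags and tree structure, the same counts can be computed for text in any language that has a dependency parser, which is what makes a feature of this kind language-independent.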
