CL, Feb 10, 2019

Word embeddings for idiolect identification

Konstantinos Perifanos, Eirini Florou, Dionysis Goutsos

arXiv:1902.03658v1

Originality: Synthesis-oriented

AI Analysis

This paper addresses authorship attribution for social media users, but it appears incremental: it applies existing embedding methods to a specific domain.

The paper tackles the problem of learning distributed representations (embeddings) of social media users that reflect their writing style. It explores the performance of neural probabilistic language models (e.g., word2vec) and matrix factorization (e.g., GloVe) for idiolect identification, but no concrete results or numbers are provided.

The term idiolect refers to the unique and distinctive use of language of an individual and it is the theoretical foundation of Authorship Attribution. In this paper we are focusing on learning distributed representations (embeddings) of social media users that reflect their writing style. These representations can be considered as stylistic fingerprints of the authors. We are exploring the performance of the two main flavours of distributed representations, namely embeddings produced by Neural Probabilistic Language models (such as word2vec) and matrix factorization (such as GloVe).
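To make the idea of a "stylistic fingerprint" concrete, the following is a minimal sketch, not the authors' method. It assumes that an author's embedding can be approximated by averaging the word vectors of everything they wrote, and that an unseen text is attributed to the author with the closest fingerprint under cosine similarity. The three-dimensional `WORD_VECTORS` table, the user names, and the helper functions are all hypothetical; in practice the word vectors would come from a trained word2vec or GloVe model.

```python
import math

# Toy 3-dimensional word vectors, invented purely for illustration.
# Real vectors would be loaded from a trained word2vec or GloVe model.
WORD_VECTORS = {
    "lol":     [0.9, 0.1, 0.0],
    "omg":     [0.8, 0.2, 0.1],
    "gonna":   [0.7, 0.3, 0.0],
    "hence":   [0.0, 0.9, 0.2],
    "whereas": [0.1, 0.8, 0.3],
    "thus":    [0.0, 0.7, 0.4],
}


def embed_tokens(tokens):
    """Average the vectors of known tokens; return None if none are known."""
    known = [WORD_VECTORS[t] for t in tokens if t in WORD_VECTORS]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def author_fingerprints(posts_by_author):
    """Map each author to the mean word vector over all of their posts."""
    fingerprints = {}
    for author, posts in posts_by_author.items():
        tokens = [t for post in posts for t in post.lower().split()]
        vec = embed_tokens(tokens)
        if vec is not None:
            fingerprints[author] = vec
    return fingerprints


def attribute(text, fingerprints):
    """Return the author whose fingerprint is closest to the text, or None."""
    vec = embed_tokens(text.lower().split())
    if vec is None:
        return None
    return max(fingerprints, key=lambda a: cosine(vec, fingerprints[a]))


# Hypothetical users and posts.
posts = {
    "user_a": ["omg lol", "gonna be late lol"],
    "user_b": ["hence the delay", "thus whereas hence"],
}
fingerprints = author_fingerprints(posts)
print(attribute("lol gonna", fingerprints))
```

Averaging is only one simple way to pool word vectors into a user vector; it discards word order and weights frequent words heavily, so it should be read as a baseline illustration rather than as the representation the paper evaluates.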
