CL · Feb 10, 2019

Word embeddings for idiolect identification

arXiv:1902.03658v1
Originality: Synthesis-oriented
AI Analysis

This paper addresses the problem of authorship attribution for social media users, but it appears incremental because it applies existing embedding methods to a specific domain.

The paper tackles the problem of learning distributed representations (embeddings) of social media users that reflect their writing style. It explores the performance of neural probabilistic language models (e.g., word2vec) and matrix factorization (e.g., GloVe) for idiolect identification, but no concrete results or numbers are provided.

The term idiolect refers to the unique and distinctive use of language of an individual and it is the theoretical foundation of Authorship Attribution. In this paper we are focusing on learning distributed representations (embeddings) of social media users that reflect their writing style. These representations can be considered as stylistic fingerprints of the authors. We are exploring the performance of the two main flavours of distributed representations, namely embeddings produced by Neural Probabilistic Language models (such as word2vec) and matrix factorization (such as GloVe).
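To make the idea of a per-user "stylistic fingerprint" concrete, here is a minimal sketch of the matrix factorization flavour. It is not the paper's method: the toy posts, the user names, the `user_embeddings` and `cosine` helpers, and the use of a truncated SVD over a log-scaled user-by-word count matrix (as a simple stand-in for GloVe-style factorization) are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy corpus (not from the paper): user id -> list of posts.
posts = {
    "user_a": ["lol that is sooo funny", "lol gonna watch it again"],
    "user_b": ["I find that rather amusing", "I shall watch it again"],
    "user_c": ["lol sooo gonna do that", "that is sooo lol"],
}


def user_embeddings(posts, dim=2):
    """Return one dense vector per user from a factorized user-by-word matrix."""
    users = sorted(posts)
    vocab = sorted(
        {word for texts in posts.values() for text in texts for word in text.lower().split()}
    )
    index = {word: j for j, word in enumerate(vocab)}

    # Rows are users, columns are words, cells are raw counts.
    counts = np.zeros((len(users), len(vocab)))
    for i, user in enumerate(users):
        for text in posts[user]:
            for word in text.lower().split():
                counts[i, index[word]] += 1

    # log(1 + count) dampens very frequent words before factorization.
    u, s, _ = np.linalg.svd(np.log1p(counts), full_matrices=False)

    # Keep the top `dim` components, scaled by their singular values.
    vectors = u[:, :dim] * s[:dim]
    return dict(zip(users, vectors))


def cosine(x, y):
    """Cosine similarity between two vectors."""
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


emb = user_embeddings(posts)
sim_ac = cosine(emb["user_a"], emb["user_c"])  # two users with similar informal style
sim_ab = cosine(emb["user_a"], emb["user_b"])  # informal vs. formal style
```

In this toy setup, users who share informal vocabulary end up closer in the embedding space than users with different registers. A realistic system would need far more text per user and would likely use features beyond raw word counts.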

