Word embeddings for idiolect identification
This paper addresses authorship attribution for social media users, but the contribution appears incremental, since it applies existing embedding methods to a specific domain.
The paper tackles the problem of learning distributed representations (embeddings) of social media users that reflect their writing style, comparing neural probabilistic language models (e.g., word2vec) with matrix factorization methods (e.g., GloVe) for idiolect identification. However, no concrete results or numbers are provided.
The term idiolect refers to the unique and distinctive use of language by an individual, and it is the theoretical foundation of authorship attribution. In this paper we focus on learning distributed representations (embeddings) of social media users that reflect their writing style. These representations can be considered stylistic fingerprints of the authors. We explore the performance of the two main flavours of distributed representations, namely embeddings produced by neural probabilistic language models (such as word2vec) and by matrix factorization (such as GloVe).
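The text does not specify features or training details, so what follows is only a minimal sketch of the matrix-factorization flavour under assumptions of our own. Each user is one row of a user × character-trigram count matrix, and the row vectors are learned with a GloVe-style weighted least-squares loss on log counts, optimized with plain SGD rather than GloVe's AdaGrad. An unattributed post is added as an extra row and assigned to the author whose embedding is most cosine-similar. The function names (`char_ngrams`, `glove_style_factorize`, `attribute`), the toy posts, and hyperparameters such as `x_max=10` (GloVe's published default is 100) are all hypothetical.

```python
import math
import random
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams (a hypothetical choice of stylistic feature)."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def glove_style_factorize(rows, dim=8, epochs=100, lr=0.05,
                          x_max=10.0, alpha=0.75, seed=0):
    """Learn one embedding per row of a row x n-gram count matrix.

    Minimizes, over the non-zero cells X[r][g], the GloVe-style loss
        f(X) * (u_r . v_g + b_r + c_g - log X) ** 2
    with f(X) = min(1, (X / x_max) ** alpha), using plain SGD
    (GloVe itself uses AdaGrad). Returns the row vectors u_r and the
    mean weighted loss of each epoch.
    """
    rng = random.Random(seed)

    def init():
        return [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)]

    cells = [(r, g, float(x)) for r, counts in enumerate(rows)
             for g, x in counts.items()]
    grams = sorted({g for _, g, _ in cells})  # sorted for reproducibility
    U = [init() for _ in rows]       # row (user) vectors
    V = {g: init() for g in grams}   # n-gram vectors
    b = [0.0] * len(rows)            # row biases
    c = {g: 0.0 for g in grams}      # n-gram biases
    losses = []
    for _ in range(epochs):
        rng.shuffle(cells)
        total = 0.0
        for r, g, x in cells:
            u, v = U[r], V[g]
            weight = min(1.0, (x / x_max) ** alpha)
            err = sum(p * q for p, q in zip(u, v)) + b[r] + c[g] - math.log(x)
            total += weight * err * err
            step = lr * weight * err  # constant factor 2 folded into lr
            for k in range(dim):
                # Tuple assignment updates u[k] and v[k] from their old values.
                u[k], v[k] = u[k] - step * v[k], v[k] - step * u[k]
            b[r] -= step
            c[g] -= step
        losses.append(total / len(cells))
    return U, losses


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm else 0.0


def attribute(query_vec, known):
    """Return the author whose embedding is most cosine-similar to query_vec."""
    return max(known, key=lambda author: cosine(query_vec, known[author]))


# Toy usage: two hypothetical users plus one unattributed post.
profiles = {
    "user_a": "lol gonna grab coffee brb!!! srsly cant even lol",
    "user_b": "I must say, the weather today is rather pleasant; indeed, quite so.",
}
query = "brb gonna grab food lol!!!"
rows = [char_ngrams(t) for t in profiles.values()] + [char_ngrams(query)]
U, losses = glove_style_factorize(rows)
known = dict(zip(profiles, U[:-1]))  # the last row is the query
print("attributed to:", attribute(U[-1], known))
```

Fitting the query as an extra row is a transductive shortcut chosen for brevity, and a toy run like this says nothing about how well either flavour identifies idiolects. A word2vec-style variant would instead train each user vector to predict that user's tokens or n-grams.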