Similarity Learning for Authorship Verification in Social Media
This work addresses forensic authorship verification for social media, a problem that matters for security and legal applications. It appears incremental, however, because it builds on existing similarity learning methods.
The paper tackles authorship verification for social media messages, which is challenging because the texts are short and span diverse genres. It proposes a new neural network topology for similarity learning that significantly improves performance on this task.
Authorship verification asks whether two documents of unknown authorship were written by the same author. A range of successful technical approaches has been proposed for this task, many of them based on traditional linguistic features such as n-grams. These algorithms achieve good results for certain types of written documents, such as books and novels. Forensic authorship verification for social media, however, is a much more challenging task, since messages tend to be relatively short and cover a large variety of genres and topics. So far, traditional methods based on features like n-grams have had limited success in this setting. In this work, we propose a new neural network topology for similarity learning that significantly improves performance on the authorship verification task for such challenging data sets.
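To make the traditional n-gram approach mentioned above concrete, the following is a minimal sketch of a character n-gram baseline for verification. It is not the neural network topology proposed in this work. The function names, the trigram size, and the decision threshold of 0.5 are all illustrative assumptions, and a real system would tune them on held-out data.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in a lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        # Texts shorter than n characters yield no n-grams at all.
        return 0.0
    return dot / (norm_a * norm_b)


def same_author(doc_a: str, doc_b: str, threshold: float = 0.5, n: int = 3) -> bool:
    """Hypothetical decision rule: same author if similarity reaches the threshold.

    The threshold value is an assumption for illustration only.
    """
    similarity = cosine_similarity(char_ngrams(doc_a, n), char_ngrams(doc_b, n))
    return similarity >= threshold
```

A message shorter than n characters produces an empty profile and a similarity of zero, which illustrates why count-based features become unreliable on the short texts typical of social media.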