CL, IR · Mar 6, 2014

Authorship detection of SMS messages using unigrams

arXiv:1403.1314v1 · 21 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the need to verify SMS authorship in legal and security contexts, though it appears incremental, since it adapts existing n-gram methods to a specific domain.

The authors tackle authorship detection for SMS messages, whose unusual characteristics make conventional stylometric methods hard to apply. They propose an n-gram method and test it with varying amounts of training and testing data, reporting usable accuracy even with limited testing data and many candidate authors.

SMS messaging is a popular medium of communication. Because of its popularity and privacy, it could be used for many illegal purposes. Additionally, since SMSes are part of day-to-day life, they can be used as evidence in many legal disputes. Since a cellular phone might be accessible to people close to its owner, it is important to establish that the sender of a message is indeed the owner of the phone. For this purpose, the straightforward solution seems to be the use of popular stylometric methods. However, compared with the data used for stylometry in the literature, SMSes have unusual characteristics that make it hard or impossible to apply these methods in the conventional way. Our target is to come up with a method of authorship detection for SMS messages that can still give usable accuracy. We argue that, among the methods of author attribution, the best one to apply to SMS messages is an n-gram method. To support this claim, we checked two different methods of distribution comparison with varying amounts of training and testing data. Specifically, we compare how well our algorithms work with small amounts of testing data and a large number of candidate authors (which we believe to be the real-world scenario) against controlled tests with fewer authors and selected SMSes containing many words. To counter the lack of information in a single SMS message, we propose stacking together a few SMSes.
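The abstract does not name its two distribution-comparison methods, so the sketch below is only a rough illustration of the overall idea: stack several SMSes into one sample, build a unigram (single-word) frequency profile per candidate author, and attribute the sample to the closest profile. Cosine similarity and smoothed KL divergence are hypothetical stand-ins for the comparison step, and the tokenizer, the function names (`stack`, `unigram_profile`, `attribute`) and the toy messages are all illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
import math
import re


def tokenize(text):
    # Lowercase and keep runs of letters, digits and apostrophes.
    # Assumption: real SMS text may need extra handling (emoticons etc.).
    return re.findall(r"[a-z0-9']+", text.lower())


def stack(messages, k):
    # Concatenate consecutive groups of k SMSes into one longer sample,
    # countering the small number of words in a single message.
    return [" ".join(messages[i:i + k]) for i in range(0, len(messages), k)]


def unigram_profile(texts):
    # Relative frequency of each unigram (word) over all given texts.
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def cosine_similarity(p, q):
    # Hypothetical comparison 1: higher means more alike.
    dot = sum(v * q.get(w, 0.0) for w, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


def smoothed_kl(p, q, alpha=1e-3):
    # Hypothetical comparison 2: KL divergence D(p || q) with additive
    # smoothing over the union vocabulary, so unseen words never hit log(0).
    # Lower means more alike.
    vocab = set(p) | set(q)

    def smooth(d):
        z = sum(d.values()) + alpha * len(vocab)
        return {w: (d.get(w, 0.0) + alpha) / z for w in vocab}

    ps, qs = smooth(p), smooth(q)
    return sum(ps[w] * math.log(ps[w] / qs[w]) for w in vocab)


def attribute(sample_text, author_profiles, measure="cosine"):
    # Return the candidate author whose profile is closest to the sample.
    sample = unigram_profile([sample_text])
    if measure == "cosine":
        scores = {a: cosine_similarity(sample, prof)
                  for a, prof in author_profiles.items()}
        return max(scores, key=scores.get)
    scores = {a: smoothed_kl(sample, prof)
              for a, prof in author_profiles.items()}
    return min(scores, key=scores.get)


# Hypothetical toy data, not taken from the paper.
training = {
    "alice": ["hey r u coming 2nite", "lol ok c u there", "gonna b late sry"],
    "bob": ["I will arrive at seven.", "Please call me when you are free.",
            "Thank you, see you tomorrow."],
}
profiles = {a: unigram_profile(msgs) for a, msgs in training.items()}

# Two test SMSes are stacked into a single sample before attribution.
for sample in stack(["ok c u 2nite", "sry gonna b late lol"], k=2):
    print(attribute(sample, profiles), attribute(sample, profiles, "kl"))
```

On this toy data both measures pick `alice`, printing `alice alice`. Cosine similarity is maximized while KL divergence is minimized, so swapping in a different comparison measure only changes the scoring step in `attribute`; the stacking size `k` is the knob that trades the number of test samples for the amount of text in each one.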

