What Your Username Says About You
This is an incremental improvement for social media and online platform analysis: it infers user gender and language from usernames alone, and its morphological features are more effective than a character n-gram baseline.
This work tackles the problem of inferring user demographics (gender and language) from usernames alone, using unsupervised morphology induction to decompose usernames into sub-units. Experimental results show the proposed morphological features are more effective than a character n-gram baseline.
Usernames are ubiquitous on the Internet, and they are often suggestive of user demographics. This work looks at the degree to which gender and language can be inferred from a username alone by making use of unsupervised morphology induction to decompose usernames into sub-units. Experimental results on the two tasks demonstrate the effectiveness of the proposed morphological features compared to a character n-gram baseline.
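To make the two feature types concrete, here is a minimal Python sketch. It is not the authors' implementation. The function names, the `^`/`$` boundary markers, and the toy lexicon are all assumptions made for illustration. In particular, `segment` uses a greedy longest-match lookup against a hand-written lexicon, which only stands in for unsupervised morphology induction, where the sub-units would be learned from a large collection of usernames.

```python
from collections import Counter


def char_ngrams(username, n_min=1, n_max=3):
    """Count character n-grams of a username (baseline-style features).

    The ^ and $ boundary markers are an assumed convention, not taken
    from the source.
    """
    padded = "^" + username.lower() + "$"
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    return counts


def segment(username, lexicon):
    """Split a username into sub-units by greedy longest match.

    Hypothetical stand-in for learned morphological segmentation:
    `lexicon` is a hand-written set of known units, and any character
    not covered by the lexicon becomes its own single-character unit.
    """
    s = username.lower()
    units = []
    i = 0
    while i < len(s):
        # Try the longest candidate starting at position i first.
        for j in range(len(s), i, -1):
            if s[i:j] in lexicon:
                units.append(s[i:j])
                i = j
                break
        else:
            # No lexicon entry matched: emit one character and move on.
            units.append(s[i])
            i += 1
    return units


if __name__ == "__main__":
    toy_lexicon = {"soccer", "mom"}  # illustrative only
    print(segment("SoccerMom42", toy_lexicon))  # ['soccer', 'mom', '4', '2']
    print(sorted(char_ngrams("ab", 2, 2)))      # ['^a', 'ab', 'b$']
```

Either output can be fed to a classifier as sparse count features. The intuition behind the comparison in the abstract is that a unit such as `mom` carries a demographic signal as a whole, whereas n-grams spread that signal over several overlapping fragments.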