Look Who's Talking: Inferring Speaker Attributes from Personal Longitudinal Dialog
This work addresses speaker attribute inference, a problem of interest to researchers in computational linguistics and social network analysis. Its contribution is incremental: it builds on existing methods, adding new features and data.
The study tackled the problem of inferring speaker attributes from personal longitudinal dialog. It analyzed a large corpus of half a million instant messages from one individual's conversations with 104 partners and focused on seven attributes, such as gender, relative age, and relationship type (family member, romantic partner, classmate, co-worker). The results showed that using all features, including conversational and graph-based features, led to gains of 9–14% over using message text alone.
We examine a large dialog corpus obtained from the conversation history of a single individual with 104 conversation partners. The corpus consists of half a million instant messages across several messaging platforms. We focus our analyses on seven speaker attributes, each of which partitions the set of speakers: gender; relative age; family member; romantic partner; classmate; co-worker; and native to the same country. In addition to the content of the messages, we examine conversational aspects such as the time messages are sent, messaging frequency, psycholinguistic word categories, linguistic mirroring, and graph-based features reflecting how people in the corpus mention each other. We present two sets of experiments predicting each attribute using (1) short context windows and (2) a larger set of messages. We find that using all features leads to gains of 9–14% over using message text only.
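The abstract names several conversational features without defining them. The following is a minimal sketch of how a few of them might be computed from a time-ordered message list. Everything in it is an illustrative assumption rather than the authors' definitions: the hypothetical `Message` record, the function names, measuring frequency per active day, a word set standing in for a psycholinguistic category, the simplified mirroring score (probability that a reply uses a category right after the other speaker used it, minus the baseline probability for that speaker's replies), and a mention graph built by plain token matching.

```python
import re
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Message:
    speaker: str    # hypothetical sender label
    time: datetime  # when the message was sent
    text: str


def tokens(text):
    """Lowercase word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def context_windows(messages, size):
    """Split a time-ordered conversation into consecutive windows of `size` messages."""
    return [messages[i:i + size] for i in range(0, len(messages), size)]


def hour_histogram(messages, speaker):
    """Fraction of `speaker`'s messages sent in each hour of the day (24 bins)."""
    hours = Counter(m.time.hour for m in messages if m.speaker == speaker)
    total = sum(hours.values()) or 1
    return [hours[h] / total for h in range(24)]


def messages_per_day(messages, speaker):
    """Average number of messages `speaker` sends per day on which they are active."""
    days = Counter(m.time.date() for m in messages if m.speaker == speaker)
    return sum(days.values()) / len(days) if days else 0.0


def uses(text, category):
    """True if any token of `text` is in the word set `category`."""
    return any(w in category for w in tokens(text))


def mirroring(messages, replier, category):
    """Simplified mirroring score for `replier` on one word category:
    P(reply uses category | previous message used it) - P(reply uses category)."""
    replies = hits = triggered = triggered_hits = 0
    for prev, cur in zip(messages, messages[1:]):
        # Only count turns where `replier` answers someone else.
        if cur.speaker != replier or prev.speaker == replier:
            continue
        used = int(uses(cur.text, category))
        replies += 1
        hits += used
        if uses(prev.text, category):
            triggered += 1
            triggered_hits += used
    if replies == 0 or triggered == 0:
        return 0.0
    return triggered_hits / triggered - hits / replies


def mention_counts(messages, names):
    """Directed mention graph: (mentioner, mentioned) -> number of messages."""
    edges = Counter()
    for m in messages:
        words = set(tokens(m.text))
        for name in names:
            if name != m.speaker and name.lower() in words:
                edges[(m.speaker, name)] += 1
    return edges


if __name__ == "__main__":
    msgs = [
        Message("ego", datetime(2020, 1, 1, 9, 0), "hi ana, are you coming home"),
        Message("ana", datetime(2020, 1, 1, 9, 5), "yes home soon"),
        Message("ego", datetime(2020, 1, 1, 22, 0), "ok"),
        Message("ana", datetime(2020, 1, 2, 23, 0), "tell bob hi"),
    ]
    print(len(context_windows(msgs, 2)))
    print(messages_per_day(msgs, "ana"))
    print(mirroring(msgs, "ana", {"home"}))
    print(mention_counts(msgs, ["ego", "ana", "bob"]))
```

Each function returns a small numeric summary per speaker or window, so the outputs can be concatenated with text features into a single vector for a classifier. In-degree in the mention graph (how often others mention a speaker) is one simple graph-based feature one could derive from `mention_counts`.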