We Don't Speak the Same Language: Interpreting Polarization through Machine Translation
The paper offers a novel, interpretable framework for analyzing polarization in social media data, though its contribution is incremental in that it applies existing translation methods to a new domain.
The paper tackles the problem of interpreting political polarization by proposing that sub-communities speak different 'languages', using machine translation on 86.6 million YouTube comments to reveal word-level differences like 'black lives matter' versus 'all lives matter'.
Polarization among US political parties, media and elites is a widely studied topic. Prominent lines of prior research across multiple disciplines have observed and analyzed growing polarization in social media. In this paper, we present a new methodology that offers a fresh perspective on interpreting polarization through the lens of machine translation. With a novel proposition that two sub-communities are speaking in two different \emph{languages}, we demonstrate that modern machine translation methods can provide a simple yet powerful and interpretable framework to understand the differences between two (or more) large-scale social media discussion data sets at the granularity of words. Via a substantial corpus of 86.6 million comments by 6.5 million users on over 200,000 news videos hosted by YouTube channels of four prominent US news networks, we demonstrate that simple word-level and phrase-level translation pairs can reveal deep insights into the current political divide -- what is \emph{black lives matter} to one can be \emph{all lives matter} to the other.
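The abstract does not say which translation method is used, so the following is only a minimal sketch of one common way to obtain word-level translation pairs between two corpora: learn separate word embeddings for each community, align the two spaces with an orthogonal (Procrustes) rotation fitted on anchor words assumed to mean the same thing in both, then translate a word by nearest-neighbor search. Everything here is an assumption made for illustration: the tiny random embeddings, the tokens `blm` and `alm` (placeholders for the two phrases), the anchor list, and the helper names `fit_rotation` and `translate`. None of it reproduces the paper's data or results.

```python
import numpy as np

def fit_rotation(X, Y):
    """Orthogonal Procrustes: the orthogonal W minimising ||X @ W - Y||_F."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def translate(word, vocab_src, emb_src, vocab_tgt, emb_tgt, W):
    """Map a source word into the target space and return the
    target word with the highest cosine similarity."""
    v = emb_src[vocab_src.index(word)] @ W
    sims = (emb_tgt @ v) / (np.linalg.norm(emb_tgt, axis=1) * np.linalg.norm(v))
    return vocab_tgt[int(np.argmax(sims))]

# Toy, fabricated embeddings (NOT real data). Community A has five words.
rng = np.random.default_rng(0)
dim = 3
vocab_a = ["the", "and", "of", "to", "blm"]
emb_a = rng.normal(size=(len(vocab_a), dim))

# Community B's space is a rotated copy of A's, except that the slot
# A calls "blm" is called "alm" in B. B also has its own "blm" with an
# unrelated vector, mimicking a word used differently by the two groups.
Q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))  # random orthogonal matrix
vocab_b = ["the", "and", "of", "to", "alm", "blm"]
emb_b = np.vstack([emb_a @ Q, rng.normal(size=(1, dim))])

# Anchor words assumed (hypothetically) to carry the same meaning in both.
anchors = ["the", "and", "of", "to"]
X = emb_a[[vocab_a.index(w) for w in anchors]]
Y = emb_b[[vocab_b.index(w) for w in anchors]]
W = fit_rotation(X, Y)

result = translate("blm", vocab_a, emb_a, vocab_b, emb_b, W)
print(result)
```

In this constructed setup the anchors determine the rotation, so community A's `blm` lands on community B's `alm` rather than on B's own `blm`. A translation pair whose two sides differ in this way is the kind of signal the abstract describes; on real corpora the embeddings would be trained on each community's comments and the alignment would be only approximate.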