Linguistic Characteristics of Censorable Language on SinaWeibo
This paper addresses censorship detection on social media platforms, but its contribution is incremental: it applies existing methods to a specific dataset.
The paper tackles the problem of predicting censorship on SinaWeibo by analyzing linguistic characteristics, and finds that readability is the strongest indicator of censored content in its corpus.
This paper investigates censorship from a linguistic perspective. We collect a corpus of censored and uncensored posts on a number of topics and build a classifier that predicts censorship decisions independently of discussion topics. Our investigation reveals that the strongest linguistic indicator of censored content in our corpus is its readability.
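To make "a single linguistic feature as a censorship indicator" concrete, here is a minimal sketch under stated assumptions. It is not the authors' classifier or readability measure. The readability proxy (`avg_sentence_length`, mean characters per sentence) and the one-feature decision stump (`fit_stump`, `predict`) are hypothetical stand-ins chosen for illustration. The stump learns both a threshold and a polarity, so it does not presuppose whether censored posts are more or less readable.

```python
import re

# Hypothetical readability proxy, for illustration only; it is NOT the
# readability measure used in the paper. Sentences are split on Chinese
# and Western sentence-final punctuation.
SENTENCE_END = re.compile(r"[。！？!?]+")


def avg_sentence_length(text: str) -> float:
    """Mean number of non-whitespace characters per sentence."""
    sentences = [s for s in SENTENCE_END.split(text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len("".join(s.split())) for s in sentences) / len(sentences)


def fit_stump(values: list[float], labels: list[int]) -> tuple[float, int, float]:
    """Fit a one-feature decision stump (label 1 = censored, 0 = uncensored).

    Returns (threshold, polarity, training accuracy). With polarity +1 a post
    is predicted censored when its value exceeds the threshold; with -1, when
    it falls below. Assumes `values` is non-empty.
    """
    points = sorted(set(values))
    # Candidate thresholds: one below every value, plus midpoints between neighbours.
    candidates = [points[0] - 1.0] + [(a + b) / 2 for a, b in zip(points, points[1:])]
    best = (candidates[0], 1, -1.0)
    for t in candidates:
        for polarity in (1, -1):
            correct = sum(
                (polarity * (v - t) > 0) == bool(y) for v, y in zip(values, labels)
            )
            acc = correct / len(values)
            if acc > best[2]:
                best = (t, polarity, acc)
    return best


def predict(value: float, threshold: float, polarity: int) -> int:
    """Return 1 (predicted censored) or 0 (predicted uncensored)."""
    return 1 if polarity * (value - threshold) > 0 else 0
```

In practice the feature values would come from `avg_sentence_length` (or a real readability measure) applied to each post; a stump is used here only because its behaviour is easy to inspect, whereas a full classifier would combine many linguistic features.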