Detecting Fake News with Weak Social Supervision
The work addresses the data-scarcity issue faced by researchers and practitioners in social media analysis, though its contribution is incremental because it builds on existing weak supervision methods.
The paper tackles the problem of limited labeled data for supervised learning, particularly in fake news detection. It introduces weak social supervision, which leverages characteristics of social media to generate weak labels, and shows that this supervision is effective when annotated examples are scarce.
Limited labeled data is becoming the largest bottleneck for supervised learning systems. This is especially the case for many real-world tasks where large-scale annotated examples are either too expensive to acquire or unavailable due to privacy or data access constraints. Weak supervision has been shown to be a good means of mitigating the scarcity of annotated data by leveraging weak labels or injecting constraints from heuristic rules and/or external knowledge sources. Social media has little labeled data but possesses unique characteristics that make it suitable for generating weak supervision, resulting in a new type of weak supervision, i.e., weak social supervision. In this article, we illustrate how various aspects of social media can be used to generate weak social supervision. Specifically, we use recent research on fake news detection as the use case, where social engagements are abundant but annotated examples are scarce, to show that weak social supervision is effective when labeled data is limited. This article opens the door to learning with weak social supervision for other emerging tasks.
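To make the idea of generating weak labels from social signals concrete, the following is a minimal sketch and not the article's own method. It assumes two hypothetical per-article signals (the average credibility of the users who shared the article and the spread of sentiment across the sharing posts), illustrative thresholds, and a simple majority vote over heuristic rules. All names and values here are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List

# Label convention assumed for this sketch.
FAKE, REAL, ABSTAIN = 1, 0, -1


@dataclass
class Engagement:
    """Hypothetical summary of the social engagements around one news article."""
    mean_user_credibility: float  # assumed precomputed, in [0, 1]
    sentiment_std: float          # assumed spread of sentiment across sharing posts


def lf_user_credibility(e: Engagement) -> int:
    # Heuristic rule (assumption): articles shared mostly by low-credibility
    # users lean fake, those shared by high-credibility users lean real.
    if e.mean_user_credibility < 0.3:
        return FAKE
    if e.mean_user_credibility > 0.7:
        return REAL
    return ABSTAIN


def lf_polarized_sentiment(e: Engagement) -> int:
    # Heuristic rule (assumption): strongly polarized reactions lean fake.
    return FAKE if e.sentiment_std > 0.5 else ABSTAIN


LabelingFunction = Callable[[Engagement], int]


def weak_label(e: Engagement, lfs: List[LabelingFunction]) -> int:
    """Combine rule outputs by majority vote; abstain on ties or no votes."""
    votes = [v for v in (lf(e) for lf in lfs) if v != ABSTAIN]
    fake_votes = sum(1 for v in votes if v == FAKE)
    real_votes = len(votes) - fake_votes
    if fake_votes == real_votes:
        return ABSTAIN
    return FAKE if fake_votes > real_votes else REAL


RULES: List[LabelingFunction] = [lf_user_credibility, lf_polarized_sentiment]
```

The resulting weak labels are noisy by construction. In a weak supervision setting they would typically be combined with the small set of clean annotations when training a detector, rather than treated as ground truth.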