Bridging Social Media and Search Engines: Dredge Words and the Detection of Unreliable Domains
This work addresses content moderation for social media and search platforms by improving the detection of unreliable domains. The contribution is incremental: it builds on existing graph neural network methods and adds new data integration.
The paper tackles the problem of identifying unreliable websites by developing a system that integrates webgraph and social media contexts. It introduces 'dredge words' (terms for which unreliable domains rank highly on search engines) and reports state-of-the-art results in website credibility classification and in top-k identification of unreliable domains.
Proactive content moderation requires platforms to rapidly and continuously evaluate the credibility of websites. Leveraging the direct and indirect paths users follow to unreliable websites, we develop a website credibility classification and discovery system that integrates both webgraph and large-scale social media contexts. We additionally introduce the concept of dredge words, terms or phrases for which unreliable domains rank highly on search engines, and provide the first exploration of their usage on social media. Our graph neural networks that combine webgraph and social media contexts achieve state-of-the-art results in website credibility classification and significantly improve the top-k identification of unreliable domains. Additionally, we release a novel dataset of dredge words, highlighting their strong connections to both social media and online commerce platforms.
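To make the definition of a dredge word concrete, the following is a minimal sketch of how candidate terms might be flagged from search results. It is illustrative only and is not the paper's pipeline: the function names, the `top_n` and `min_hits` thresholds, and the example domains are all hypothetical assumptions, since the abstract does not specify what counts as ranking "highly".

```python
from typing import Dict, List, Set


def is_dredge_word(ranked_domains: List[str], unreliable: Set[str],
                   top_n: int = 10, min_hits: int = 1) -> bool:
    """Return True if at least `min_hits` unreliable domains appear
    among the first `top_n` results for a query.

    `top_n` and `min_hits` are assumed thresholds, not values from the paper.
    """
    hits = sum(1 for domain in ranked_domains[:top_n] if domain in unreliable)
    return hits >= min_hits


def find_dredge_words(serps: Dict[str, List[str]], unreliable: Set[str],
                      top_n: int = 10, min_hits: int = 1) -> List[str]:
    """Return the queries whose ranked results qualify as dredge words.

    `serps` maps each query to its result domains, ordered by rank.
    """
    return sorted(
        query for query, domains in serps.items()
        if is_dredge_word(domains, unreliable, top_n, min_hits)
    )


# Hypothetical example data: domains ordered by search rank for each query.
example_serps = {
    "miracle cure x": ["fakehealth.example", "clinic.example"],
    "weather today": ["weather.example", "news.example"],
}
example_unreliable = {"fakehealth.example"}
dredge_words = find_dredge_words(example_serps, example_unreliable)
```

In this sketch, a query is flagged as soon as one known-unreliable domain appears in its top results; a stricter variant could raise `min_hits` or shrink `top_n` to require more prominent placement.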