CL, Jul 22, 2024

SocialQuotes: Learning Contextual Roles of Social Media Quotes on the Web

arXiv:2407.16007v1, 1 citation, h-index: 15
Originality: Synthesis-oriented
AI Analysis

The work enables more effective social media retrieval and richer scientific analyses for web researchers and developers, though its contribution is incremental: it applies existing methods to a new domain.

The authors tackle the problem of automatically annotating the contextual roles that social media quotes play when embedded in web pages. They introduce a language modeling framework and release a dataset of over 32 million social quotes, 8.3k of them with crowdsourced annotations. They demonstrate reasonable role classification performance with modern LLMs and report cross-domain, cross-platform role distributions.

Web authors frequently embed social media to support and enrich their content, creating the potential to derive web-based, cross-platform social media representations that can enable more effective social media retrieval systems and richer scientific analyses. As a step toward such capabilities, we introduce a novel language modeling framework that enables automatic annotation of the roles that social media entities play in their embedded web context. Using related communication theory, we liken social media embeddings to quotes, formalize the page context as structured natural language signals, and identify a taxonomy of roles for quotes within the page context. We release SocialQuotes, a new data set built from the Common Crawl of over 32 million social quotes, 8.3k of them with crowdsourced quote annotations. Using SocialQuotes and the accompanying annotations, we provide a role classification case study, showing reasonable performance with modern-day LLMs and exposing explainable aspects of our framework via page content ablations. We also classify a large batch of unannotated quotes, revealing interesting cross-domain, cross-platform role distributions on the web.

