Public Health Advocacy Dataset: A Dataset of Tobacco Usage Videos from Social Media
This work addresses the need for multi-modal data in public health research that can inform regulatory policies and strategies. Its contribution is incremental, however, because it applies existing methods to new data.
The researchers tackled the problem of analyzing tobacco-related content on social media by creating the Public Health Advocacy Dataset (PHAD), which includes 5,730 videos and 4.3 million frames. They used a two-stage classification approach and report superior performance in categorizing tobacco products and usage scenarios.
The Public Health Advocacy Dataset (PHAD) is a comprehensive collection of 5,730 videos related to tobacco products, sourced from social media platforms such as TikTok and YouTube. The dataset encompasses 4.3 million frames and includes detailed metadata such as user engagement metrics, video descriptions, and search keywords. It is the first dataset with these features, providing a valuable resource for analyzing tobacco-related content and its impact. Our research employs a two-stage classification approach that incorporates a Vision-Language (VL) Encoder and demonstrates superior performance in categorizing various types of tobacco products and usage scenarios. The analysis reveals significant user engagement trends, particularly with vaping and e-cigarette content, highlighting areas for targeted public health interventions. PHAD addresses the need for multi-modal data in public health research, offering insights that can inform regulatory policies and public health strategies. The dataset is a crucial step towards understanding and mitigating the impact of tobacco usage, helping to make public health efforts more inclusive and effective.
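
The two-stage classification idea can be illustrated with a minimal sketch. The abstract does not specify what each stage does, so the coarse/fine split below (first "tobacco vs. non-tobacco", then product type) is an assumption, as are the label names, the prototype vectors, and the nearest-prototype rule. The toy 3-dimensional vectors stand in for the output of a Vision-Language encoder, which is not reproduced here.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def nearest_label(embedding, prototypes):
    """Return the label whose prototype is most similar to the embedding."""
    return max(prototypes, key=lambda label: cosine(embedding, prototypes[label]))

# Hypothetical prototypes; in practice these would be encoder embeddings
# of text prompts or labeled frames, not hand-written vectors.
STAGE1_PROTOTYPES = {
    "tobacco": [1.0, 0.0, 0.0],
    "non_tobacco": [0.0, 0.0, 1.0],
}
STAGE2_PROTOTYPES = {
    "e_cigarette": [0.9, 0.4, 0.0],
    "cigarette": [0.9, -0.4, 0.0],
}

def classify_two_stage(embedding):
    """Stage 1 filters for tobacco content; stage 2 assigns a product type."""
    coarse = nearest_label(embedding, STAGE1_PROTOTYPES)
    if coarse != "tobacco":
        return coarse, None  # stage 2 is skipped for non-tobacco content
    return coarse, nearest_label(embedding, STAGE2_PROTOTYPES)

if __name__ == "__main__":
    print(classify_two_stage([0.8, 0.5, 0.1]))
    print(classify_two_stage([0.0, 0.1, 0.9]))
```

Splitting the decision this way lets the second stage specialize on distinctions among tobacco products without also having to reject unrelated content.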