YouTube AV 50K: An Annotated Corpus for Comments in Autonomous Vehicles
The dataset provides a resource for opinion mining and sentiment analysis in the autonomous vehicle domain, but the contribution is incremental, as it applies existing methods to new data.
The authors introduced YouTube AV 50K, a freely available dataset of over 50,000 YouTube comments and metadata from autonomous vehicle-related videos, and demonstrated its utility through a case study analyzing public attitudes toward self-driving cars and reactions to the first self-driving car fatality.
YouTube has one billion monthly viewers, and millions of users discuss and share opinions there, making the comments below its videos a rich source of data for opinion mining and sentiment analysis. We introduce the YouTube AV 50K dataset, a freely available collection of more than 50,000 YouTube comments, with metadata, posted below autonomous vehicle (AV)-related videos. We describe its creation process, content, and data format, and discuss its possible uses. In particular, we conduct a case study of the first self-driving car fatality to evaluate the dataset, and show how the dataset can be used to better understand public attitudes toward self-driving cars and public reactions to the accident. Future developments of the dataset are also discussed.