Netizen-Style Commenting on Fashion Photos: Dataset and Diversity Measures
This work addresses the lack of engaging, context-rich comments in fashion photo captioning for social media users, though it is incremental as it builds on existing image captioning methods.
The paper tackles the problem of generating engaging, netizen-style comments for fashion photos by introducing a novel framework that includes a large-scale dataset (NetiLook with 300K posts and 5M comments), three diversity measures, and a method combining topic models with neural networks, resulting in improved accuracy and diversity in image captioning tasks.
Recently, deep neural network models have achieved promising results on image captioning tasks. Yet the "vanilla" sentences generated by current methods describe only shallow appearances (e.g., types, colors) and do not match netizen style, so they lack engagement, context, and user intent. To tackle this problem, we propose Netizen Style Commenting (NSC), which automatically generates characteristic comments for a user-contributed fashion photo. We aim to render the comments in a vivid "netizen" style that reflects the culture of a designated social community, in the hope of facilitating more engagement with users. In this work, we design a novel framework that consists of three major components: (1) We construct a large-scale clothing dataset named NetiLook, which contains 300K posts (photos) with 5M comments, to discover netizen-style comments. (2) We propose three unique measures to estimate the diversity of comments. (3) We bring diversity by marrying topic models with neural networks to make up for the insufficiency of conventional image captioning methods. Experimenting on Flickr30k and our NetiLook dataset, we demonstrate that our proposed approaches benefit fashion photo commenting and improve image captioning in both accuracy and diversity.
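The abstract does not define the three diversity measures, so the following is only an illustrative sketch of what a comment-diversity score can look like. It uses a generic distinct-n ratio (unique n-grams divided by total n-grams), which is an assumption made here for illustration and is not claimed to be one of the paper's measures. The function name `distinct_n` and the sample comments are hypothetical.

```python
def distinct_n(comments, n=1):
    """Ratio of unique n-grams to total n-grams over a set of comments.

    Illustrative only: a generic diversity proxy, not a measure from the paper.
    Values near 0 indicate repetitive output; 1.0 means no n-gram repeats.
    """
    unique = set()
    total = 0
    for comment in comments:
        # Whitespace tokenization is a simplifying assumption.
        tokens = comment.lower().split()
        for i in range(len(tokens) - n + 1):
            unique.add(tuple(tokens[i:i + n]))
            total += 1
    return len(unique) / total if total else 0.0


# Hypothetical outputs: a captioner that repeats one "vanilla" sentence
# versus one that produces varied, netizen-like comments.
vanilla = ["a red dress", "a red dress", "a red dress"]
netizen = ["love the outfit", "so cute", "great style on you"]

vanilla_score = distinct_n(vanilla)
netizen_score = distinct_n(netizen)
```

Under this proxy, the repeated "vanilla" captions score lower than the varied comments, which is the kind of gap a diversity measure is meant to expose alongside accuracy metrics.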