MM IR Nov 10, 2019

A Multimodal CNN-based Tool to Censure Inappropriate Video Scenes

Pedro V. A. de Freitas, Paulo R. C. Mendes, Gabriel N. P. dos Santos, Antonio José G. Busson, Álan Livio Guedes, Sérgio Colcher, Ruy Luiz Milidiú

arXiv:1911.03974v1 3.36 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of content moderation for video-sharing platforms by enabling precise identification and censoring of inappropriate scenes, though it is incremental as it builds on existing multimodal and CNN approaches.

The authors tackled the problem of detecting inappropriate content in videos by developing a multimodal CNN-based architecture that uses audio and image features, achieving F1-scores of 98.95% for appropriate and 98.94% for inappropriate classes in classification tasks.

Video-sharing platforms and storage services are now so widely used that the amount of video on the internet has become massive. This volume of data makes it difficult to control what kind of content such video files may contain. One of the main concerns is whether a video has inappropriate subject matter, such as nudity, violence, or other potentially disturbing content. Beyond telling whether a video is appropriate or inappropriate, it is also important to identify which parts of it contain such content, so that the parts a simple whole-video analysis would discard can be preserved. In this work, we present a multimodal (using audio and image features) architecture based on Convolutional Neural Networks (CNNs) for detecting inappropriate scenes in video files. In the task of classifying video files, our model achieved F1-scores of 98.95% and 98.94% for the appropriate and inappropriate classes, respectively. We also present a censoring tool that automatically censors inappropriate segments of a video file.
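To make the idea of censoring only the inappropriate parts concrete, here is a minimal sketch of how per-segment classifier scores could be turned into time intervals to censor. It is not the authors' implementation: the function name `flagged_intervals`, the fixed segment length, and the 0.5 threshold are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import List, Tuple


def flagged_intervals(
    scores: List[float],
    segment_seconds: float = 1.0,  # assumed fixed segment length (hypothetical)
    threshold: float = 0.5,        # assumed decision threshold (hypothetical)
) -> List[Tuple[float, float]]:
    """Merge consecutive flagged segments into (start, end) intervals in seconds.

    `scores` holds one "inappropriate" probability per consecutive video segment,
    as a per-segment classifier might produce.
    """
    intervals: List[Tuple[float, float]] = []
    start = None  # index of the first segment in the current flagged run
    for i, score in enumerate(scores):
        flagged = score >= threshold
        if flagged and start is None:
            start = i  # a new flagged run begins here
        elif not flagged and start is not None:
            # the run ended just before segment i
            intervals.append((start * segment_seconds, i * segment_seconds))
            start = None
    if start is not None:
        # the video ends while a flagged run is still open
        intervals.append((start * segment_seconds, len(scores) * segment_seconds))
    return intervals


if __name__ == "__main__":
    print(flagged_intervals([0.1, 0.9, 0.8, 0.2, 0.7]))
```

A censoring step could then blur or mute only the returned intervals and leave the rest of the video untouched, which is the benefit the abstract attributes to scene-level rather than whole-video detection.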
