Attention over Scene Graphs: Indoor Scene Representations Toward CSAI Classification
This work addresses the problem of classifying sensitive indoor scenes for applications in law enforcement and content analysis, with incremental improvements in accuracy and privacy.
The paper tackles indoor scene classification, particularly for sensitive content like child sexual abuse imagery (CSAI), by proposing ASGRA, a framework that uses scene graphs and graph attention networks, achieving 81.27% balanced accuracy on Places8 and 74.27% on real-world CSAI data.
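Both reported figures are balanced accuracies, that is, the mean of per-class recall, which keeps a frequent class from dominating the score. The following is a minimal sketch of that metric, not the authors' evaluation code; the class labels and predictions are hypothetical and chosen only to show how balanced accuracy differs from plain accuracy on imbalanced data.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical, imbalanced toy data: 4 "bedroom" samples, 2 "bathroom" samples.
y_true = ["bedroom"] * 4 + ["bathroom"] * 2
y_pred = ["bedroom"] * 4 + ["bathroom", "bedroom"]

# Plain accuracy counts every sample equally: 5 of 6 are correct.
plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Balanced accuracy averages the recalls: (4/4 + 1/2) / 2.
score = balanced_accuracy(y_true, y_pred)
print(f"plain={plain:.4f} balanced={score:.4f}")
```

In this toy case the minority class is misclassified half of the time, which plain accuracy partly hides and balanced accuracy exposes.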
Indoor scene classification is a critical task in computer vision, with applications ranging from robotics to sensitive content analysis, such as child sexual abuse imagery (CSAI) classification. The problem is particularly challenging due to the intricate relationships between objects and complex spatial layouts. In this work, we propose the Attention over Scene Graphs for Sensitive Content Analysis (ASGRA), a novel framework that operates on structured graph representations instead of raw pixels. By first converting images into Scene Graphs and then employing a Graph Attention Network for inference, ASGRA directly models the interactions between a scene's components. This approach offers two key benefits: (i) inherent explainability via object and relationship identification, and (ii) privacy preservation, enabling model training without direct access to sensitive images. On Places8, we achieve 81.27% balanced accuracy, surpassing image-based methods. Real-world CSAI evaluation with law enforcement yields 74.27% balanced accuracy. Our results establish structured scene representations as a robust paradigm for indoor scene classification and CSAI classification. Code is publicly available at https://github.com/tutuzeraa/ASGRA.
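To make the idea of attention over a scene graph concrete, the following is a minimal single-head graph attention layer in plain Python. It is a sketch under stated assumptions and not the ASGRA implementation: the toy scene graph, the one-hot node features, the fixed weights `W` and `a`, the undirected treatment of edges, and the mean pooling are all illustrative choices, and relationship predicates are ignored for brevity.

```python
import math

# Hypothetical toy scene graph: object labels as nodes and
# (subject, predicate, object) triples as edges.
nodes = ["bed", "pillow", "lamp", "wall"]
triples = [(1, "on", 0), (2, "near", 0), (0, "against", 3)]

# Assumption: edges are treated as undirected and every node gets a self-loop.
neighbors = {i: {i} for i in range(len(nodes))}
for s, _pred, o in triples:
    neighbors[s].add(o)
    neighbors[o].add(s)
neighbors = {i: sorted(ns) for i, ns in neighbors.items()}

# One-hot node features (a real system would use learned label embeddings).
features = [[1.0 if k == i else 0.0 for k in range(len(nodes))]
            for i in range(len(nodes))]

# Fixed illustrative parameters; in practice these are learned.
W = [[0.5, -0.2, 0.1, 0.3],
     [0.0, 0.4, -0.3, 0.2]]   # projects 4-d features to 2-d
a = [0.3, -0.1, 0.2, 0.4]     # attention vector over [z_i || z_j]


def matvec(M, x):
    return [sum(w * v for w, v in zip(row, x)) for row in M]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def gat_layer(features, neighbors, W, a):
    """Return updated node vectors and per-node attention weights."""
    z = [matvec(W, h) for h in features]
    out, attn = [], []
    for i, zi in enumerate(z):
        nbrs = neighbors[i]
        # Unnormalised score e_ij = LeakyReLU(a . [z_i || z_j]).
        scores = [leaky_relu(sum(ak * v for ak, v in zip(a, zi + z[j])))
                  for j in nbrs]
        # Numerically stable softmax over the neighbourhood of node i.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        attn.append(dict(zip(nbrs, alphas)))
        # New representation: attention-weighted sum of projected neighbours.
        out.append([sum(al * z[j][d] for al, j in zip(alphas, nbrs))
                    for d in range(len(zi))])
    return out, attn


hidden, attn = gat_layer(features, neighbors, W, a)

# Mean-pool node vectors into one graph-level embedding for a classifier.
graph_embedding = [sum(h[d] for h in hidden) / len(hidden)
                   for d in range(len(hidden[0]))]

for i, row in enumerate(attn):
    weights = {nodes[j]: round(w, 3) for j, w in row.items()}
    print(nodes[i], "attends to", weights)
```

The per-node attention weights are what make this kind of model inspectable: each weight says how much one detected object contributes to another object's representation. Because the layer consumes only labels and edges, it never touches pixel data.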