CV · Jan 31, 2024

Semantic-Syntactic Discrepancy in Images (SSDI): Learning Meaning and Order of Features from Natural Images

arXiv:2401.17515v2 · 1 citation · h-index: 9 · Has Code · Trans. Mach. Learn. Res.
Originality: Incremental advance
AI Analysis

This paper addresses a vulnerability of image classification models, though the contribution appears incremental: it builds on existing work to detect specific discrepancies.

The paper addresses the insensitivity of classification models to unnatural arrangements of object parts, which human observers notice easily. It proposes a semi-supervised two-stage method that learns image grammar from natural images, and it reports SSDI detection rates of 70% to 90% on corruptions generated from the CelebA and SUN-RGBD datasets.

Despite considerable progress in image classification tasks, classification models seem unaffected by the images that significantly deviate from those that appear natural to human eyes. Specifically, while human perception can easily identify abnormal appearances or compositions in images, classification models overlook any alterations in the arrangement of object parts as long as they are present in any order, even if unnatural. Hence, this work exposes the vulnerability of having semantic and syntactic discrepancy in images (SSDI) in the form of corruptions that remove or shuffle image patches or present images in the form of puzzles. To address this vulnerability, we propose the concept of "image grammar", comprising "image semantics" and "image syntax". Image semantics pertains to the interpretation of parts or patches within an image, whereas image syntax refers to the arrangement of these parts to form a coherent object. We present a semi-supervised two-stage method for learning the image grammar of visual elements and environments solely from natural images. While the first stage learns the semantic meaning of individual object parts, the second stage learns how their relative arrangement constitutes an entire object. The efficacy of the proposed approach is then demonstrated by achieving SSDI detection rates ranging from 70% to 90% on corruptions generated from CelebA and SUN-RGBD datasets. Code is publicly available at: https://github.com/ChunTao1999/SSDI/
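The abstract describes corruptions that remove or shuffle image patches. The sketch below illustrates what such corruptions could look like on a single image; it is not taken from the authors' repository. The function names, the square `grid` layout, and the constant `fill` value for removed patches are all assumptions made for illustration.

```python
import numpy as np


def split_into_patches(image, grid):
    """Split an image into grid x grid equal patches, in row-major order."""
    h, w = image.shape[:2]
    if h % grid or w % grid:
        raise ValueError("image height and width must be divisible by grid")
    ph, pw = h // grid, w // grid
    return [
        image[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw]
        for r in range(grid)
        for c in range(grid)
    ]


def assemble_patches(patches, grid):
    """Reassemble a row-major list of patches into a single image."""
    rows = [
        np.concatenate(patches[r * grid:(r + 1) * grid], axis=1)
        for r in range(grid)
    ]
    return np.concatenate(rows, axis=0)


def shuffle_patches(image, grid, rng):
    """Hypothetical 'shuffle' corruption: permute patch positions at random."""
    patches = split_into_patches(image, grid)
    order = rng.permutation(len(patches))
    return assemble_patches([patches[i] for i in order], grid), order


def remove_patches(image, grid, num_removed, rng, fill=0):
    """Hypothetical 'remove' corruption: overwrite random patches with `fill`."""
    patches = [p.copy() for p in split_into_patches(image, grid)]
    removed = rng.choice(len(patches), size=num_removed, replace=False)
    for i in removed:
        patches[i][...] = fill
    return assemble_patches(patches, grid), removed


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for a natural image (assumption: 8 x 8 RGB).
    image = np.arange(8 * 8 * 3, dtype=np.uint8).reshape(8, 8, 3)
    shuffled, order = shuffle_patches(image, grid=4, rng=rng)
    holed, removed = remove_patches(image, grid=4, num_removed=3, rng=rng)
    print("patch order after shuffling:", order.tolist())
    print("removed patch indices:", sorted(removed.tolist()))
```

A shuffled image keeps every patch (the semantics) but breaks their arrangement (the syntax), which is the kind of discrepancy the abstract says classifiers tend to overlook.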

Code Implementations: 1 repo
