CV | Aug 7, 2022

Label Semantic Knowledge Distillation for Unbiased Scene Graph Generation

arXiv:2208.03763v1 | 31 citations | h-index: 38
Originality: Incremental advance
AI Analysis

This work addresses bias in SGG for computer vision applications, but it is incremental as it builds on existing training paradigms with a novel distillation approach.

The paper tackles the problem of bias in Scene Graph Generation (SGG) models by addressing overlooked dataset characteristics, such as multiple reasonable predicates for instances and missing annotations, and proposes a Label Semantic Knowledge Distillation method that consistently achieves decent trade-off performance across predicate categories.

The Scene Graph Generation (SGG) task aims to detect all the objects and their pairwise visual relationships in a given image. Although SGG has achieved remarkable progress over the last few years, almost all existing SGG models follow the same training paradigm: they treat both object and predicate classification in SGG as a single-label classification problem, and the ground-truths are one-hot target labels. However, this prevalent training paradigm overlooks two characteristics of current SGG datasets: 1) For positive samples, some specific subject-object instances may have multiple reasonable predicates. 2) For negative samples, there are numerous missing annotations. When these two characteristics are ignored, SGG models are easily confused and make wrong predictions. To this end, we propose a novel model-agnostic Label Semantic Knowledge Distillation (LS-KD) for unbiased SGG. Specifically, LS-KD dynamically generates a soft label for each subject-object instance by fusing a predicted Label Semantic Distribution (LSD) with its original one-hot target label. The LSD reflects the correlations between this instance and multiple predicate categories. We propose two different strategies to predict the LSD: iterative self-KD and synchronous self-KD. Extensive ablations and results on three SGG tasks attest to the superiority and generality of the proposed LS-KD, which consistently achieves a decent trade-off in performance between different predicate categories.
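The soft-label idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the fusion rule, so the convex combination with a fixed weight `alpha` below is an assumption (the paper describes the fusion as dynamic), and the function names `fuse_soft_label` and `soft_cross_entropy` are hypothetical. The LSD is simply passed in as a probability vector; how it is predicted (iterative or synchronous self-KD) is out of scope here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_soft_label(target_index, lsd, alpha=0.5):
    """Blend a one-hot target with a Label Semantic Distribution (LSD).

    ASSUMPTION: a fixed convex combination. `alpha` is a hypothetical
    weight on the original one-hot label; it is not taken from the paper.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if not 0 <= target_index < len(lsd):
        raise ValueError("target_index out of range")
    return [
        alpha * (1.0 if i == target_index else 0.0) + (1.0 - alpha) * p
        for i, p in enumerate(lsd)
    ]


def soft_cross_entropy(logits, soft_label):
    """Cross-entropy between a soft target and the model's softmax output."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(soft_label, probs) if t > 0.0)


# Illustrative values only: three predicates, e.g. "on", "standing on", "near".
lsd = [0.5, 0.4, 0.1]        # stand-in for a predicted LSD
logits = [2.0, 1.0, 0.1]     # stand-in for predicate logits
soft = fuse_soft_label(target_index=0, lsd=lsd, alpha=0.5)
loss = soft_cross_entropy(logits, soft)
```

With these illustrative numbers the soft label puts 0.75 on the annotated predicate and spreads the remaining 0.25 over the other two in proportion to the LSD, so a semantically close predicate is penalized less than under a one-hot target. Setting `alpha=1.0` recovers the standard single-label cross-entropy.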
