CV, Feb 15, 2022

Privacy Preserving Visual Question Answering

arXiv:2202.07712v1, 1 citation
Originality: Incremental advance
AI Analysis

This work addresses privacy concerns in edge-based VQA applications by preventing recovery of the input image from the symbolic scene representation the model produces.

The paper tackles privacy-preserving Visual Question Answering (VQA) by constructing a non-differentiable symbolic representation of the visual scene with a low-complexity vision model. That vision model is more than 25 times smaller than SOTA vision models and 100 times smaller than end-to-end SOTA VQA models.

We introduce a novel privacy-preserving methodology for performing Visual Question Answering on the edge. Our method constructs a symbolic representation of the visual scene, using a low-complexity computer vision model that jointly predicts classes, attributes and predicates. This symbolic representation is non-differentiable, which means it cannot be used to recover the original image, thereby keeping the original image private. Our proposed hybrid solution uses a vision model which is more than 25 times smaller than the current state-of-the-art (SOTA) vision models, and 100 times smaller than end-to-end SOTA VQA models. We report detailed error analysis and discuss the trade-offs of using a distilled vision model and a symbolic representation of the visual scene.
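
To make the idea of a symbolic scene representation concrete, here is a minimal Python sketch. It is not the authors' implementation: the names `SceneObject`, `Relation`, `SymbolicScene`, `symbolize`, `attributes_of`, and `related_to`, the score dictionaries, and the 0.5 attribute threshold are all assumptions made for illustration. The sketch only shows how soft scores for classes, attributes, and predicates could be discretized (argmax and thresholding) into plain symbols, and how a question could then be answered from those symbols alone, without access to pixels or continuous features.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SceneObject:
    obj_id: int
    label: str              # discrete class symbol, e.g. "dog"
    attributes: tuple = ()  # discrete attribute symbols, e.g. ("brown",)


@dataclass(frozen=True)
class Relation:
    subject: int    # obj_id of the subject
    predicate: str  # discrete predicate symbol, e.g. "on"
    obj: int        # obj_id of the object


@dataclass
class SymbolicScene:
    objects: list = field(default_factory=list)
    relations: list = field(default_factory=list)


def argmax_label(scores):
    """Pick the highest-scoring symbol; the scores themselves are discarded."""
    return max(scores, key=scores.get)


def symbolize(detections, relation_scores, attr_threshold=0.5):
    """Turn soft scores (hypothetical vision-model output) into symbols.

    The threshold value is an illustrative assumption, not from the paper.
    """
    scene = SymbolicScene()
    for obj_id, det in enumerate(detections):
        label = argmax_label(det["class_scores"])
        attrs = tuple(sorted(
            name for name, score in det["attr_scores"].items()
            if score >= attr_threshold
        ))
        scene.objects.append(SceneObject(obj_id, label, attrs))
    for (subj, obj), scores in relation_scores.items():
        scene.relations.append(Relation(subj, argmax_label(scores), obj))
    return scene


def attributes_of(scene, label):
    """Attributes of the first object carrying the given class label."""
    for o in scene.objects:
        if o.label == label:
            return o.attributes
    return ()


def related_to(scene, subject_label, predicate):
    """Labels of objects linked to `subject_label` by `predicate`."""
    labels = {o.obj_id: o.label for o in scene.objects}
    return [
        labels[r.obj] for r in scene.relations
        if labels[r.subject] == subject_label and r.predicate == predicate
    ]


# Made-up scores standing in for the output of a small vision model.
detections = [
    {"class_scores": {"dog": 0.8, "cat": 0.2},
     "attr_scores": {"brown": 0.9, "small": 0.3}},
    {"class_scores": {"sofa": 0.7, "table": 0.3},
     "attr_scores": {"red": 0.6}},
]
relation_scores = {(0, 1): {"on": 0.75, "under": 0.25}}

scene = symbolize(detections, relation_scores)
dog_attributes = attributes_of(scene, "dog")   # "What colour is the dog?"
dog_is_on = related_to(scene, "dog", "on")     # "What is the dog on?"
```

Because `symbolize` keeps only the selected symbols and drops the underlying scores, nothing downstream of it carries gradients or pixel-level detail. This is the property the abstract refers to as non-differentiable.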

