What and Where: A Context-based Recommendation System for Object Insertion
This paper addresses a practical problem with applications in semi-automated advertising and video composition, though it appears incremental: it builds on existing object insertion concepts and combines them in a unified framework.
The paper tackles two dual tasks: recommending objects to insert into a given scene, and retrieving suitable scenes for a given object category. In both tasks it predicts a bounding box for the inserted object, which supports applications such as advertising and video composition. The authors' unsupervised algorithm, based on object-level contexts, outperforms baselines on all subtasks in experiments on a newly annotated test set.
In this work, we propose a novel topic consisting of two dual tasks: 1) given a scene, recommend objects to insert, 2) given an object category, retrieve suitable background scenes. A bounding box for the inserted object is predicted in both tasks, which helps downstream applications such as semi-automated advertising and video composition. The major challenge lies in the fact that the target object is neither present nor localized at test time, whereas available datasets only provide scenes with existing objects. To tackle this problem, we build an unsupervised algorithm based on object-level contexts, which explicitly models the joint probability distribution of object categories and bounding boxes with a Gaussian mixture model. Experiments on our newly annotated test set demonstrate that our system outperforms existing baselines on all subtasks, and does so under a unified framework. Our contribution promises future extensions and applications.
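To make the idea of a joint distribution over object categories and bounding boxes concrete, here is a minimal sketch, not the authors' implementation. It assumes the factorization log p(category, box) = log p(category) + log p(box | category), with each p(box | category) a diagonal-covariance Gaussian mixture over normalized (cx, cy, w, h). The function names, the box parameterization, and all numeric parameters are hypothetical, and the sketch omits the object-level context conditioning that the abstract describes.

```python
import math

def diag_gaussian_logpdf(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_logpdf(x, components):
    """Log density of a mixture; components = [(weight, mean, var), ...]."""
    logs = [math.log(w) + diag_gaussian_logpdf(x, m, v) for w, m, v in components]
    top = max(logs)
    # Log-sum-exp keeps the sum numerically stable.
    return top + math.log(sum(math.exp(l - top) for l in logs))

def joint_log_prob(category, box, priors, mixtures):
    """log p(category, box) = log p(category) + log p(box | category)."""
    return math.log(priors[category]) + gmm_logpdf(box, mixtures[category])

def recommend(candidate_boxes, priors, mixtures):
    """Return the (category, box) pair with the highest joint log-probability."""
    return max(
        ((c, b) for c in priors for b in candidate_boxes),
        key=lambda cb: joint_log_prob(cb[0], cb[1], priors, mixtures),
    )

# Hypothetical toy parameters; boxes are (cx, cy, w, h) normalized to [0, 1].
PRIORS = {"cup": 0.6, "lamp": 0.4}
MIXTURES = {
    "cup": [
        (0.7, (0.5, 0.7, 0.1, 0.1), (0.01, 0.01, 0.002, 0.002)),
        (0.3, (0.2, 0.7, 0.1, 0.1), (0.01, 0.01, 0.002, 0.002)),
    ],
    "lamp": [
        (1.0, (0.8, 0.3, 0.15, 0.3), (0.01, 0.01, 0.004, 0.004)),
    ],
}
CANDIDATES = [
    (0.5, 0.7, 0.1, 0.1),
    (0.8, 0.3, 0.15, 0.3),
    (0.1, 0.1, 0.5, 0.5),
]

best_category, best_box = recommend(CANDIDATES, PRIORS, MIXTURES)
```

With these toy numbers, the top-scoring pair is the cup at the first candidate box. Under the same assumptions, the dual task could be sketched by fixing the category and ranking scenes by their best-scoring box, with per-scene mixtures standing in for whatever context model is actually used.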