Foreground Object Search by Distilling Composite Image Feature
This work addresses the need for efficient retrieval of foreground objects that yield realistic composite images, though it appears incremental because it builds on prior discriminator-based approaches.
The paper tackles the problem of foreground object search (FOS) by proposing DiscoFOS, a method that uses a teacher-student network to distill composite image features, achieving superior retrieval performance on newly contributed synthetic and real datasets.
Foreground object search (FOS) aims to find compatible foreground objects for a given background image, producing a realistic composite image. We observe that competitive retrieval performance can be achieved by using a discriminator to predict the compatibility of a composite image, but this approach incurs a prohibitive time cost. To this end, we propose a novel FOS method based on distilling composite features (DiscoFOS). Specifically, the aforementioned discriminator serves as the teacher network. The student network employs two encoders to extract the foreground feature and the background feature. Their interaction output is enforced to match the composite image feature from the teacher network. Additionally, previous works did not release their datasets, so we contribute two datasets for the FOS task: the S-FOSD dataset with synthetic composite images and the R-FOSD dataset with real composite images. Extensive experiments on our two datasets demonstrate the superiority of the proposed method over previous approaches. The dataset and code are available at https://github.com/bcmi/Foreground-Object-Search-Dataset-FOSD.
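To make the teacher-student idea concrete, the following is a minimal sketch in plain Python and is not the released implementation. It assumes that features are short lists of floats, that the foreground-background interaction is an element-wise product, that the distillation objective is a mean squared error, and that compatibility comes from a linear head followed by a sigmoid. The abstract specifies none of these choices, and the names `interact`, `distillation_loss`, `compatibility`, and `rank_foregrounds` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def interact(fg_feat: Vector, bg_feat: Vector) -> List[float]:
    """Assumed interaction: element-wise product of the two student features."""
    return [f * b for f, b in zip(fg_feat, bg_feat)]


def distillation_loss(student_feat: Vector, teacher_feat: Vector) -> float:
    """Assumed objective: mean squared error between the student's interaction
    output and the teacher's composite image feature."""
    return sum((s - t) ** 2 for s, t in zip(student_feat, teacher_feat)) / len(teacher_feat)


def compatibility(interaction: Vector, weights: Vector) -> float:
    """Assumed scoring head: linear layer followed by a sigmoid."""
    z = sum(w * x for w, x in zip(weights, interaction))
    return 1.0 / (1.0 + math.exp(-z))


def rank_foregrounds(bg_feat: Vector, fg_feats: Sequence[Vector], weights: Vector) -> List[int]:
    """Return foreground indices sorted from most to least compatible.

    Foreground features can be computed once and cached, so a query only
    needs one background encoding plus a cheap interaction per candidate.
    """
    scores = [compatibility(interact(fg, bg_feat), weights) for fg in fg_feats]
    return sorted(range(len(fg_feats)), key=lambda i: scores[i], reverse=True)


# Toy example with made-up two-dimensional features.
background = [1.0, 0.0]
foregrounds = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
head_weights = [2.0, 2.0]

ranking = rank_foregrounds(background, foregrounds, head_weights)
loss = distillation_loss([1.0, 2.0], [1.0, 0.0])
```

The sketch also illustrates the time-cost argument: a discriminator must process every candidate composite image at query time, whereas the student only combines features, and the foreground features can be extracted in advance.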