Few-Shot Unsupervised Image-to-Image Translation on complex scenes
This work addresses image translation for content-rich scenes, but it is incremental as it adapts an existing method.
The paper tackles the problem of unsupervised image-to-image translation on complex scenes by extending the FUNIT framework with a more diverse dataset and object detection, resulting in improved performance beyond single-object applications.
Unsupervised image-to-image translation methods have received a lot of attention in the last few years. Multiple techniques have emerged, tackling the initial challenge from different perspectives. Some focus on learning as much as possible from several target style images, while others use object detection to produce more realistic results on content-rich scenes. In this work, we assess how a method initially developed for single-object translation performs on more diverse and content-rich images. Our work is based on the FUNIT [1] framework, which we train with a more diverse dataset. This helps us understand how such a method behaves beyond its initial frame of application. We present a way to extend a dataset based on object detection. Moreover, we propose a way to adapt the FUNIT framework to leverage the power of object detection seen in other methods.
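The abstract does not spell out how the dataset is extended, so the following is only a minimal sketch of one plausible reading: detections produced by some object detector are filtered and grouped by class, and each kept box becomes a candidate crop for the per-class image sets that a FUNIT-style model trains on. The `Detection` type, the `build_class_crops` helper, and the `min_score` and `min_side` thresholds are all hypothetical names and values introduced for illustration, not part of the paper or of FUNIT.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detector output; every field name here is illustrative."""
    image_id: str
    label: str
    score: float
    box: tuple  # (x_min, y_min, x_max, y_max) in pixels


def build_class_crops(detections, min_score=0.5, min_side=32):
    """Group confident, large-enough detections by class label.

    Returns {label: [(image_id, box), ...]}. Each entry describes a crop
    that could be added to the image set of its class.
    The thresholds are assumptions, not values from the paper.
    """
    crops = defaultdict(list)
    for det in detections:
        x_min, y_min, x_max, y_max = det.box
        width, height = x_max - x_min, y_max - y_min
        if det.score < min_score:
            continue  # drop low-confidence detections
        if min(width, height) < min_side:
            continue  # drop boxes too small to be useful crops
        crops[det.label].append((det.image_id, det.box))
    return dict(crops)
```

In this sketch the actual pixel cropping is left to whatever image library is in use; only the selection and grouping logic is shown.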