Image Enhanced Rotation Prediction for Self-Supervised Learning
This work addresses a known weakness of rotation prediction in self-supervised learning for computer vision, namely that it represents textures poorly. The contribution is incremental, since it extends the existing rotation prediction pretext-task rather than proposing a new paradigm.
The paper tackles a limitation of rotation prediction in self-supervised learning: it captures object shapes but hardly captures textures. It introduces image enhanced rotation prediction (IE-Rot), which simultaneously predicts the rotation and the image enhancement applied to an input. Experimental results show that IE-Rot outperforms rotation prediction on benchmarks including ImageNet classification, PASCAL-VOC detection, and COCO detection/segmentation.
Rotation prediction (Rotation) is a simple pretext-task for self-supervised learning (SSL), in which models learn useful representations for target vision tasks by solving pretext-tasks. Although Rotation captures information about object shapes, it hardly captures information about textures. To tackle this problem, we introduce a novel pretext-task for SSL called image enhanced rotation prediction (IE-Rot). IE-Rot simultaneously solves Rotation and another pretext-task based on image enhancement (e.g., sharpening and solarizing) while maintaining simplicity. By predicting rotation and image enhancement simultaneously, models learn representations that capture information about not only object shapes but also textures. Our experimental results show that IE-Rot models outperform Rotation on various standard benchmarks, including ImageNet classification, PASCAL-VOC detection, and COCO detection/segmentation.
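The idea of predicting rotation and image enhancement together can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: enhancement prediction is treated as classification over a small hypothetical label set (`ENHANCEMENTS`), the training objective is an unweighted sum of two cross-entropy terms, and all function names (`make_ie_rot_sample`, `joint_loss`, and so on) are invented here. Images are assumed to be `uint8` arrays of shape height x width x channels.

```python
import numpy as np

# Hypothetical enhancement label set; the abstract names sharpening and
# solarizing only as examples.
ENHANCEMENTS = ("none", "solarize", "sharpen")


def solarize(img, threshold=128):
    """Invert every pixel value at or above `threshold` (uint8 image)."""
    return np.where(img >= threshold, 255 - img, img).astype(np.uint8)


def sharpen(img):
    """Apply a 3x3 sharpening kernel (5 at the center, -1 on the 4 neighbors)."""
    p = np.pad(img.astype(np.float32), ((1, 1), (1, 1), (0, 0)), mode="edge")
    neighbors = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
    return np.clip(5.0 * p[1:-1, 1:-1] - neighbors, 0, 255).astype(np.uint8)


def make_ie_rot_sample(img, rng):
    """Return (transformed image, rotation label, enhancement label)."""
    rot_label = int(rng.integers(0, 4))  # 0, 90, 180 or 270 degrees
    enh_label = int(rng.integers(0, len(ENHANCEMENTS)))
    name = ENHANCEMENTS[enh_label]
    if name == "solarize":
        img = solarize(img)
    elif name == "sharpen":
        img = sharpen(img)
    # Rotate in the spatial plane; the channel axis is left untouched.
    rotated = np.rot90(img, k=rot_label, axes=(0, 1)).copy()
    return rotated, rot_label, enh_label


def cross_entropy(logits, label):
    """Cross-entropy of one logit vector against an integer class label."""
    shifted = logits - np.max(logits)  # subtract max for numerical stability
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return float(-log_probs[label])


def joint_loss(rot_logits, enh_logits, rot_label, enh_label):
    """Assumed objective: unweighted sum of the two classification losses."""
    return cross_entropy(rot_logits, rot_label) + cross_entropy(enh_logits, enh_label)
```

In such a setup, a shared backbone would feed two classification heads, one producing `rot_logits` and one producing `enh_logits`. The enhancement labels give the network a reason to attend to local pixel statistics, which is the intuition the abstract gives for capturing textures as well as shapes.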