GA-DAN: Geometry-Aware Domain Adaptation Network for Scene Text Detection and Recognition
This work addresses domain adaptation for scene text detection and recognition. It offers a new method that improves both tasks, but the contribution is incremental because it builds on existing adversarial learning approaches.
The paper tackles cross-domain shifts in both geometry and appearance spaces for scene text detection and recognition. It introduces GA-DAN, which combines a multi-modal spatial learning technique with a disentangled cycle-consistency loss. In experiments, detection and recognition networks trained on the domain-adapted images achieve superior performance.
Recent adversarial learning research has achieved very impressive progress in modelling cross-domain data shifts in appearance space, but its counterpart in modelling cross-domain shifts in geometry space lags far behind. This paper presents an innovative Geometry-Aware Domain Adaptation Network (GA-DAN) that is capable of modelling cross-domain shifts concurrently in both geometry space and appearance space and realistically converting images across domains with very different characteristics. In the proposed GA-DAN, a novel multi-modal spatial learning technique is designed that converts a source-domain image into multiple images with different spatial views, as in the target domain. A new disentangled cycle-consistency loss is introduced that balances the cycle consistency in appearance and geometry spaces and greatly improves the learning of the whole network. The proposed GA-DAN has been evaluated on the classic scene text detection and recognition tasks, and experiments show that the domain-adapted images achieve superior scene text detection and recognition performance when applied to network training.
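The abstract does not state the formula of the disentangled cycle-consistency loss, so the following is only a minimal sketch of the general idea: measure cycle consistency separately in appearance space and geometry space, then balance the two terms with weights. The function names, the use of 3x3 affine matrices for geometry, the L1 distances, and the weights `w_app` and `w_geo` are all assumptions made for illustration, not the paper's definition.

```python
import numpy as np


def appearance_cycle_loss(image, reconstructed):
    """Mean absolute pixel difference between an image and its cycle reconstruction."""
    return float(np.mean(np.abs(image - reconstructed)))


def geometry_cycle_loss(forward_tf, backward_tf):
    """Mean absolute deviation from identity of the composed 3x3 transforms.

    Assumption: geometry is modelled as homogeneous 3x3 matrices, so a
    perfect cycle (forward then backward) composes to the identity.
    """
    composed = backward_tf @ forward_tf
    return float(np.mean(np.abs(composed - np.eye(3))))


def disentangled_cycle_loss(image, reconstructed, forward_tf, backward_tf,
                            w_app=1.0, w_geo=1.0):
    """Weighted sum of the two cycle terms (w_app and w_geo are hypothetical)."""
    return (w_app * appearance_cycle_loss(image, reconstructed)
            + w_geo * geometry_cycle_loss(forward_tf, backward_tf))


# Toy example: a translation by 2 pixels and its exact inverse.
shift = np.array([[1.0, 0.0, 2.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
unshift = np.linalg.inv(shift)
img = np.zeros((4, 4))

# A perfect cycle in both spaces gives zero loss.
perfect = disentangled_cycle_loss(img, img, shift, unshift)

# An imperfect cycle: the reconstruction is off by 1 everywhere and the
# backward transform fails to undo the shift.
imperfect = disentangled_cycle_loss(img, img + 1.0, shift, np.eye(3),
                                    w_app=1.0, w_geo=0.5)
```

Keeping the two terms separate lets the weights be tuned so that neither the pixel reconstruction error nor the spatial transform error dominates training, which is the balancing role the abstract attributes to the loss.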