3D Morphable Models as Spatial Transformer Networks
This work addresses 3D pose normalization for computer vision tasks, offering an incremental extension to spatial transformer networks.
The paper handles 3D pose changes and self-occlusions in images by using a 3D Morphable Model as a spatial transformer network, which yields robust normalization on highly uncontrolled images with occlusion and large pose changes.
In this paper, we show how a 3D Morphable Model (i.e., a statistical model of the 3D shape of a class of objects such as faces) can be used as a module (a 3DMM-STN) within a convolutional neural network to spatially transform input data. This extends the original spatial transformer network in that we are able to interpret and normalise 3D pose changes and self-occlusions. The trained localisation part of the network is independently useful, since it learns to fit a 3D morphable model to a single image. We show that the localiser can be trained using only simple geometric loss functions on a relatively small dataset, yet is able to perform robust normalisation on highly uncontrolled images including occlusion, self-occlusion and large pose changes.
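To make the idea of a morphable model acting as a sampling grid more concrete, the following is a minimal NumPy sketch and not the implementation from the paper. It assumes a linear shape model (mean plus basis times coefficients), an axis-angle rotation, a scaled orthographic projection, and model coordinates that already follow the pixel convention of the image. All function and parameter names (`rodrigues`, `morphable_shape`, `project`, `bilinear_sample`, `normalise`, `alpha`, `r`, `t`, `s`) are hypothetical and chosen for illustration. Self-occlusion handling and the localiser network that would predict the parameters are omitted.

```python
import numpy as np

def rodrigues(r):
    """Axis-angle vector (3,) -> rotation matrix (3, 3)."""
    theta = np.linalg.norm(r)
    if theta < 1e-12:
        return np.eye(3)
    k = r / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def morphable_shape(mean, basis, alpha):
    """Assumed linear model: mean (3N,), basis (3N, D), alpha (D,) -> (N, 3)."""
    return (mean + basis @ alpha).reshape(-1, 3)

def project(vertices, r, t, s):
    """Assumed scaled orthographic camera: rotate, drop z, scale, translate."""
    rotated = vertices @ rodrigues(r).T
    return s * rotated[:, :2] + t

def bilinear_sample(image, xy):
    """Sample a greyscale image (H, W) at continuous (x, y) points (N, 2)."""
    h, w = image.shape
    x = np.clip(xy[:, 0], 0, w - 1)
    y = np.clip(xy[:, 1], 0, h - 1)
    # Clamp the top-left corner so that the +1 neighbours stay in bounds.
    x0 = np.clip(np.floor(x).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, h - 2)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * image[y0, x0] + fx * image[y0, x0 + 1]
    bottom = (1 - fx) * image[y0 + 1, x0] + fx * image[y0 + 1, x0 + 1]
    return (1 - fy) * top + fy * bottom

def normalise(image, mean, basis, alpha, r, t, s):
    """Return one sampled intensity per model vertex (pose-normalised)."""
    vertices = morphable_shape(mean, basis, alpha)
    return bilinear_sample(image, project(vertices, r, t, s))
```

The output here is one intensity per model vertex. A full module would typically resample these onto a fixed output grid and keep every step differentiable, so that gradients can flow back to the network that predicts `alpha`, `r`, `t` and `s`.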