Robust Optimization for Deep Regression
This work addresses robustness to outliers in regression tasks such as human pose and age estimation, offering incremental improvements for computer vision applications.
The paper tackles the problem of outliers degrading regression accuracy in ConvNets by proposing a robust loss function based on Tukey's biweight and a coarse-to-fine model. This results in faster convergence, better generalization, and results comparable to or better than state-of-the-art approaches on human pose estimation datasets.
Convolutional Neural Networks (ConvNets) have contributed to improving the accuracy of regression-based methods for computer vision tasks such as human pose estimation, landmark localization, and object detection. Network optimization has usually been performed with the L2 loss, without considering the impact of outliers on the training process. In this context, an outlier is a training sample whose estimate lies at an abnormal distance from the estimates of the other training samples in the objective space. In this work, we propose a regression model with ConvNets that achieves robustness to such outliers by using Tukey's biweight function, an M-estimator robust to outliers, as the loss function minimized by the ConvNet. In addition to the robust loss, we introduce a coarse-to-fine model, which processes input images of progressively higher resolution to improve the accuracy of the regressed values. In our experiments, we demonstrate faster convergence and better generalization of our robust loss function for the tasks of human pose estimation and age estimation from face images. We also show that combining the robust loss function with the coarse-to-fine model produces results comparable to or better than current state-of-the-art approaches on four publicly available human pose estimation datasets.
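To illustrate why Tukey's biweight is robust where L2 is not, here is a minimal sketch in pure Python of the standard biweight function rho(r) = c²/6 · [1 − (1 − (r/c)²)³] for |r| ≤ c, and c²/6 otherwise, together with its derivative psi(r). Several details are assumptions, not taken from the abstract: the tuning constant c = 4.6851 (a common textbook choice), the scaling of residuals by 1.4826 · MAD (median absolute deviation) computed over the batch, and the scalar-output setting. The helper names `tukey_rho`, `tukey_psi`, `scaled_residuals`, `tukey_loss`, and `l2_loss` are hypothetical. A real ConvNet would compute this inside an autodiff framework.

```python
import statistics

# Assumed tuning constant (common textbook value for Tukey's biweight);
# not specified in the abstract.
C = 4.6851


def tukey_rho(r: float, c: float = C) -> float:
    """Tukey's biweight loss for one scaled residual r."""
    if abs(r) <= c:
        return (c * c / 6.0) * (1.0 - (1.0 - (r / c) ** 2) ** 3)
    # Constant beyond c: an outlier cannot increase the loss any further.
    return c * c / 6.0


def tukey_psi(r: float, c: float = C) -> float:
    """Derivative d(rho)/dr; it is zero for |r| > c, so outliers get no gradient."""
    if abs(r) <= c:
        return r * (1.0 - (r / c) ** 2) ** 2
    return 0.0


def scaled_residuals(preds, targets):
    """Scale residuals by 1.4826 * MAD, a robust standard-deviation estimate (assumption)."""
    res = [p - t for p, t in zip(preds, targets)]
    med = statistics.median(res)
    mad = statistics.median([abs(x - med) for x in res])
    scale = 1.4826 * mad if mad > 0 else 1.0  # avoid division by zero
    return [x / scale for x in res]


def tukey_loss(preds, targets, c: float = C) -> float:
    """Mean Tukey biweight loss over a batch of scalar predictions."""
    r = scaled_residuals(preds, targets)
    return sum(tukey_rho(x, c) for x in r) / len(r)


def l2_loss(preds, targets) -> float:
    """Mean squared error, for comparison."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


if __name__ == "__main__":
    preds = [1.0, 2.0, 3.0, 100.0]   # last prediction is far off (an outlier)
    targets = [1.1, 1.9, 3.2, 3.0]
    print("L2 loss:   ", l2_loss(preds, targets))
    print("Tukey loss:", tukey_loss(preds, targets))
    print("psi per sample:", [tukey_psi(r) for r in scaled_residuals(preds, targets)])
```

With L2, the single outlier dominates both the loss and its gradient. With the biweight, its contribution is capped at c²/6 and its gradient is exactly zero, so the remaining samples drive the update.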