Joint Training of Generic CNN-CRF Models with Stochastic Optimization
This paper addresses efficient joint training of CNN-CRF models in computer vision, though the contribution appears incremental, since it adapts existing stochastic optimization methods to a new setting.
The paper tackles end-to-end training of CNN-CRF models by proposing a joint stochastic optimization method. The authors argue that the method is general, scalable, and easy to implement, and an empirical evaluation on semantic labeling of body parts in depth images shows that it compares favorably to competing techniques.
We propose a new CNN-CRF end-to-end learning framework based on joint stochastic optimization with respect to both Convolutional Neural Network (CNN) and Conditional Random Field (CRF) parameters. While stochastic gradient descent is a standard technique for CNN training, it has not so far been used for joint models. We show that our learning method is (i) general, i.e., it applies to arbitrary CNN and CRF architectures and potential functions; (ii) scalable, i.e., it has a low memory footprint and parallelizes straightforwardly on GPUs; and (iii) easy to implement. Additionally, the unified CNN-CRF optimization approach simplifies a potential hardware implementation. We empirically evaluate our method on the task of semantic labeling of body parts in depth images and show that it compares favorably to competing techniques.
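To make the idea of a joint stochastic update concrete, here is a minimal sketch of one way such a step could look. It is not the paper's implementation, and the abstract does not say how the gradient is estimated. The sketch assumes a toy setup: a linear map `W` stands in for the CNN that produces unary scores, a chain-structured CRF has a single pairwise table `P`, and the model expectation in the log-likelihood gradient is approximated by one Gibbs-sampled labeling from a persistent chain. All function and variable names are hypothetical.

```python
import numpy as np

def gibbs_sweep(unary, pairwise, labels, rng):
    """One Gibbs sweep over a chain CRF; unary is (n, k), pairwise is (k, k)."""
    n, k = unary.shape
    labels = labels.copy()
    for i in range(n):
        # Conditional log-scores of node i given its current neighbours.
        logits = unary[i].copy()
        if i > 0:
            logits += pairwise[labels[i - 1], :]
        if i < n - 1:
            logits += pairwise[:, labels[i + 1]]
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        labels[i] = rng.choice(k, p=probs)
    return labels

def score_grads(feats, labels, k):
    """Gradients of the labeling score w.r.t. W (unary) and P (pairwise)."""
    onehot = np.eye(k)[labels]              # (n, k)
    grad_w = feats.T @ onehot               # (d, k)
    grad_p = onehot[:-1].T @ onehot[1:]     # (k, k) counts of adjacent label pairs
    return grad_w, grad_p

def joint_sgd_step(W, P, feats, y_true, y_chain, lr, rng):
    """Update W and P together from a single sampled labeling (assumed scheme)."""
    k = W.shape[1]
    y_sample = gibbs_sweep(feats @ W, P, y_chain, rng)
    gw_pos, gp_pos = score_grads(feats, y_true, k)
    gw_neg, gp_neg = score_grads(feats, y_sample, k)
    # Stochastic estimate of the log-likelihood gradient:
    # statistics of the ground truth minus statistics of the sample.
    W += lr * (gw_pos - gw_neg)
    P += lr * (gp_pos - gp_neg)
    return y_sample

# Toy usage: 8 nodes, 3 features per node, 2 labels.
rng = np.random.default_rng(0)
n, d, k = 8, 3, 2
W = np.zeros((d, k))
P = np.zeros((k, k))
feats = rng.normal(size=(n, d))
y_true = (feats[:, 0] > 0).astype(int)
y_chain = rng.integers(0, k, size=n)
for _ in range(200):
    y_chain = joint_sgd_step(W, P, feats, y_true, y_chain, lr=0.1, rng=rng)
```

The point of the sketch is only that both parameter sets receive the same form of update from the same sample, so neither is trained in a separate stage. In a real model the linear map would be replaced by a CNN, and the gradient with respect to its weights would be obtained by backpropagating the difference of label statistics through the network.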