Learning Local Feature Aggregation Functions with Backpropagation
This paper addresses the problem of improving feature representations for computer vision tasks. It offers a novel learning approach that builds incrementally on existing aggregation methods.
The paper tackles the problem of learning optimal local feature aggregation functions for classification tasks by backpropagating gradients from the classifier cost function to update the aggregation parameters. Experiments show the method outperforms state-of-the-art approaches like Bag of Words, Fisher Vectors, and VLAD by a large margin on motion and visual descriptor datasets.
This paper introduces a family of local feature aggregation functions and a novel method to estimate their parameters, such that they generate optimal representations for classification (or any task that can be expressed as a cost function minimization problem). To achieve this, we compose the local feature aggregation function with the classifier cost function and backpropagate the gradient of this cost function to update the parameters of the local feature aggregation function. Experiments on synthetic datasets indicate that our method discovers parameters that model the class-relevant information in addition to the local feature space. Further experiments on a variety of motion and visual descriptors, both on image and video datasets, show that our method outperforms other state-of-the-art local feature aggregation functions, such as Bag of Words, Fisher Vectors and VLAD, by a large margin.
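The composition described in the abstract can be illustrated with a minimal sketch. It is not the paper's actual family of aggregation functions. It assumes a soft-assignment Bag of Words aggregation (the mean of a softmax over negative squared distances to a codebook `C`) and a binary logistic-regression classifier with weights `w` and bias `b`. The function names `aggregate`, `loss_and_grads` and `sgd_step` are hypothetical, chosen only for this illustration. The gradient of the classifier cost is propagated back through the aggregation step, so the codebook is updated together with the classifier.

```python
import numpy as np

def aggregate(X, C):
    """Assumed aggregation: soft-assignment Bag of Words.

    X: (n, d) local features of one sample, C: (k, d) codebook.
    Returns the (k,) representation, the (n, k) soft assignments,
    and the (n, k, d) differences x_i - c_j (reused in the backward pass).
    """
    diff = X[:, None, :] - C[None, :, :]
    s = -np.sum(diff ** 2, axis=2)           # scores: negative squared distances
    s -= s.max(axis=1, keepdims=True)        # stabilise the softmax
    a = np.exp(s)
    a /= a.sum(axis=1, keepdims=True)
    return a.mean(axis=0), a, diff

def loss_and_grads(X, y, C, w, b):
    """Cross-entropy of a logistic classifier composed with aggregate().

    Returns the loss and its gradients with respect to C, w and b.
    """
    r, a, diff = aggregate(X, C)
    z = r @ w + b
    p = 1.0 / (1.0 + np.exp(-z))
    loss = -(y * np.log(p) + (1 - y) * np.log(1 - p))

    dz = p - y                               # dL/dz for sigmoid + cross-entropy
    dw, db = dz * r, dz
    # Backpropagate into the aggregation: r is the mean of the assignments.
    da = np.broadcast_to(dz * w / X.shape[0], a.shape)
    # Softmax Jacobian applied row by row.
    ds = a * (da - np.sum(da * a, axis=1, keepdims=True))
    # d s_ij / d c_j = 2 (x_i - c_j)
    dC = 2.0 * np.einsum("nk,nkd->kd", ds, diff)
    return loss, dC, dw, db

def sgd_step(X, y, C, w, b, lr=1e-3):
    """One gradient step on both the classifier and the aggregation parameters."""
    loss, dC, dw, db = loss_and_grads(X, y, C, w, b)
    return loss, C - lr * dC, w - lr * dw, b - lr * db
```

In this sketch the codebook `C` receives a gradient from the classification loss, rather than being fixed in advance by an unsupervised step such as k-means. That joint update is the point being illustrated, under the assumptions stated above.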