Large Scale Local Online Similarity/Distance Learning Framework based on Passive/Aggressive
This work addresses the problems of overfitting and limited scalability in local metric learning for machine learning and data mining applications, though it appears incremental, as it extends existing global online methods to a local context.
The paper tackles the challenge of learning local similarity/distance metrics in real-world datasets where the discriminative power of features varies across regions of the input space. It proposes an online multiple metric learning framework that combines global and local components to reduce overfitting and improve scalability. The result is a scalable method, based on Passive/Aggressive (PA) learning and extended with Dual Random Projection (DRP), that handles high-dimensional data efficiently while maintaining predictive performance.
Similarity/distance measures play a key role in many machine learning, pattern recognition, and data mining algorithms, which has led to the emergence of the field of metric learning. Many metric learning algorithms learn a global distance function from data such that the constraints of the problem are satisfied. However, in many real-world datasets the discriminative power of features varies across regions of the input space, and a global metric is often unable to capture the complexity of the task. To address this challenge, local metric learning methods have been proposed that learn multiple metrics across the different regions of the input space. These methods offer high flexibility and the ability to learn a nonlinear mapping, but typically at the expense of higher time requirements and a greater risk of overfitting. To overcome these challenges, this research presents an online multiple metric learning framework. Each metric in the proposed framework is composed of a global and a local component that are learned simultaneously. Adding a global component to a local metric efficiently reduces the problem of overfitting. The proposed framework is also scalable with both the sample size and the dimension of the input data. To the best of our knowledge, this is the first local online similarity/distance learning framework based on PA (Passive/Aggressive). In addition, for scalability with the dimension of the input data, DRP (Dual Random Projection) is extended to local online learning in the present work. It enables our methods to run efficiently on high-dimensional datasets while maintaining their predictive performance. The proposed framework provides a straightforward local extension to any global online similarity/distance learning algorithm based on PA.
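To illustrate how a metric with a global and a local component could be updated simultaneously in a PA fashion, the following is a minimal sketch and not the authors' algorithm. It rests on several assumptions that the abstract does not state: a bilinear similarity S(x, y) = x^T (W_g + W_k) y, a triplet hinge loss with unit margin, regions assigned by nearest centroid, and a PA-I step size with aggressiveness parameter C. The names assign_region, triplet_loss, and pa_local_update are hypothetical.

```python
import numpy as np

def assign_region(x, centroids):
    """Assumed region rule: index of the nearest centroid."""
    return int(np.argmin(np.linalg.norm(centroids - x, axis=1)))

def triplet_loss(W_g, W_loc, k, x, x_pos, x_neg):
    """Hinge loss for 'x should be more similar to x_pos than to x_neg'."""
    W = W_g + W_loc[k]  # metric of region k = global + local component
    return max(0.0, 1.0 - float(x @ W @ (x_pos - x_neg)))

def pa_local_update(W_g, W_loc, k, x, x_pos, x_neg, C=0.1):
    """One PA-I style step that updates W_g and W_loc[k] in place."""
    loss = triplet_loss(W_g, W_loc, k, x, x_pos, x_neg)
    if loss == 0.0:
        return loss  # passive: the margin constraint already holds
    V = np.outer(x, x_pos - x_neg)  # gradient of the score w.r.t. each component
    sq_norm = 2.0 * float(np.sum(V * V))  # both components receive V
    if sq_norm == 0.0:
        return loss  # degenerate triplet, nothing to update
    tau = min(C, loss / sq_norm)  # aggressive: smallest step that fixes the loss, capped by C
    W_g += tau * V
    W_loc[k] += tau * V
    return loss
```

Under these assumptions, only the global component and the local component of the active region change on each triplet, so every region shares what the global component has learned while retaining its own correction.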