CV LG · Feb 21, 2015

Regularization and Kernelization of the Maximin Correlation Approach

Taehoon Lee, Taesup Moon, Seung Jean Kim, Sungroh Yoon

arXiv:1502.06105v22.51 citations · h-index: 55

Originality: Incremental advance

AI Analysis

This work addresses classification challenges in domains like optical character recognition and protein function prediction, but it is incremental as it builds on an existing method.

The authors address robust classification when classes contain multiple subclasses by proposing the regularized maximin correlation approach (R-MCA). R-MCA improves on the original MCA in three ways: it is more robust to outliers, it handles nonlinearities through kernelization, and it reduces computational cost. In the reported experiments it is both faster and more accurate than the original MCA.

Robust classification becomes challenging when each class consists of multiple subclasses. Examples include multi-font optical character recognition and automated protein function prediction. In correlation-based nearest-neighbor classification, the maximin correlation approach (MCA) provides the worst-case optimal solution by minimizing the maximum misclassification risk through an iterative procedure. Despite the optimality, the original MCA has drawbacks that have limited its wide applicability in practice. That is, the MCA tends to be sensitive to outliers, cannot effectively handle nonlinearities in datasets, and suffers from having high computational complexity. To address these limitations, we propose an improved solution, named regularized maximin correlation approach (R-MCA). We first reformulate MCA as a quadratically constrained linear programming (QCLP) problem, incorporate regularization by introducing slack variables in the primal problem of the QCLP, and derive the corresponding Lagrangian dual. The dual formulation enables us to apply the kernel trick to R-MCA so that it can better handle nonlinearities. Our experimental results demonstrate that the regularization and kernelization make the proposed R-MCA more robust and accurate for various classification tasks than the original MCA. Furthermore, when the data size or dimensionality grows, R-MCA runs substantially faster by solving either the primal or dual (whichever has a smaller variable dimension) of the QCLP.
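The maximin idea in the abstract can be illustrated with a small sketch: for one class, look for a unit-norm template whose smallest correlation with any sample of that class is as large as possible, then classify a new point by the template it correlates with most. The code below is only an illustration under stated assumptions. It uses a simple projected subgradient heuristic rather than the QCLP formulation, regularization, or kernelization described in the paper, and the names normalize_rows, worst_case_correlation, maximin_template, and classify are hypothetical helpers introduced here.

```python
import numpy as np


def normalize_rows(X):
    """Scale each row to unit Euclidean norm so dot products act as correlations."""
    X = np.asarray(X, dtype=float)
    return X / np.linalg.norm(X, axis=1, keepdims=True)


def worst_case_correlation(w, X):
    """Smallest correlation between template w and any (unit-norm) row of X."""
    return float(np.min(X @ w))


def maximin_template(X, n_iter=500, step=0.1):
    """Heuristic search for a unit vector w that maximizes min_i <w, x_i>.

    Assumption: projected subgradient ascent starting from the class mean.
    This is NOT the paper's QCLP solver, only a stand-in for illustration.
    """
    X = normalize_rows(X)
    w = X.mean(axis=0)
    w /= np.linalg.norm(w)
    best_w, best_val = w.copy(), worst_case_correlation(w, X)
    for k in range(1, n_iter + 1):
        i = int(np.argmin(X @ w))             # currently least-correlated sample
        w = w + (step / np.sqrt(k)) * X[i]    # move the template toward it
        w /= np.linalg.norm(w)                # project back onto the unit sphere
        val = worst_case_correlation(w, X)
        if val > best_val:                    # keep the best iterate seen so far
            best_w, best_val = w.copy(), val
    return best_w, best_val


def classify(x, templates):
    """Return the index of the template with the highest correlation to x."""
    x = np.asarray(x, dtype=float)
    x = x / np.linalg.norm(x)
    return int(np.argmax([t @ x for t in templates]))


# Toy class with two subclasses: three samples near (1, 0), one near (0, 1).
class0 = [[1.0, 0.0], [1.0, 0.1], [1.0, -0.1], [0.0, 1.0]]
class1 = [[-1.0, 0.0], [-1.0, 0.1], [0.0, -1.0]]
t0, worst0 = maximin_template(class0)
t1, worst1 = maximin_template(class1)
label = classify([0.1, 1.0], [t0, t1])
```

In this toy data the class mean leans toward the larger subclass, so the minority sample near (0, 1) is poorly correlated with it. The maximin search trades some average correlation for a better worst case, which is the behavior the abstract attributes to MCA.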
