CVOct 31, 2016

A New Distance Measure for Non-Identical Data with Application to Image Classification

Muthukaruppan Swaminathan, Pankaj Kumar Yadav, Obdulio Piloto, Tobias Sjöblom, Ian Cheong

arXiv:1610.09766v12.114 citations

Originality Incremental advance

AI Analysis

This addresses a fundamental limitation in computer vision for tasks like image classification by improving accuracy when data comes from diverse sources, though it is an incremental advancement over existing distance measures.

The paper tackled the problem of existing distance measures assuming identically distributed features, which is inappropriate for real-world heterogeneous data, by proposing the Poisson-Binomial Radius (PBR) distance measure that accounts for non-identical distributions, and it outperformed state-of-the-art measures on most of twelve benchmark datasets across six image classification applications.

Distance measures are part and parcel of many computer vision algorithms. The underlying assumption in all existing distance measures is that feature elements are independent and identically distributed. However, in real-world settings, data generally originate from heterogeneous sources even if they do possess a common data-generating mechanism. Since these sources are not identically distributed by necessity, the assumption of identical distribution is inappropriate. Here, we use statistical analysis to show that feature elements of local image descriptors are indeed non-identically distributed. To test the effect of omitting the unified distribution assumption, we created a new distance measure called the Poisson-Binomial Radius (PBR). PBR is a bin-to-bin distance which accounts for the dispersion of bin-to-bin information. PBR's performance was evaluated on twelve benchmark data sets covering six different classification and recognition applications: texture, material, leaf, scene, ear biometrics and category-level image classification. Results from these experiments demonstrate that PBR outperforms state-of-the-art distance measures for most of the data sets and achieves comparable performance on the rest, suggesting that accounting for different distributions in distance measures can improve performance in classification and recognition tasks.

View on arXiv PDF

Similar