CVFeb 1, 2017

Pixel-wise Ear Detection with Convolutional Encoder-Decoder Networks

arXiv:1702.00307v23 citations
Originality Incremental advance
AI Analysis

This improves biometric recognition systems by providing robust, pixel-wise ear detection despite occlusions and variable conditions, though it is incremental as it adapts an existing architecture to a specific domain.

The paper tackles ear detection in unconstrained settings by formulating it as a two-class segmentation problem using a convolutional encoder-decoder network, achieving significantly better performance than existing state-of-the-art methods on a web-gathered dataset.

Object detection and segmentation represents the basis for many tasks in computer and machine vision. In biometric recognition systems the detection of the region-of-interest (ROI) is one of the most crucial steps in the overall processing pipeline, significantly impacting the performance of the entire recognition system. Existing approaches to ear detection, for example, are commonly susceptible to the presence of severe occlusions, ear accessories or variable illumination conditions and often deteriorate in their performance if applied on ear images captured in unconstrained settings. To address these shortcomings, we present in this paper a novel ear detection technique based on convolutional encoder-decoder networks (CEDs). For our technique, we formulate the problem of ear detection as a two-class segmentation problem and train a convolutional encoder-decoder network based on the SegNet architecture to distinguish between image-pixels belonging to either the ear or the non-ear class. The output of the network is then post-processed to further refine the segmentation result and return the final locations of the ears in the input image. Different from competing techniques from the literature, our approach does not simply return a bounding box around the detected ear, but provides detailed, pixel-wise information about the location of the ears in the image. Our experiments on a dataset gathered from the web (a.k.a. in the wild) show that the proposed technique ensures good detection results in the presence of various covariate factors and significantly outperforms the existing state-of-the-art.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes