Multi-scale Visual Attention & Saliency Modelling with Decision Theory
This work addresses visual attention modeling for computer vision applications, presenting an incremental improvement by combining existing techniques in a multi-scale framework.
The paper tackles the problem of estimating visual saliency by modeling it as a binary classification task, using multi-scale discriminant power integrated with the wavelet transform and Hidden Markov Trees to produce saliency maps. The results are evaluated quantitatively with standard metrics (NSS, LCC, AUC) on N. Bruce's eye-tracking database, showing competitive performance against the AIM method.
Bottom-up saliency, an early stage of human visual processing, can be viewed as a binary classification between an interest hypothesis and a null hypothesis. Its discriminant power, the mutual information between image features and the class distribution, is closely related to the saliency value through the well-known centre-surround theory. Because classification accuracy depends strongly on window size, the discriminant saliency (power) varies with the sampling scale. Estimating discriminant power in a multi-scale framework therefore requires integrating it with the wavelet transform and then estimating the statistical discrepancy between two consecutive scales (the centre and surround windows) with a Hidden Markov Tree (HMT) model. Finally, the multi-scale discriminant saliency (MDIS) maps are combined by the maximum information rule to synthesize a final saliency map. All MDIS maps are evaluated with standard quantitative measures (NSS, LCC, AUC) on N. Bruce's database, using eye-tracking locations as ground truth, and are also assessed qualitatively by visual examination of individual cases. To evaluate MDIS against the well-known AIM saliency method, simulations are carried out and described in detail, and several conclusions are drawn for further research directions.
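The quantities named in the abstract can be illustrated with a small NumPy sketch. It is not the paper's implementation: the wavelet and HMT stages are omitted, the function names (`mutual_information`, `combine_max`, `nss`, `lcc`) are hypothetical, the discriminant power is approximated with a plain histogram estimate, and the "maximum information rule" is assumed here to mean a pixel-wise maximum over per-scale maps that have already been resampled to a common resolution.

```python
import numpy as np

def mutual_information(features, labels, bins=8):
    """Histogram estimate of I(X; Y) in bits between a scalar feature X
    and a binary class label Y (0 = null/surround, 1 = interest/centre)."""
    features = np.asarray(features, dtype=float).ravel()
    labels = np.asarray(labels).ravel().astype(int)
    edges = np.histogram_bin_edges(features, bins=bins)
    # Inner edges map each sample to a bin index in 0 .. bins-1.
    x = np.digitize(features, edges[1:-1])
    joint = np.zeros((bins, 2))
    np.add.at(joint, (x, labels), 1.0)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)   # marginal of the feature
    py = joint.sum(axis=0, keepdims=True)   # marginal of the class
    nz = joint > 0                          # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (px @ py)[nz])))

def combine_max(maps):
    """Assumed reading of the maximum rule: pixel-wise max across scales."""
    return np.max(np.stack(maps, axis=0), axis=0)

def nss(saliency, fixations):
    """Normalized scanpath saliency: mean z-scored saliency at fixated pixels."""
    s = np.asarray(saliency, dtype=float)
    z = (s - s.mean()) / s.std()
    return float(z[np.asarray(fixations, dtype=bool)].mean())

def lcc(saliency, density):
    """Linear correlation coefficient between a saliency map and a
    fixation density map of the same shape."""
    return float(np.corrcoef(np.ravel(saliency), np.ravel(density))[0, 1])
```

In this sketch a feature that separates the two classes perfectly yields 1 bit of mutual information, while a feature independent of the class yields 0 bits, which matches the intuition that discriminant power measures how well centre and surround can be told apart.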