Listen to the Image
This work addresses the challenge of efficiently improving visual perception aids for blind individuals, though it appears incremental as it focuses on evaluation methods rather than new sensory substitution paradigms.
The paper tackles the problem of evaluating visual-to-auditory sensory substitution devices for the blind by proposing machine models to assess encoding schemes instead of relying solely on human-based experiments. The results show high consistency between machine-based and human-based evaluations, indicating feasibility for accelerating optimization and reducing costs.
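One way to make "consistency between machine-based and human-based evaluations" concrete is to check whether both rank the encoding schemes in the same order. The sketch below is only an illustration: the text above does not say how consistency was measured, so the use of Spearman rank correlation is an assumption, and the function names `ranks` and `spearman` are hypothetical.

```python
def ranks(scores):
    """Return rank positions (1 = highest score). Assumes no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    result = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(machine_scores, human_scores):
    """Spearman rank correlation between two score lists (no ties).

    Each position corresponds to one encoding scheme, scored once by a
    machine model and once by human participants (assumed setup).
    """
    n = len(machine_scores)
    if n != len(human_scores) or n < 2:
        raise ValueError("need two equally long lists with at least 2 entries")
    rm, rh = ranks(machine_scores), ranks(human_scores)
    d2 = sum((a - b) ** 2 for a, b in zip(rm, rh))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


# Toy placeholder scores for three schemes, NOT results from the paper.
toy_machine = [0.2, 0.5, 0.9]
toy_human = [1.0, 2.0, 3.0]
agreement = spearman(toy_machine, toy_human)  # same ordering -> 1.0
```

A value near 1 would mean the machine model and the human participants order the schemes the same way, which is the property that would justify substituting the cheaper machine-based assessment.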
Visual-to-auditory sensory substitution devices can assist the blind in sensing the visual environment by translating visual information into a sound pattern. To improve translation quality, the task performance of blind users is usually employed to evaluate different encoding schemes. In contrast to this laborious human-based assessment, we argue that a machine model can also be developed for evaluation, and that it is more efficient. To this end, we first propose two distinct cross-modal perception models for the late-blind and congenitally blind cases, which aim to generate concrete visual content from the translated sound. To validate the functionality of the proposed models, we present two novel optimization strategies for the primary encoding scheme. Further, we conduct sets of human-based experiments and compare their results with the machine-based assessments on the cross-modal generation task. The highly consistent results across different encoding schemes indicate that using a machine model to accelerate the evaluation of optimizations and reduce experimental cost is feasible to some extent, which could dramatically promote the upgrading of encoding schemes and in turn help the blind improve their visual perception ability.
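To illustrate what "translating visual information into a sound pattern" can look like, here is a minimal sketch of a column-scan encoding: the image is read left to right over time, each row is assigned a pitch (top rows higher), and pixel brightness sets loudness. The abstract does not specify its primary encoding scheme, so this mapping, the function name `encode_image_to_sound`, and all parameter defaults are assumptions chosen for illustration.

```python
import math


def encode_image_to_sound(image, duration=1.0, sample_rate=8000,
                          f_min=500.0, f_max=3500.0):
    """Hypothetical column-scan encoding of a grayscale image.

    image: list of rows, each a list of brightness values in [0, 1].
    Columns map to time, rows to frequency, brightness to amplitude.
    Returns a list of audio samples normalized to [-1, 1].
    """
    n_rows = len(image)
    n_cols = len(image[0])
    samples_per_col = int(duration * sample_rate / n_cols)

    # Top row gets the highest frequency, bottom row the lowest.
    step = (f_max - f_min) / (n_rows - 1) if n_rows > 1 else 0.0
    freqs = [f_max - r * step for r in range(n_rows)]

    wave = []
    for col in range(n_cols):
        for k in range(samples_per_col):
            t = k / sample_rate  # phase restarts per column (simplification)
            # Mix one sinusoid per row, weighted by pixel brightness.
            sample = sum(
                image[r][col] * math.sin(2 * math.pi * freqs[r] * t)
                for r in range(n_rows)
            )
            wave.append(sample)

    peak = max((abs(s) for s in wave), default=0.0)
    return [s / peak for s in wave] if peak > 0 else wave


# A 4x4 image with a single bright pixel in the third column.
image = [
    [0, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
wave = encode_image_to_sound(image)
```

With this toy input the output is silent except during the third quarter of the clip, where a single tone sounds. Changing the frequency range, scan duration, or brightness-to-loudness mapping yields a different encoding scheme, which is the kind of variation an evaluation procedure would need to compare.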