CV CY Feb 13, 2025

EmoAssist: Emotional Assistant for Visual Impairment Community

arXiv:2502.09285v16.22 citations h-index: 2 IJCNN

Originality: Incremental advance

AI Analysis

The paper addresses the emotional needs of the visual impairment community, but the advance is incremental: it builds on existing LMMs, adding a new benchmark and a tuning method.

This paper tackles the problem of visual impairment (VI) assistive systems lacking emotional intelligence by introducing the EmoAssist Benchmark and Model, which improve empathy and suggestion metrics by 147.8% and 89.7% respectively compared to pre-tuning models and outperform GPT-4o.

The rapid advancement of large multi-modality models (LMMs) has significantly propelled the integration of artificial intelligence into practical applications. Visual Question Answering (VQA) systems, which can process multi-modal data including vision, text, and audio, hold great potential for assisting the Visual Impairment (VI) community in navigating complex and dynamic real-world environments. However, existing VI assistive LMMs overlook the emotional needs of VI individuals, and current benchmarks lack emotional evaluation of these LMMs. To address these gaps, this paper introduces the EmoAssist Benchmark, a comprehensive benchmark designed to evaluate the assistive performance of LMMs for the VI community. To the best of our knowledge, this is the first benchmark that incorporates emotional intelligence as a key consideration. Furthermore, we propose the EmoAssist Model, an Emotion-Assistive LMM specifically designed for the VI community. The EmoAssist Model utilizes Direct Preference Optimization (DPO) to align outputs with human emotional preferences. Experiment results demonstrate that the EmoAssist Model significantly enhances the recognition of implicit emotions and intentions of VI users, delivers empathetic responses, and provides actionable guidance. Specifically, it shows respective improvements of 147.8% and 89.7% in the Empathy and Suggestion metrics on the EmoAssist Benchmark, compared to the pre-tuning LMM, and even outperforms state-of-the-art LLMs such as GPT-4o.
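The abstract names Direct Preference Optimization but gives no training details. As a rough illustration only, the standard per-pair DPO objective can be sketched as below; it is not the authors' implementation. The function name, the argument names, and the value `beta=0.1` are assumptions made for this example. Inputs are summed log-probabilities of a preferred ("chosen") and a dispreferred ("rejected") response under the model being tuned and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard per-pair DPO loss (illustrative sketch, hypothetical names).

    Each argument is the summed log-probability of a full response.
    beta scales how strongly the policy is pushed away from the reference;
    0.1 is an assumed value, not one taken from the paper.
    """
    # How much more the policy favors each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) = log(1 + exp(-x)), written in a numerically stable form.
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))


if __name__ == "__main__":
    # Policy identical to the reference: no preference has been learned yet.
    print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
    # Policy has shifted probability toward the chosen response.
    print(dpo_loss(-8.0, -14.0, -10.0, -12.0))
```

When the policy matches the reference, the loss equals log 2; it falls as the policy assigns relatively more probability to the preferred response, which here would be the response judged more empathetic or more actionable.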

