Enhanced Multimodal Aspect-Based Sentiment Analysis by LLM-Generated Rationales
For researchers and practitioners, this work addresses the limited capacity of existing MABSA methods, though the contribution is incremental: it builds on prior work by integrating LLM-generated rationales.
The paper tackles the problem of inaccurate aspect and sentiment identification in Multimodal Aspect-Based Sentiment Analysis (MABSA) by proposing a framework that combines small language models (SLMs) with rationales generated by large language models (LLMs), resulting in superior performance on three benchmarks.
There has been growing interest in Multimodal Aspect-Based Sentiment Analysis (MABSA) in recent years. Existing methods predominantly rely on pre-trained small language models (SLMs) to collect information related to aspects and sentiments from both image and text, with the aim of aligning these two modalities. However, SLMs possess limited capacity and knowledge, which often leads to inaccurate identification of meaning, aspects, sentiments, and their interconnections in textual and visual data. Large language models (LLMs), on the other hand, have shown exceptional capabilities in various tasks by effectively exploring fine-grained information in multimodal data. Yet some studies indicate that LLMs still fall short of fine-tuned small models on ABSA. Based on these findings, we propose a novel framework, termed LRSA, which combines the decision-making capabilities of SLMs with additional information provided by LLMs for MABSA. Specifically, we inject explanations generated by LLMs as rationales into SLMs and employ a dual cross-attention mechanism to enhance feature interaction and fusion, thereby augmenting the SLMs' ability to identify aspects and sentiments. We evaluate our method with two baseline models; extensive experiments on three widely used benchmarks demonstrate its superiority, indicating its generalizability and applicability to most pre-trained models for MABSA.
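The abstract does not say how the dual cross-attention is built, so the following is only a minimal sketch under stated assumptions, not the LRSA implementation. It assumes that text-token features and rationale-token features share one hidden size, that each stream attends to the other with single-head scaled dot-product attention using identity projections (a real model would learn query/key/value weights), that each direction keeps a residual connection, and that fusion is mean pooling plus concatenation. All function names (`cross_attention`, `dual_cross_attention`, `fuse`) are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # each row is one token's feature vector


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(m: Matrix) -> Matrix:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def cross_attention(queries: Matrix, context: Matrix) -> Matrix:
    """Scaled dot-product attention: each query row attends over the context rows.

    Assumption: identity Q/K/V projections (no learned weights) to keep the sketch small.
    """
    d = len(queries[0])
    context_t = [list(col) for col in zip(*context)]  # context transposed: d x n_ctx
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(queries, context_t)]
    return matmul(softmax_rows(scores), context)


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum, used here as a residual connection."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mean_pool(m: Matrix) -> List[float]:
    """Average the token vectors into a single vector."""
    return [sum(col) / len(m) for col in zip(*m)]


def dual_cross_attention(text: Matrix, rationale: Matrix) -> Tuple[Matrix, Matrix]:
    """Two attention directions, each followed by a residual connection."""
    text_out = add(text, cross_attention(text, rationale))            # text attends to rationale
    rationale_out = add(rationale, cross_attention(rationale, text))  # rationale attends to text
    return text_out, rationale_out


def fuse(text: Matrix, rationale: Matrix) -> List[float]:
    """Hypothetical fusion: mean-pool both enriched streams and concatenate them."""
    text_out, rationale_out = dual_cross_attention(text, rationale)
    return mean_pool(text_out) + mean_pool(rationale_out)


if __name__ == "__main__":
    text = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 toy text-token features
    rationale = [[0.5, 0.5], [1.0, -1.0]]        # 2 toy rationale-token features
    fused = fuse(text, rationale)
    print(len(fused))  # 2 * hidden size
```

Attending in both directions lets the text representation pull in rationale content and lets the rationale be re-weighted by the text, which is one plausible reading of "enhancing feature interaction and fusion". In an actual SLM the fused vector would feed the aspect and sentiment prediction heads.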