CL AI IR | Dec 31, 2024

Exploring the Implicit Semantic Ability of Multimodal Large Language Models: A Pilot Study on Entity Set Expansion

Hebin Wang, Yangning Li, Yinghui Li, Hai-Tao Zheng, Wenhao Jiang, Hong-Gee Kim

arXiv:2501.00330v1 | 9.65 citations | h-index: 67 | ICASSP

Originality: Incremental advance

AI Analysis

This work addresses limitations in MLLMs for entity-level semantic understanding, offering an incremental advancement in multimodal tasks.

The paper probes how well multimodal large language models (MLLMs) extract implicit semantic information by applying them to the Multi-modal Entity Set Expansion (MESE) task, and introduces LUSAR, a listwise ranking method that significantly improves MLLM performance on that task.

The rapid development of multimodal large language models (MLLMs) has brought significant improvements to a wide range of tasks in real-world applications. However, MLLMs still exhibit certain limitations in extracting implicit semantic information. In this paper, we apply MLLMs to the Multi-modal Entity Set Expansion (MESE) task, which aims to expand a handful of seed entities with new entities belonging to the same semantic class, where each entity is accompanied by multimodal information. Through the MESE task, we explore the ability of MLLMs to understand implicit semantic information at entity-level granularity, and we introduce LUSAR, a listwise ranking method that maps local scores to global rankings. LUSAR significantly improves MLLMs' performance on the MESE task, marking the first use of generative MLLMs for ESE tasks and extending the applicability of listwise ranking.
