CL | AI | Mar 6, 2025

Knowledge-Decoupled Synergetic Learning: An MLLM based Collaborative Approach to Few-shot Multimodal Dialogue Intention Recognition

Bin Chen, Yu Zhang, Hongfei Ye, Ziyi Huang, Hongyang Chen

arXiv:2503.04201v1 | 12.06 citations | h-index: 2 | WWW

Originality: Incremental advance

AI Analysis

This work improves few-shot multimodal dialogue intention recognition, a critical challenge in e-commerce, though it appears incremental: it builds on existing post-training techniques and adds a collaborative approach on top.

The paper tackles the seesaw effect in few-shot multimodal dialogue intention recognition for e-commerce by proposing Knowledge-Decoupled Synergetic Learning (KDSL), which uses smaller models to transform knowledge into rules and larger models for post-training, achieving improvements of 6.37% and 6.28% in online weighted F1 scores on Taobao datasets compared to state-of-the-art methods.

Few-shot multimodal dialogue intention recognition is a critical challenge in the e-commerce domain. Previous methods have primarily enhanced model classification capabilities through post-training techniques. However, our analysis reveals that training for few-shot multimodal dialogue intention recognition involves two interconnected tasks, leading to a seesaw effect in multi-task learning. This phenomenon is attributed to knowledge interference stemming from the superposition of weight matrix updates during the training process. To address these challenges, we propose Knowledge-Decoupled Synergetic Learning (KDSL), which mitigates these issues by utilizing smaller models to transform knowledge into interpretable rules, while applying the post-training of larger models. By facilitating collaboration between the large and small multimodal large language models for prediction, our approach demonstrates significant improvements. Notably, we achieve outstanding results on two real Taobao datasets, with enhancements of 6.37% and 6.28% in online weighted F1 scores compared to the state-of-the-art method, thereby validating the efficacy of our framework.
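The headline metric above is weighted F1. As a minimal sketch, assuming the standard support-weighted definition (per-class F1 averaged with weights proportional to each class's share of the true labels), it can be computed as follows. The abstract does not spell out how the "online" variant is measured, so the function name `weighted_f1` and the intent labels in the usage note are hypothetical illustrations, not the authors' evaluation code.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted F1 (assumed standard definition, not taken from the paper).

    Each class's F1 is weighted by its share of the true labels, so
    frequent intents contribute more to the final score than rare ones.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0

    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1

    return score
```

For example, with hypothetical true intents `["refund", "refund", "shipping", "size"]` and predictions `["refund", "shipping", "shipping", "size"]`, the per-class F1 values are 2/3, 2/3, and 1, giving a weighted score of 0.75.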
