AI · LG · Aug 14, 2025

A Curriculum Learning Approach to Reinforcement Learning: Leveraging RAG for Multimodal Question Answering

arXiv:2508.10337v1 · 1 citation · h-index: 4
Originality: Synthesis-oriented
AI Analysis

This work addresses multimodal question answering for competition participants, presenting an incremental improvement through integration of existing techniques.

The paper tackled the META CRAG-MM challenge for multimodal question answering by developing a retrieval-augmented generation system using curriculum learning to guide reinforcement learning, achieving first place in Task 1 with a 52.38% lead and third place in Task 3.

This paper describes the solutions of the Dianping-Trust-Safety team for the META CRAG-MM challenge. The challenge requires building a comprehensive retrieval-augmented generation system capable of multi-modal, multi-turn question answering. The competition consists of three tasks: (1) answering questions using structured data retrieved from an image-based mock knowledge graph, (2) synthesizing information from both knowledge graphs and web search results, and (3) handling multi-turn conversations that require context understanding and information aggregation from multiple sources. For Task 1, our solution is based on a vision large language model, enhanced by supervised fine-tuning with knowledge distilled from GPT-4.1. We further applied curriculum learning strategies to guide reinforcement learning, resulting in improved answer accuracy and reduced hallucination. For Task 2 and Task 3, we additionally leveraged web search APIs to incorporate external knowledge, enabling the system to better handle complex queries and multi-turn conversations. Our approach achieved 1st place in Task 1 with a significant lead of 52.38%, and 3rd place in Task 3, demonstrating the effectiveness of integrating curriculum learning with reinforcement learning in our training pipeline.
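The abstract does not say how the curriculum is built, so the following is only a minimal sketch of one common way to stage reinforcement learning data from easy to hard. Everything in it is an assumption made for illustration: the `Sample` type, the `pass_rate` difficulty signal, the `curriculum_stages` helper, and the `reward` function that penalizes wrong answers more than abstentions. None of these names or choices come from the paper.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Sample:
    question: str
    # Hypothetical difficulty signal: fraction of correct rollouts from a
    # fine-tuned model. Higher means easier. Not taken from the paper.
    pass_rate: float


def curriculum_stages(samples: List[Sample], num_stages: int) -> Iterator[List[Sample]]:
    """Yield cumulative training pools, starting with the easiest samples."""
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    # Sort from easiest (highest pass rate) to hardest.
    ordered = sorted(samples, key=lambda s: s.pass_rate, reverse=True)
    for stage in range(1, num_stages + 1):
        # Ceiling division so the final stage always covers every sample.
        cutoff = (len(ordered) * stage + num_stages - 1) // num_stages
        yield ordered[:cutoff]


def reward(prediction: str, answer: str) -> float:
    """Assumed reward shape: +1 correct, 0 for abstaining, -1 for a wrong answer."""
    normalized = prediction.strip().lower()
    if normalized == "i don't know":
        return 0.0
    return 1.0 if normalized == answer.strip().lower() else -1.0


if __name__ == "__main__":
    data = [
        Sample("a", 0.9),
        Sample("b", 0.1),
        Sample("c", 0.5),
        Sample("d", 0.7),
    ]
    for i, pool in enumerate(curriculum_stages(data, num_stages=2), start=1):
        print(i, [s.question for s in pool])
```

In a sketch like this, each yielded pool would be handed to a separate round of policy optimization, and scoring an abstention above a wrong answer is one plausible way to discourage hallucination during those rounds.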

