IR · AI · MM · Feb 9, 2025

Uni-Retrieval: A Multi-Style Retrieval Framework for STEM's Education

arXiv:2502.05863v2 · 17 citations · h-index: 5 · ACL
Originality: Incremental advance
AI Analysis

The paper addresses retrieval ambiguities in educational scenarios and offers a scalable solution, though the advance is incremental because it builds on existing vision-language models.

The paper tackles the problem of ambiguous text-image retrieval in AI-facilitated STEM education by proposing Uni-Retrieval, a multi-style retrieval framework that supports diverse query styles, and it outperforms existing models in most tasks.

In AI-facilitated teaching, leveraging various query styles to interpret abstract text descriptions is crucial for ensuring high-quality teaching. However, current retrieval models primarily focus on natural text-image retrieval, making them insufficiently tailored to educational scenarios due to the ambiguities in the retrieval process. In this paper, we propose a diverse expression retrieval task tailored to educational scenarios, supporting retrieval based on multiple query styles and expressions. We introduce the STEM Education Retrieval Dataset (SER), which contains over 24,000 query pairs of different styles, and Uni-Retrieval, an efficient and style-diversified retrieval vision-language model based on prompt tuning. Uni-Retrieval extracts query style features as prototypes and builds a continuously updated Prompt Bank containing prompt tokens for diverse queries. This bank can be updated during test time to represent domain-specific knowledge for different subject retrieval scenarios. Our framework demonstrates scalability and robustness by dynamically retrieving prompt tokens based on prototype similarity, effectively facilitating learning for unknown queries. Experimental results indicate that Uni-Retrieval outperforms existing retrieval models in most retrieval tasks. This advancement provides a scalable and precise solution for diverse educational needs.
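The prototype-and-Prompt-Bank idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the class name `PromptBank`, the cosine-similarity lookup, the momentum update, the `new_entry_threshold` parameter, and the two-dimensional toy prototypes are all assumptions made for illustration. It only shows the general pattern of storing style prototypes as keys, retrieving prompt tokens by similarity, and updating the bank when new queries arrive at test time.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class PromptBank:
    """Hypothetical prompt bank: each entry pairs a style prototype (key)
    with the prompt tokens (value) used for queries of that style."""

    def __init__(self, momentum=0.9, new_entry_threshold=0.5):
        self.keys = []      # style prototypes
        self.prompts = []   # prompt tokens, one per prototype
        self.momentum = momentum                        # assumed update rule
        self.new_entry_threshold = new_entry_threshold  # assumed cutoff

    def add(self, prototype, prompt_tokens):
        self.keys.append(list(prototype))
        self.prompts.append(prompt_tokens)

    def retrieve(self, style_feature, top_k=1):
        """Return the top_k (prompt_tokens, similarity) pairs, best first."""
        scored = sorted(
            ((cosine(style_feature, key), i) for i, key in enumerate(self.keys)),
            reverse=True,
        )
        return [(self.prompts[i], score) for score, i in scored[:top_k]]

    def update(self, style_feature, fallback_prompt):
        """Test-time update: nudge the closest prototype toward the new
        feature, or add a new entry if nothing in the bank is close enough."""
        if self.keys:
            score, idx = max(
                (cosine(style_feature, key), i) for i, key in enumerate(self.keys)
            )
            if score >= self.new_entry_threshold:
                m = self.momentum
                self.keys[idx] = [
                    m * k + (1 - m) * f for k, f in zip(self.keys[idx], style_feature)
                ]
                return idx
        self.add(style_feature, fallback_prompt)
        return len(self.keys) - 1


# Toy usage with made-up prototypes and placeholder prompt tokens.
bank = PromptBank()
bank.add([1.0, 0.0], "prompt tokens for style A")
bank.add([0.0, 1.0], "prompt tokens for style B")

best_prompt, best_score = bank.retrieve([0.9, 0.1])[0]
unknown_idx = bank.update([-1.0, -1.0], "prompt tokens for a new style")
known_idx = bank.update([0.8, 0.6], "unused fallback")
```

In this toy setup, a query whose style feature lies close to an existing prototype reuses that prototype's prompt tokens, while a dissimilar query creates a new bank entry. In a real system the prototypes and prompt tokens would be learned embeddings rather than hand-written lists and strings.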

