AI CL IR · May 3, 2025

Advancing AI Research Assistants with Expert-Involved Learning

Tianyu Liu, Simeng Han, Xiao Luo, Hanchen Wang, Pan Lu, Biqing Zhu, Yuge Wang, Keyi Li, Jiapeng Chen, Rihao Qu, Yufeng Liu, Xinyue Cui

arXiv:2505.04638v2 · 5.82 citations · h-index: 22 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the problem of making AI trustworthy for biomedical researchers. Its contribution is incremental, since it builds on existing evaluation methods.

The paper tackles the unreliability of large language and multimodal models in biomedical discovery by introducing ARIEL, an open-source framework for evaluation and optimization. It finds that state-of-the-art models generate fluent but incomplete summaries and struggle with detailed visual reasoning, and that prompt engineering and lightweight fine-tuning improve textual coverage while a compute-scaled inference strategy improves visual question answering.

Large language models (LLMs) and large multimodal models (LMMs) promise to accelerate biomedical discovery, yet their reliability remains unclear. We introduce ARIEL (AI Research Assistant for Expert-in-the-Loop Learning), an open-source evaluation and optimization framework that pairs a curated multimodal biomedical corpus with expert-vetted tasks to probe two capabilities: full-length article summarization and fine-grained figure interpretation. Using uniform protocols and blinded PhD-level evaluation, we find that state-of-the-art models generate fluent but incomplete summaries, whereas LMMs struggle with detailed visual reasoning. We later observe that prompt engineering and lightweight fine-tuning substantially improve textual coverage, and a compute-scaled inference strategy enhances visual question answering. We build an ARIEL agent that integrates textual and visual cues, and we show it can propose testable mechanistic hypotheses. ARIEL delineates current strengths and limitations of foundation models, and provides a reproducible platform for advancing trustworthy AI in biomedicine.
