Multimodal Peer Review Simulation with Actionable To-Do Recommendations for Community-Aware Manuscript Revisions
This work targets the need for better pre-submission manuscript revision tools for academic researchers, though the contribution appears incremental because it builds on existing LLM and RAG techniques.
Automated academic peer review systems are limited to text-only inputs and lack actionable feedback. The authors address this by developing a multimodal, community-aware peer review simulation system that integrates textual and visual information through multimodal LLMs and retrieval-augmented generation (RAG). In the reported experiments, the system generated more comprehensive and useful reviews, better aligned with expert standards, than ablated baselines.
While large language models (LLMs) offer promising capabilities for automating academic workflows, existing systems for academic peer review remain constrained by text-only inputs, limited contextual grounding, and a lack of actionable feedback. In this work, we present an interactive web-based system for multimodal, community-aware peer review simulation to enable effective manuscript revisions before paper submission. Our framework integrates textual and visual information through multimodal LLMs, enhances review quality via retrieval-augmented generation (RAG) grounded in web-scale OpenReview data, and converts generated reviews into actionable to-do lists using the proposed Action:Objective[#] format, providing structured and traceable guidance. The system integrates seamlessly into existing academic writing platforms, providing interactive interfaces for real-time feedback and revision tracking. Experimental results highlight the effectiveness of the proposed system in generating more comprehensive and useful reviews aligned with expert standards, surpassing ablated baselines and advancing transparent, human-centered scholarly assistance.
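To make the to-do format concrete, the following is a minimal sketch of how items written as Action:Objective[#] might be parsed into structured records. The abstract does not define the format's grammar, so everything here is an assumption: that each item sits on its own line, that the text before the first colon is the action, and that the bracketed value is an integer tracing the item back to a review comment. The names `TodoItem`, `parse_todo`, and `parse_todos` are hypothetical, and the sample strings in the usage note are made up for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed grammar (not specified in the abstract): "<Action>:<Objective>[<n>]",
# where <n> is taken to be an integer reference to a review comment.
TODO_PATTERN = re.compile(
    r"^\s*(?P<action>[^:\[\]]+?)\s*:\s*(?P<objective>.+?)\s*\[(?P<ref>\d+)\]\s*$"
)


@dataclass(frozen=True)
class TodoItem:
    """One to-do entry; field names are illustrative, not the authors'."""
    action: str      # what to do, e.g. "Add" or "Clarify"
    objective: str   # what the action applies to
    ref: int         # assumed index of the originating review comment


def parse_todo(line: str) -> Optional[TodoItem]:
    """Parse a single line, returning None if it does not fit the assumed grammar."""
    match = TODO_PATTERN.match(line)
    if match is None:
        return None
    return TodoItem(match["action"], match["objective"], int(match["ref"]))


def parse_todos(text: str) -> List[TodoItem]:
    """Parse every line of a generated to-do list, skipping lines that do not match."""
    parsed = (parse_todo(line) for line in text.splitlines())
    return [item for item in parsed if item is not None]
```

Under these assumptions, calling `parse_todos("Add: ablation on retrieval depth [2]\nClarify:Figure 3 axis labels[5]")` yields two `TodoItem` records whose `ref` fields could be used to link each task back to the review comment that motivated it, which is one plausible reading of the "traceable guidance" the abstract describes.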