CV · AI · LG · Jul 29, 2025

CHECK-MAT: Checking Hand-Written Mathematical Answers for the Russian Unified State Exam

arXiv:2507.22958v1 · 2 citations · h-index: 1 · Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for AI-assisted assessment of student work in education, though it is incremental: it applies existing VLM capabilities to a specific domain.

The paper evaluates how well Vision-Language Models (VLMs) can assess hand-written mathematical solutions. It introduces the EGE-Math Solutions Assessment Benchmark, which includes 122 scanned solutions from the Russian Unified State Exam with expert grades, and finds that current VLMs are limited in mathematical reasoning and in alignment with human rubrics.

This paper introduces a novel benchmark, the EGE-Math Solutions Assessment Benchmark, for evaluating Vision-Language Models (VLMs) on their ability to assess hand-written mathematical solutions. Unlike existing benchmarks that focus on problem solving, our approach centres on understanding student solutions, identifying mistakes, and assigning grades according to fixed criteria. We compile 122 scanned solutions from the Russian Unified State Exam (EGE) together with official expert grades, and evaluate seven modern VLMs from Google, OpenAI, Arcee AI, and Alibaba Cloud in three inference modes. The results reveal current limitations in mathematical reasoning and human-rubric alignment, opening new research avenues in AI-assisted assessment. The code is available at https://github.com/Karifannaa/Auto-check-EGE-math
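As a rough illustration of what comparing model-assigned grades with official expert grades can look like, the sketch below computes two simple agreement measures over a handful of made-up records. The record layout (`GradedSolution`), the function names, the choice of metrics, and the sample scores are all assumptions for illustration; they are not taken from the paper or its repository, and the paper may report different measures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GradedSolution:
    """Hypothetical record: one scanned solution with two scores."""
    solution_id: str
    expert_score: int  # grade assigned by the human expert
    model_score: int   # grade assigned by the VLM


def exact_match_rate(items: list[GradedSolution]) -> float:
    """Fraction of solutions where the model grade equals the expert grade."""
    if not items:
        raise ValueError("need at least one graded solution")
    return sum(s.expert_score == s.model_score for s in items) / len(items)


def mean_absolute_error(items: list[GradedSolution]) -> float:
    """Average absolute gap, in points, between model and expert grades."""
    if not items:
        raise ValueError("need at least one graded solution")
    return sum(abs(s.expert_score - s.model_score) for s in items) / len(items)


# Made-up scores, purely illustrative (not data from the benchmark).
samples = [
    GradedSolution("s1", expert_score=2, model_score=2),
    GradedSolution("s2", expert_score=1, model_score=2),
    GradedSolution("s3", expert_score=0, model_score=0),
    GradedSolution("s4", expert_score=3, model_score=1),
]

print(f"exact match: {exact_match_rate(samples):.2f}")
print(f"mean absolute error: {mean_absolute_error(samples):.2f}")
```

Exact match is strict, while the absolute error shows how far off a model is when it disagrees, so reporting both gives a fuller picture of rubric alignment than either alone.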

Code Implementations: 1 repo
