AI · Aug 7, 2025
StructVRM: Aligning Multimodal Reasoning with Structured and Verifiable Reward Models
Xiangxiang Zhang, Jingxuan Wei, Donghong Zhong et al.
Existing Vision-Language Models often struggle with complex, multi-question reasoning tasks where partial correctness is crucial for effective learning. Traditional reward mechanisms, which provide a single binary score for an entire response, are too coarse to guide models through intricate problems with multiple sub-parts. To address this, we introduce StructVRM, a method that aligns multimodal reasoning with Structured and Verifiable Reward Models. At its core is a model-based verifier trained to provide fine-grained, sub-question-level feedback, assessing semantic and mathematical equivalence rather than relying on rigid string matching. This allows for nuanced, partial credit scoring in previously intractable problem formats. Extensive experiments demonstrate the effectiveness of StructVRM. Our trained model, Seed-StructVRM, achieves state-of-the-art performance on six out of twelve public multimodal benchmarks and our newly curated, high-difficulty STEM-Bench. The success of StructVRM validates that training with structured, verifiable rewards is a highly effective approach for advancing the capabilities of multimodal models in complex, real-world reasoning domains.
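The contrast between a single binary score and sub-question-level partial credit can be illustrated with a minimal sketch. This is not the StructVRM verifier, which the abstract describes as a trained model. Here `toy_equivalent` is a hypothetical stand-in that only checks numeric equality or case-insensitive text equality, and averaging the per-sub-question verdicts is an assumed aggregation rule.

```python
from fractions import Fraction


def toy_equivalent(pred: str, ref: str) -> bool:
    """Hypothetical stand-in for a learned verifier (assumption, not the paper's model).

    Treats two answers as equivalent if they parse to the same rational number,
    otherwise falls back to a case-insensitive string comparison.
    """
    try:
        return Fraction(pred.strip()) == Fraction(ref.strip())
    except (ValueError, ZeroDivisionError):
        return pred.strip().lower() == ref.strip().lower()


def binary_reward(preds: list[str], refs: list[str]) -> float:
    """Coarse reward: 1.0 only if every sub-answer is judged correct."""
    if len(preds) != len(refs):
        raise ValueError("preds and refs must have the same length")
    return 1.0 if all(toy_equivalent(p, r) for p, r in zip(preds, refs)) else 0.0


def structured_reward(preds: list[str], refs: list[str]) -> float:
    """Partial-credit reward: fraction of sub-answers judged correct (assumed rule)."""
    if len(preds) != len(refs):
        raise ValueError("preds and refs must have the same length")
    verdicts = [toy_equivalent(p, r) for p, r in zip(preds, refs)]
    return sum(verdicts) / len(verdicts)


# Three sub-questions: the first two are equivalent to the reference, the third is wrong.
preds = ["1/2", "Paris", "7"]
refs = ["0.5", "paris", "8"]
coarse = binary_reward(preds, refs)
fine = structured_reward(preds, refs)
```

With these inputs the binary reward gives no credit, while the structured reward still credits the two correct sub-answers, which is the kind of finer training signal the abstract argues for.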
NE · Apr 1, 2019
A Seft-adaptive Multicellular GEP Algorithm Based On Fuzzy Control For Function Optimization
Chuyan Deng, Yuzhong Peng, Hongya Li et al.
To improve the global optimization ability of the traditional gene expression programming (GEP) algorithm, a multicellular GEP algorithm based on fuzzy control (MGEP-FC) is proposed. MGEP-FC describes the magnitudes of the crossover rate, mutation rate, and real-number-set mutation rate by constructing fuzzy membership functions. These three rates of the genetic operations are dynamically adjusted according to how concentrated or dispersed the individual fitness values in the population are. To maintain population diversity throughout the iterative process, a new genetic operation scheme is designed: new individuals are combined with the parent population to build a temporary population, and the diversity of the temporary population and the subpopulation is optimized. Results on 12 benchmark optimization experiments show that MGEP-FC greatly improves stability, global convergence, and optimization speed.
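A rough sketch of how fuzzy membership functions might turn fitness dispersion into an operator rate is given below. The abstract does not specify the membership shapes, the dispersion measure, or the rule base, so everything here is assumed for illustration: the triangular memberships, the clipped coefficient of variation, the three rate levels, and the rule that a concentrated population receives a higher mutation rate.

```python
import statistics


def tri(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership: 0 outside (a, c), rising to 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def adaptive_rate(fitness, low_rate=0.01, mid_rate=0.05, high_rate=0.2):
    """Map fitness dispersion to a mutation rate (illustrative assumption only).

    Dispersion is measured as the coefficient of variation clipped to [0, 1].
    Assumed rules: concentrated -> high rate, moderate -> mid rate,
    dispersed -> low rate. The output is the membership-weighted average.
    """
    mean = statistics.fmean(fitness)
    spread = statistics.pstdev(fitness)
    d = min(spread / abs(mean), 1.0) if mean else 1.0

    mu_concentrated = tri(d, -0.5, 0.0, 0.5)
    mu_moderate = tri(d, 0.0, 0.5, 1.0)
    mu_dispersed = tri(d, 0.5, 1.0, 1.5)

    # For d in [0, 1] these three memberships always sum to 1, so total > 0.
    total = mu_concentrated + mu_moderate + mu_dispersed
    weighted = (mu_concentrated * high_rate
                + mu_moderate * mid_rate
                + mu_dispersed * low_rate)
    return weighted / total


# A population that has collapsed onto one fitness value vs. a widely spread one.
rate_converged = adaptive_rate([5.0, 5.0, 5.0, 5.0])
rate_spread = adaptive_rate([1.0, 100.0, 1.0, 100.0])
```

Under these assumed rules, the converged population is assigned the highest rate to restore diversity, and the widely spread population is assigned a rate near the low end. The same pattern could be repeated for the crossover rate and the real-number-set mutation rate with their own rate levels.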