CL · Nov 21, 2024

Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions

Yu Zhao, Huifeng Yin, Bo Zeng, Hao Wang, Tianqi Shi, Chenyang Lyu, Longyue Wang, Weihua Luo, Kaifu Zhang

arXiv:2411.14405v228.7150 citations · h-index: 13 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses how to generalize reasoning models to broader, real-world domains, a problem relevant to AI researchers and practitioners. The advance appears incremental, since it builds on existing o1 models.

The paper tackles the challenge of extending large reasoning models to open-ended domains without clear standards or quantifiable rewards, achieving this through a combination of Chain-of-Thought fine-tuning, Monte Carlo Tree Search, reflection mechanisms, and innovative reasoning strategies.

OpenAI o1 has sparked a surge of interest in the study of large reasoning models (LRMs). Building on this momentum, Marco-o1 not only focuses on disciplines with standard answers, such as mathematics, physics, and coding, which are well suited to reinforcement learning (RL), but also places greater emphasis on open-ended resolutions. We aim to address the question: "Can the o1 model effectively generalize to broader domains where clear standards are absent and rewards are challenging to quantify?" Marco-o1 is powered by Chain-of-Thought (CoT) fine-tuning, Monte Carlo Tree Search (MCTS), reflection mechanisms, and innovative reasoning strategies, all optimized for complex real-world problem-solving tasks.
