Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions
This work addresses the problem of generalizing reasoning models to broader, real-world domains, a question relevant to AI researchers and practitioners, though it appears incremental because it builds on existing o1 models.
The paper tackles the challenge of extending large reasoning models to open-ended domains that lack clear standards or quantifiable rewards. It does so by combining Chain-of-Thought fine-tuning, Monte Carlo Tree Search, reflection mechanisms, and innovative reasoning strategies.
OpenAI's o1 has recently sparked a surge of interest in the study of large reasoning models (LRMs). Building on this momentum, Marco-o1 not only focuses on disciplines with standard answers, such as mathematics, physics, and coding, which are well suited to reinforcement learning (RL), but also places greater emphasis on open-ended resolutions. We aim to address the question: "Can the o1 model effectively generalize to broader domains where clear standards are absent and rewards are challenging to quantify?" Marco-o1 is powered by Chain-of-Thought (CoT) fine-tuning, Monte Carlo Tree Search (MCTS), reflection mechanisms, and innovative reasoning strategies, all optimized for complex real-world problem-solving tasks.