LGAINov 29, 2024

Solving Rubik's Cube Without Tricky Sampling

arXiv:2411.19583v12 citationsh-index: 2
Originality Incremental advance
AI Analysis

This addresses sparse-reward problems in RL, particularly for puzzles like Rubik's Cube, though it is incremental as it builds on prior RL methods but removes specific sampling requirements.

The paper tackled solving the Rubik's Cube with reinforcement learning by developing a policy gradient method that learns directly from scrambled states without near solved-state sampling, achieving over 99.4% success rate on the 2x2x2 cube in tests with 50,000 scrambles.

The Rubiks Cube, with its vast state space and sparse reward structure, presents a significant challenge for reinforcement learning (RL) due to the difficulty of reaching rewarded states. Previous research addressed this by propagating cost-to-go estimates from the solved state and incorporating search techniques. These approaches differ from human strategies that start from fully scrambled cubes, which can be tricky for solving a general sparse-reward problem. In this paper, we introduce a novel RL algorithm using policy gradient methods to solve the Rubiks Cube without relying on near solved-state sampling. Our approach employs a neural network to predict cost patterns between states, allowing the agent to learn directly from scrambled states. Our method was tested on the 2x2x2 Rubiks Cube, where the cube was scrambled 50,000 times, and the model successfully solved it in over 99.4% of cases. Notably, this result was achieved using only the policy network without relying on tree search as in previous methods, demonstrating its effectiveness and potential for broader applications in sparse-reward problems.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes