CV · Nov 9, 2025

Seq2Seq Models Reconstruct Visual Jigsaw Puzzles without Seeing Them

arXiv:2511.06315v1 · 1 citation · h-index: 29
Originality: Incremental advance
AI Analysis

This work demonstrates an unconventional, non-visual approach to jigsaw puzzle solving that yields competitive performance, though its contribution is incremental: it applies language models to a new domain.

The paper tackles square jigsaw puzzle solving with language models that receive no visual input, reframing reassembly as a sequence-to-sequence task. It reports state-of-the-art results on multiple benchmarks, often outperforming vision-based methods.

Jigsaw puzzles are primarily visual objects, whose algorithmic solutions have traditionally been framed from a visual perspective. In this work, however, we explore a fundamentally different approach: solving square jigsaw puzzles using language models, without access to raw visual input. By introducing a specialized tokenizer that converts each puzzle piece into a discrete sequence of tokens, we reframe puzzle reassembly as a sequence-to-sequence prediction task. Treated as "blind" solvers, encoder-decoder transformers accurately reconstruct the original layout by reasoning over token sequences alone. Despite being deliberately restricted from accessing visual input, our models achieve state-of-the-art results across multiple benchmarks, often outperforming vision-based methods. These findings highlight the surprising capability of language models to solve problems beyond their native domain, and suggest that unconventional approaches can inspire promising directions for puzzle-solving research.
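The sequence-to-sequence framing described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' method: the abstract does not specify how the tokenizer works, so the quantization scheme, the separator token, and the function names `split_into_pieces`, `tokenize_piece`, and `make_example` are all hypothetical. The sketch only shows how a shuffled puzzle could become a source token sequence paired with a target sequence of original positions.

```python
import random


def split_into_pieces(image, grid):
    """Cut a square 2D list of pixel values into grid*grid pieces (row-major)."""
    size = len(image) // grid
    pieces = []
    for r in range(grid):
        for c in range(grid):
            rows = image[r * size:(r + 1) * size]
            pieces.append([row[c * size:(c + 1) * size] for row in rows])
    return pieces


def tokenize_piece(piece, levels=4, max_value=255):
    """Hypothetical tokenizer: quantize each pixel into one of `levels` ids.

    This is a stand-in; the paper's specialized tokenizer is not described
    in the abstract.
    """
    return [
        min(pixel * levels // (max_value + 1), levels - 1)
        for row in piece
        for pixel in row
    ]


def make_example(image, grid, seed=0, levels=4):
    """Build one (source, target) pair for a seq2seq model.

    source: tokens of the shuffled pieces, each piece preceded by a
            separator token (id == levels, outside the pixel-token range).
    target: for each shuffled piece, its original row-major position.
    """
    pieces = split_into_pieces(image, grid)
    order = list(range(len(pieces)))
    random.Random(seed).shuffle(order)

    separator = levels
    source = []
    for original_index in order:
        source.append(separator)
        source.extend(tokenize_piece(pieces[original_index], levels))
    return source, order


if __name__ == "__main__":
    toy_image = [
        [0, 0, 255, 255],
        [0, 0, 255, 255],
        [100, 100, 200, 200],
        [100, 100, 200, 200],
    ]
    src, tgt = make_example(toy_image, grid=2)
    print("source tokens:", src)
    print("target positions:", tgt)
```

In this toy setup, an encoder-decoder model would be trained to map `source` to `target` without ever receiving the pixel grid itself, which is the sense in which such a solver is "blind".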

