CV | AI | Oct 20, 2025

FineVision: Open Data Is All You Need

arXiv:2510.17269v1 | 27 citations | h-index: 18
Originality: Incremental advance
AI Analysis

This work addresses the fragmented data landscape facing vision-language model researchers by offering a large-scale, clean resource that can accelerate data-centric research. Its contribution is incremental: it improves data quality rather than introducing a new model paradigm.

The authors address the problem that inconsistent and contaminated public datasets are holding back vision-language models. They introduce FineVision, a carefully curated, unified corpus of 24 million samples, and show that models trained on it consistently outperform models trained on existing open mixtures across a broad evaluation suite.

The advancement of vision-language models (VLMs) is hampered by a fragmented landscape of inconsistent and contaminated public datasets. We introduce FineVision, a meticulously collected, curated, and unified corpus of 24 million samples, the largest open resource of its kind. We unify more than 200 sources into 185 subsets via a semi-automated, human-in-the-loop pipeline: automation performs bulk ingestion and schema mapping, while reviewers audit mappings and spot-check outputs to verify faithful consumption of annotations, appropriate formatting and diversity, and safety; issues trigger targeted fixes and re-runs. The workflow further applies rigorous de-duplication within and across sources and decontamination against 66 public benchmarks. FineVision also encompasses agentic/GUI tasks with a unified action space; reviewers validate schemas and inspect a sample of trajectories to confirm executable fidelity. Models trained on FineVision consistently outperform those trained on existing open mixtures across a broad evaluation suite, underscoring the benefits of scale, data hygiene, and balanced automation with human oversight. We release the corpus and curation tools to accelerate data-centric VLM research.
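The abstract names two hygiene steps, de-duplication within and across sources and decontamination against public benchmarks, but does not state the matching criterion. The sketch below is a minimal illustration under stated assumptions: it uses exact SHA-256 hashes of raw image bytes and of case/whitespace-normalized text as a simplified stand-in. It is not FineVision's actual method, which may rely on near-duplicate detection instead. The `Sample` type, its fields, and all function names are hypothetical.

```python
import hashlib
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    # Hypothetical record layout, for illustration only.
    source: str         # name of the originating dataset
    image_bytes: bytes  # raw encoded image
    text: str           # question/answer text


def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivial formatting
    # differences do not hide duplicates.
    return re.sub(r"\s+", " ", text.strip().lower())


def image_fingerprint(image_bytes: bytes) -> str:
    # Exact hash of the image bytes (assumption: no near-duplicate matching).
    return hashlib.sha256(image_bytes).hexdigest()


def sample_fingerprint(sample: Sample) -> str:
    # Hash image and normalized text together; the source name is
    # deliberately excluded so cross-source copies collide.
    h = hashlib.sha256()
    h.update(sample.image_bytes)
    h.update(b"\x00")  # separator between image and text
    h.update(normalize(sample.text).encode("utf-8"))
    return h.hexdigest()


def dedup_and_decontaminate(samples, benchmark_images):
    benchmark_hashes = {image_fingerprint(b) for b in benchmark_images}
    seen = set()  # shared across all sources: within- and cross-source dedup
    kept = []
    stats = {"duplicates": 0, "contaminated": 0}
    for s in samples:
        # Drop any sample whose image also appears in an evaluation benchmark.
        if image_fingerprint(s.image_bytes) in benchmark_hashes:
            stats["contaminated"] += 1
            continue
        fp = sample_fingerprint(s)
        if fp in seen:
            stats["duplicates"] += 1
            continue
        seen.add(fp)
        kept.append(s)
    return kept, stats


samples = [
    Sample("src_a", b"img1", "What is shown?"),
    Sample("src_a", b"img1", "what is  shown?"),   # within-source duplicate
    Sample("src_b", b"img1", "What is shown?"),    # cross-source duplicate
    Sample("src_b", b"img2", "Count the cats."),
    Sample("src_c", b"bench_img", "Any question"),  # image is in a benchmark
]
kept, stats = dedup_and_decontaminate(samples, benchmark_images=[b"bench_img"])
print(len(kept), stats)  # 2 {'duplicates': 2, 'contaminated': 1}
```

Sharing one `seen` set across every source is what makes the check cross-source as well as within-source. Exact hashing misses re-encoded or resized images, so a production pipeline would likely need a perceptual or embedding-based similarity check in its place.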
