Towards Open World Detection: A Survey
The paper synthesizes existing research to propose a framework for unifying diverse perception tasks; the contribution is incremental, as it builds on established subdomains.
This survey introduces Open World Detection (OWD) as a unifying term for class-agnostic and generally applicable detection models in computer vision, covering the convergence of tasks from early saliency detection to modern Vision Large Language Models.
For decades, Computer Vision has aimed at enabling machines to perceive the external world. Initial limitations led to the development of highly specialized niches. As success in each task accrued and research progressed, increasingly complex perception tasks emerged. This survey charts the convergence of these tasks and, in doing so, introduces Open World Detection (OWD), an umbrella term we propose to unify class-agnostic and generally applicable detection models in the vision domain. We start from the history of foundational vision subdomains and cover the key concepts, methodologies, and datasets that make up today's state-of-the-art landscape. The topics range from early saliency detection, foreground/background separation, and out-of-distribution detection to open world object detection, zero-shot detection, and Vision Large Language Models (VLLMs). We explore the overlap between these subdomains, their increasing convergence, and their potential to unify in the future into a single domain: perception.