CV | HC | Nov 9, 2025

Scene-Aware Urban Design: A Human-AI Recommendation Framework Using Co-Occurrence Embeddings and Vision-Language Models

arXiv:2511.06201v1
h-index: 4
Originality: Synthesis-oriented
AI Analysis

This work addresses urban design challenges facing planners and communities by enabling more continuous, local participation. Its contribution is incremental, since it builds on existing vision-language models and datasets.

The paper tackles micro-scale urban design with a human-in-the-loop framework that uses generative AI to propose design interventions in public spaces. The resulting system offers five statistically likely complements to a user-chosen anchor object, then suggests a third object that completes an urban tactic.

This paper introduces a human-in-the-loop computer vision framework that uses generative AI to propose micro-scale design interventions in public space and support more continuous, local participation. Using Grounding DINO and a curated subset of the ADE20K dataset as a proxy for the urban built environment, the system detects urban objects and builds co-occurrence embeddings that reveal common spatial configurations. From this analysis, the user receives five statistically likely complements to a chosen anchor object. A vision-language model then reasons over the scene image and the selected pair to suggest a third object that completes a more complex urban tactic. The workflow keeps people in control of selection and refinement and aims to move beyond top-down master planning by grounding choices in everyday patterns and lived experience.
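
The complement-ranking step can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes each scene has already been reduced to a list of detected object labels (hand-written here, not produced by Grounding DINO), and it ranks complements by the conditional frequency P(other | anchor) in place of the learned co-occurrence embeddings the abstract mentions. The function names `build_cooccurrence` and `top_complements` and the sample scenes are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(scenes):
    """Count how many scenes contain each object and each unordered object pair."""
    object_counts = Counter()
    pair_counts = Counter()
    for labels in scenes:
        unique = sorted(set(labels))  # ignore duplicate detections within a scene
        object_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return object_counts, pair_counts


def top_complements(anchor, object_counts, pair_counts, k=5):
    """Rank other objects by P(other | anchor), estimated from scene counts.

    Assumption: conditional frequency stands in for the paper's embedding-based
    similarity, which the abstract does not specify.
    """
    if object_counts[anchor] == 0:
        return []
    scores = {}
    for (a, b), n in pair_counts.items():
        if a == anchor:
            scores[b] = n / object_counts[anchor]
        elif b == anchor:
            scores[a] = n / object_counts[anchor]
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Hypothetical detections, one label list per scene image.
scenes = [
    ["bench", "tree", "streetlight"],
    ["bench", "tree", "trash can"],
    ["bench", "planter"],
    ["bicycle rack", "streetlight"],
]
object_counts, pair_counts = build_cooccurrence(scenes)
complements = top_complements("bench", object_counts, pair_counts)
```

In the workflow the abstract describes, the user would pick one of the returned complements, and the anchor, the chosen complement, and the scene image would then be passed to a vision-language model to propose a third object; that step is omitted here because the abstract does not specify the model or prompt.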
