User Prompting Strategies and Prompt Enhancement Methods for Open-Set Object Detection in XR Environments
This work addresses robustness issues in interactive XR applications, where user prompts are often ambiguous; the contribution is incremental, since it builds on existing OSOD models.
The study investigates how open-set object detection models perform under realistic user prompting in XR environments. It finds that ambiguous prompts degrade performance and that prompt enhancement improves robustness under ambiguity, with gains exceeding 55% in mIoU and 41% in average confidence.
Open-set object detection (OSOD) localizes objects while identifying and rejecting unknown classes at inference. While recent OSOD models perform well on benchmarks, their behavior under realistic user prompting remains underexplored. In interactive XR settings, user-generated prompts are often ambiguous, underspecified, or overly detailed. To study prompt-conditioned robustness, we evaluate two OSOD models, GroundingDINO and YOLO-E, on real-world XR images and simulate diverse user prompting behaviors using vision-language models. We consider four prompt types (standard, underdetailed, overdetailed, and pragmatically ambiguous) and examine the impact of two enhancement strategies on them. Results show that both models remain stable under underdetailed and standard prompts but degrade under ambiguous prompts. Overdetailed prompts primarily affect GroundingDINO. Prompt enhancement substantially improves robustness under ambiguity, yielding gains exceeding 55% mIoU and 41% average confidence. Based on these findings, we propose several prompting strategies and prompt enhancement methods for OSOD models in XR environments.
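The two reported metrics, mIoU and average confidence, can be illustrated with a minimal sketch. The abstract does not define how they are computed, so everything below is an assumption: one ground-truth box per image, at most one predicted box per prompt, and a missed detection counted as zero IoU and zero confidence. The names `Detection`, `iou`, and `summarize` are hypothetical and do not come from GroundingDINO or YOLO-E.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# Axis-aligned box as (x1, y1, x2, y2); this coordinate convention is an assumption.
Box = Tuple[float, float, float, float]


@dataclass
class Detection:
    """Hypothetical container for the single top-scoring box returned for a prompt."""
    box: Box
    confidence: float


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def summarize(
    preds: Sequence[Optional[Detection]], targets: Sequence[Box]
) -> Tuple[float, float]:
    """Return (mIoU, average confidence) over a set of images.

    Assumption: a missing detection (None) contributes 0 to both averages.
    """
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    if not targets:
        raise ValueError("at least one image is required")
    ious = [iou(p.box, t) if p is not None else 0.0 for p, t in zip(preds, targets)]
    confs = [p.confidence if p is not None else 0.0 for p in preds]
    return sum(ious) / len(ious), sum(confs) / len(confs)
```

Running `summarize` once per prompt type (standard, underdetailed, overdetailed, ambiguous), with and without enhancement, would give the kind of per-condition comparison the abstract describes. Whether the reported gains are absolute or relative is not stated in the abstract, so this sketch leaves that comparison to the reader.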