CV | May 16, 2023

Mobile User Interface Element Detection Via Adaptively Prompt Tuning

arXiv:2305.09699v1 | 11 citations | Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses a domain-specific problem for mobile UI detection, offering an incremental improvement by incorporating OCR data.

The paper tackles the detection of Mobile User Interface (MUI) elements, where existing detectors overlook OCR information, and proposes an Adaptively Prompt Tuning (APT) module that achieves considerable improvements on two datasets.

Recent object detection approaches rely on pretrained vision-language models for image-text alignment. However, they fail to detect Mobile User Interface (MUI) elements, since these contain additional OCR information, which describes their content and function but is often ignored. In this paper, we develop a new MUI element detection dataset named MUI-zh and propose an Adaptively Prompt Tuning (APT) module to take advantage of discriminating OCR information. APT is a lightweight and effective module that jointly optimizes category prompts across different modalities. For every element, APT uniformly encodes its visual features and OCR descriptions to dynamically adjust the representation of frozen category prompts. We evaluate the effectiveness of our plug-and-play APT on top of several existing CLIP-based detectors for both standard and open-vocabulary MUI element detection. Extensive experiments show that our method achieves considerable improvements on two datasets. The dataset is available at github.com/antmachineintelligence/MUI-zh.
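The prompt-adjustment idea in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the additive fusion, the projection matrices `W_vis` and `W_ocr`, the `scale` factor, and all function names are assumptions made for illustration, and the embeddings are hand-written vectors rather than CLIP outputs. It only shows the general shape of the technique: frozen category prompts receive an element-specific offset computed from visual and OCR features, and the element is then scored against the adjusted prompts by cosine similarity.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    # Scale to unit length; leave an all-zero vector unchanged.
    n = math.sqrt(dot(v, v)) or 1.0
    return [x / n for x in v]

def matvec(W, v):
    return [dot(row, v) for row in W]

def adapt_prompts(prompts, visual, ocr, W_vis, W_ocr, scale=0.5):
    """Return element-specific copies of the frozen category prompts.

    Hypothetical fusion: project each modality linearly and sum the results.
    The input prompts are never modified, so they stay frozen.
    """
    offset = [a + b for a, b in zip(matvec(W_vis, visual), matvec(W_ocr, ocr))]
    return [normalize([p + scale * o for p, o in zip(prompt, offset)])
            for prompt in prompts]

def classify(prompts, visual, ocr, W_vis, W_ocr):
    """Score one element against its adapted prompts (cosine similarity)."""
    adapted = adapt_prompts(prompts, visual, ocr, W_vis, W_ocr)
    v = normalize(visual)
    scores = [dot(v, p) for p in adapted]
    return scores.index(max(scores)), scores

# Toy data (made up): two categories in a 3-dimensional embedding space.
frozen_prompts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
zeros = [[0.0] * 3 for _ in range(3)]
visual_feature = [0.6, 0.6, 0.5]
ocr_feature = [0.0, 1.0, 0.0]

label, scores = classify(frozen_prompts, visual_feature, ocr_feature,
                         W_vis=zeros, W_ocr=identity)
```

In a real detector the projection weights would be learned while the prompt embeddings stay fixed; the toy values here are only meant to make the data flow visible.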

Code Implementations: 1 repo
