Predicting and Explaining Mobile UI Tappability with Vision Modeling and Saliency Analysis
This work addresses the need for automated tappability prediction and explanation in mobile UI design and offers a practical tool for designers. Its contribution is incremental, since it builds on existing vision modeling and interpretability methods.
The paper tackles the problem of predicting whether elements in mobile UI screenshots are perceived as tappable by users, using a deep learning approach based on pixels alone. It additionally applies two interpretability techniques, XRAI and k-Nearest Neighbors, to explain the model's predictions and provide design feedback.
We use a deep-learning-based approach to predict whether a selected element in a mobile UI screenshot will be perceived by users as tappable, based on pixels only rather than the view hierarchies required by previous work. To help designers better understand model predictions and to provide more actionable design feedback than predictions alone, we additionally use ML interpretability techniques to help explain the output of our model. We use XRAI to highlight areas in the input screenshot that most strongly influence the tappability prediction for the selected region, and use k-Nearest Neighbors to present the most similar mobile UIs from the dataset with opposing influences on tappability perception.
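The k-Nearest Neighbors explanation step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each UI element has already been mapped to an embedding vector by some model, that each dataset example carries a tappability label, and that "opposing influences" can be approximated by retrieving the closest examples from each label group separately. The function names, the cosine distance, the toy embeddings, and the label strings are all hypothetical choices made for illustration.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two equal-length vectors (0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat a zero vector as maximally uninformative
    return 1.0 - dot / (norm_a * norm_b)


def nearest_by_label(query, embeddings, labels, k=2):
    """Return {label: indices of the k nearest examples carrying that label}.

    Indices within each list are ordered from nearest to farthest.
    """
    ranked = sorted(
        range(len(embeddings)),
        key=lambda i: cosine_distance(query, embeddings[i]),
    )
    result = {}
    for i in ranked:
        bucket = result.setdefault(labels[i], [])
        if len(bucket) < k:
            bucket.append(i)
    return result


# Toy data (hypothetical): 2-D embeddings standing in for learned features.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.7, 0.7]]
labels = ["tappable", "tappable", "not_tappable", "not_tappable", "not_tappable"]
query = [1.0, 0.2]

neighbors = nearest_by_label(query, embeddings, labels, k=2)
print(neighbors)
```

Showing a designer the nearest examples from both groups side by side is one plausible way to surface contrasting cases; how the paper actually selects and presents its neighbors is not specified in the abstract.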