Help the machine to help you: an evaluation in the wild of egocentric data cleaning via skeptical learning
This work addresses data quality issues for personal assistants and similar applications, but the contribution is incremental: it extends prior work on Skeptical Learning by adding end-user confirmation.
The study evaluated Skeptical Learning (SKEL) for cleaning noisy user annotations in real-world conditions with university students over four weeks, and reports reduced annotation effort and improved data quality as potential benefits.
Any digital personal assistant, whether used to support task performance, answer questions, or manage work and daily life (including fitness schedules), requires high-quality annotations to function properly. However, user annotations, whether actively produced or inferred from context (e.g., from smartphone sensor data), are often subject to errors and noise. Previous research on Skeptical Learning (SKEL) addressed noisy labels by comparing active annotations with passive data offline, which allowed annotation accuracy to be evaluated. That evaluation, however, did not include confirmation from end-users, who are the best judges of their own context. In this study, we evaluate SKEL's performance in real-world conditions with actual users who can refine the input labels based on their current perspectives and needs. The study involves university students using the iLog mobile application on their devices over a period of four weeks. The results highlight the challenge of balancing user effort against data quality, as well as the potential benefits of using SKEL, which include reduced annotation effort and improved quality of the collected data.
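To make the idea of user confirmation concrete, the following is a minimal sketch of a skeptical check, not the implementation evaluated in the study. It assumes a model that produces label probabilities from passive (sensor) data and challenges the user's annotation only when it disagrees with sufficient confidence. The class name `SkepticalChecker`, the `threshold` parameter, and the `ask_user` callback are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class SkepticalChecker:
    """Toy skeptical check (hypothetical; not the paper's implementation)."""

    # Assumed confidence the model needs before it challenges the user.
    threshold: float = 0.8

    def review(
        self,
        user_label: str,
        predicted: Dict[str, float],
        ask_user: Callable[[str, str], str],
    ) -> Tuple[str, bool]:
        """Return (final_label, was_challenged).

        predicted maps each candidate label to a probability estimated
        from passive data. ask_user receives the user's label and the
        machine's label and returns the label the user confirms.
        """
        machine_label = max(predicted, key=predicted.get)
        confidence = predicted[machine_label]

        # Accept the annotation when the model agrees or is unsure,
        # so the user is not asked an extra question.
        if machine_label == user_label or confidence < self.threshold:
            return user_label, False

        # Otherwise the user, as the judge of their own context,
        # decides between the two labels.
        return ask_user(user_label, machine_label), True


if __name__ == "__main__":
    checker = SkepticalChecker(threshold=0.8)
    # Stand-in for a confirmation prompt: this user accepts the suggestion.
    accept_suggestion = lambda own, suggested: suggested
    print(checker.review("home", {"home": 0.05, "university": 0.95}, accept_suggestion))
```

In this sketch the threshold is the knob that trades user effort against data quality: a lower value triggers more confirmation questions, while a higher value lets more potentially noisy labels through unchallenged.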