Evaluating Generative AI Tools for Personalized Offline Recommendations: A Comparative Study
This study addresses the underexplored effectiveness of generative AI in health-related behavioral interventions aimed at reducing technology use, although its contribution is incremental, since it is a comparative evaluation of existing tools.
This study evaluated five generative AI tools for recommending non-digital activities to individuals at risk of repetitive strain injury, finding that Tool A achieved both the highest F1-score (0.85) and the highest user satisfaction rating (4.2 out of 5).
Background: Generative AI tools have become increasingly relevant in supporting personalized recommendations across various domains. However, their effectiveness in health-related behavioral interventions, especially those that aim to reduce technology use, remains underexplored. Aims: This study evaluates the performance of, and user satisfaction with, the five most widely used generative AI tools when they recommend non-digital activities tailored to individuals at risk of repetitive strain injury. Method: Following the Goal/Question/Metric (GQM) paradigm, the proposed experiment has each generative AI tool suggest offline activities based on predefined user profiles and intervention scenarios. The evaluation covers quantitative performance (precision, recall, F1-score, and Matthews correlation coefficient (MCC)) and qualitative aspects (user satisfaction and perceived recommendation relevance). Two research questions are defined: RQ1 asks which tool delivers the most accurate recommendations, and RQ2 asks how tool choice influences user satisfaction.
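The four quantitative metrics named in the Method can be illustrated with a minimal sketch. It assumes that each recommendation is judged as a binary outcome (relevant or not relevant) against a reference label, which the abstract does not specify; the function name `binary_metrics` and the counts in the usage lines are hypothetical and are not data from the study.

```python
import math


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute precision, recall, F1 and MCC from confusion-matrix counts.

    Assumption: recommendations are scored as binary relevant/not-relevant
    decisions. Any ratio with a zero denominator is reported as 0.0.
    """
    # Precision: share of recommended activities that were relevant.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of relevant activities that were recommended.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    # MCC: correlation between predicted and reference labels, in [-1, 1].
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "mcc": mcc}


# Hypothetical counts for one tool (illustrative only).
example = binary_metrics(tp=8, fp=2, fn=2, tn=8)
print(example)
```

Unlike F1, MCC also takes true negatives into account, so reporting both gives a more balanced picture when relevant and non-relevant activities are unevenly represented in the test scenarios.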