Evaluating Voice Skills by Design Guidelines Using an Automatic Voice Crawler
This work addresses the need for systematic evaluation of voice application design to improve user experience, though its contribution is incremental because it applies existing guidelines to new data.
The study developed a voice crawler to evaluate 100 popular Alexa skills against 8 Amazon design guidelines, finding that basic commands were well-followed while personalized interactions were less compliant, with variations across categories.
Adaptive voice applications supported by voice assistants (VAs), e.g., Alexa skills and Google Home Actions, are currently very popular. It is therefore important to understand how to design and evaluate these voice interactions well. In our study, we developed a voice crawler to collect responses from the 100 most popular Alexa skills across 10 categories and evaluated these responses for compliance with 8 selected design guidelines published by Amazon. Our findings show that guidelines on basic command support are the most widely followed, while those related to personalised interaction are followed less often. Compliance with the design guidelines also varies across skill categories. Based on our findings and real skill examples, we offer suggestions for new guidelines to complement the existing ones and propose agendas for future HCI research to improve the user experience of voice applications.
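As an illustration of the kind of tally such an evaluation implies, the following minimal sketch computes compliance rates per guideline and per skill category from labelled responses. It is not the authors' implementation: the record layout, the function name `compliance_rates`, the guideline labels, and the toy records are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def compliance_rates(records):
    """Return (rate per guideline, rate per category).

    `records` is an iterable of (category, guideline, complied) tuples,
    a hypothetical layout assumed for this sketch.
    """
    by_guideline = defaultdict(lambda: [0, 0])  # key -> [compliant, total]
    by_category = defaultdict(lambda: [0, 0])
    for category, guideline, complied in records:
        for table, key in ((by_guideline, guideline), (by_category, category)):
            table[key][0] += int(complied)
            table[key][1] += 1

    def to_rate(table):
        return {key: hits / total for key, (hits, total) in table.items()}

    return to_rate(by_guideline), to_rate(by_category)


# Toy records (made up, not data from the study).
records = [
    ("Games", "supports_help", True),
    ("Games", "personalised_greeting", False),
    ("News", "supports_help", True),
    ("News", "personalised_greeting", True),
]
guideline_rates, category_rates = compliance_rates(records)
```

Keeping both groupings in one pass makes it straightforward to compare which guidelines are followed most often and how compliance differs between categories.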