HC Jul 19, 2020

Geno: A Developer Tool for Authoring Multimodal Interaction on Existing Web Applications

Ritam Jyoti Sarmah, Yunpeng Ding, Di Wang, Cheuk Yin Phipson Lee, Toby Jia-Jun Li, Xiang 'Anthony' Chen

arXiv:2007.09809v19.620 citations

Originality Incremental advance

AI Analysis

This tool addresses the challenge developers face in creating multimodal interfaces for web applications. The advance is incremental, since it builds on existing GUI-based apps.

The authors address the problem that adding voice command support to existing web applications is effort-consuming and has a high learning barrier. They developed Geno, a developer tool that lets developers with little NLP expertise add multimodal voice input to web apps. In a study, developers used Geno to implement voice commands for two apps.

Supporting voice commands in applications presents significant benefits to users. However, adding such support to existing GUI-based web apps is effort-consuming with a high learning barrier, as shown in our formative study, due to the lack of unified support for creating multimodal interfaces. We present Geno, a developer tool for adding the voice input modality to existing web apps without requiring significant NLP expertise. Geno provides a high-level workflow for developers to specify functionalities to be supported by voice (intents), create language models for detecting intents and the relevant information (parameters) from user utterances, and fulfill the intents by either programmatically invoking the corresponding functions or replaying GUI actions on the web app. Geno further supports multimodal references to GUI context in voice commands (e.g., "move this [event] to next week" while pointing at an event with the cursor). In a study, developers with little NLP expertise were able to add multimodal voice command support for two existing web apps using Geno.
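The intent, parameter, and fulfillment workflow described in the abstract can be illustrated with a small sketch. This is not Geno's API or implementation: the names `Intent`, `move_event`, `INTENTS`, and `handle` are hypothetical, a regular expression stands in for the language model that detects intents and parameters, and the `pointed_at` argument is an assumed stand-in for the GUI element under the cursor.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Intent:
    """A voice-supported functionality (hypothetical structure, not Geno's)."""
    name: str
    pattern: re.Pattern          # stand-in for a trained language model
    fulfill: Callable[..., str]  # app function invoked with the parameters


def move_event(event: str, when: str) -> str:
    # Hypothetical app function; a real app would update its calendar here.
    return f"moved {event} to {when}"


INTENTS = [
    Intent(
        name="move_event",
        pattern=re.compile(
            r"^move (?P<event>.+?) to (?P<when>.+)$", re.IGNORECASE
        ),
        fulfill=move_event,
    ),
]

# Words treated as references to the GUI context (an assumption).
DEICTIC_WORDS = {"this", "that"}


def handle(utterance: str, pointed_at: Optional[str] = None) -> Optional[str]:
    """Detect an intent, extract parameters, and invoke its function.

    Returns None when no intent matches or when a deictic reference
    cannot be resolved because nothing is being pointed at.
    """
    for intent in INTENTS:
        match = intent.pattern.match(utterance.strip())
        if match is None:
            continue
        params = {}
        for key, value in match.groupdict().items():
            if value.lower() in DEICTIC_WORDS:
                if pointed_at is None:
                    return None  # "this" with no GUI target
                value = pointed_at  # multimodal reference resolved
            params[key] = value
        return intent.fulfill(**params)
    return None


if __name__ == "__main__":
    print(handle("move this to next week", pointed_at="Team sync"))
```

In this sketch the spoken word "this" is replaced by whatever the cursor points at before the function is called, which mirrors the "move this [event] to next week" example. Fulfillment by replaying GUI actions, the other option the abstract mentions, is not covered here.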
