TapNet: The Design, Training, Implementation, and Applications of a Multi-Task Learning CNN for Off-Screen Mobile Input
This work addresses the need for practical one-handed interaction on mobile devices, though its contribution is incremental: it builds on existing multi-task learning and sensor-processing approaches.
The paper tackles the problem of enabling off-screen mobile input without specialized hardware by applying deep learning to the phone's built-in IMU sensors. The result is TapNet, a multi-task CNN that detects taps on the phone and reports a significant improvement over state-of-the-art methods, evaluated on two datasets with over 135K training samples.
To make off-screen interaction without specialized hardware practical, we investigate using deep learning methods to process the common built-in IMU sensors (accelerometers and gyroscopes) on mobile phones into a useful set of one-handed interaction events. We present the design, training, implementation, and applications of TapNet, a multi-task network that detects tapping on the smartphone. With phone form factor as auxiliary information, TapNet can jointly learn from data across devices and simultaneously recognize multiple tap properties, including tap direction and tap location. We developed two datasets consisting of over 135K training samples, 38K testing samples, and 32 participants in total. Experimental evaluation demonstrated the effectiveness of the TapNet design and its significant improvement over the state of the art. Along with the datasets (https://sites.google.com/site/michaelxlhuang/datasets/tapnet-dataset) and extensive experiments, TapNet establishes a new technical foundation for off-screen mobile input.
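The multi-task idea described above (one shared network over an IMU window, phone form factor as an auxiliary input, and separate outputs for tap direction and tap location) can be illustrated with a minimal forward-pass sketch. It is not TapNet's actual architecture: the class name `MultiTaskTapSketch`, the layer sizes, the class counts, the three-value form-factor vector, and the random untrained weights are all assumptions chosen for illustration.

```python
import math
import random


def conv1d_relu(window, kernels, bias):
    """Valid 1D convolution over time, followed by ReLU.

    window:  T samples, each a list of C channel values.
    kernels: F filters, each a K x C list of weights.
    Returns (T - K + 1) frames, each holding F activations.
    """
    k = len(kernels[0])
    frames = []
    for t in range(len(window) - k + 1):
        frame = []
        for f, kern in enumerate(kernels):
            s = bias[f]
            for dt in range(k):
                for c, w in enumerate(kern[dt]):
                    s += w * window[t + dt][c]
            frame.append(max(0.0, s))
        frames.append(frame)
    return frames


def global_max_pool(frames):
    """Keep the strongest response of each filter across time."""
    return [max(column) for column in zip(*frames)]


def dense(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class MultiTaskTapSketch:
    """Hypothetical shared trunk with two task heads (all sizes are assumptions)."""

    def __init__(self, channels=6, filters=8, kernel=5, form_dim=3,
                 n_directions=5, n_locations=4, seed=0):
        rng = random.Random(seed)
        # Shared trunk: one conv layer over accelerometer + gyroscope channels.
        self.kernels = [rand_matrix(rng, kernel, channels) for _ in range(filters)]
        self.conv_bias = [0.0] * filters
        # Each head sees the shared features plus the form-factor vector.
        feat = filters + form_dim
        self.heads = {
            "direction": (rand_matrix(rng, n_directions, feat), [0.0] * n_directions),
            "location": (rand_matrix(rng, n_locations, feat), [0.0] * n_locations),
        }

    def forward(self, window, form_factor):
        shared = global_max_pool(conv1d_relu(window, self.kernels, self.conv_bias))
        joint = shared + list(form_factor)  # auxiliary device information
        return {task: softmax(dense(joint, w, b))
                for task, (w, b) in self.heads.items()}


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic 32-sample window with 3 accelerometer + 3 gyroscope axes.
    window = [[rng.gauss(0.0, 1.0) for _ in range(6)] for _ in range(32)]
    # Hypothetical normalized form factor (e.g. width, height, thickness).
    print(MultiTaskTapSketch().forward(window, [0.5, 0.9, 0.1]))
```

Because the trunk is shared, gradients from both heads would update the same convolutional filters during training, which is the usual motivation for multi-task learning. Appending the form-factor vector before the heads is one plausible way to let a single model serve several devices; the source does not specify where TapNet injects this information.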