Convolutional Neural Networks for User Identificationbased on Motion Sensors Represented as Image
This addresses user authentication for smartphone security, offering a practical system with improved generalization over baselines, though it is incremental as it builds on existing deep learning methods.
The paper tackles smartphone user identification by analyzing motion sensor signals from a single tap gesture, transforming them into images for a pre-trained CNN to extract features, achieving 89.75% accuracy in multi-class classification and 96.72% in few-shot identification.
In this paper, we propose a deep learning approach for smartphone user identification based on analyzing motion signals recorded by the accelerometer and the gyroscope, during a single tap gesture performed by the user on the screen. We transform the discrete 3-axis signals from the motion sensors into a gray-scale image representation which is provided as input to a convolutional neural network (CNN) that is pre-trained for multi-class user classification. In the pre-training stage, we benefit from different users and multiple samples per user. After pre-training, we use our CNN as feature extractor, generating an embedding associated to each single tap on the screen. The resulting embeddings are used to train a Support Vector Machines (SVM) model in a few-shot user identification setting, i.e. requiring only 20 taps on the screen during the registration phase. We compare our identification system based on CNN features with two baseline systems, one that employs handcrafted features and another that employs recurrent neural network (RNN) features. All systems are based on the same classifier, namely SVM. To pre-train the CNN and the RNN models for multi-class user classification, we use a different set of users than the set used for few-shot user identification, ensuring a realistic scenario. The empirical results demonstrate that our CNN model yields a top accuracy of 89.75% in multi-class user classification and a top accuracy of 96.72% in few-shot user identification. In conclusion, we believe that our system is ready for practical use, having a better generalization capacity than both baselines.