CV Jan 14, 2021
TypeNet: Deep Learning Keystroke Biometrics
Alejandro Acien, Aythami Morales, John V. Monaco et al.
We study the performance of Long Short-Term Memory (LSTM) networks for keystroke biometric authentication at large scale in free-text scenarios. To this end, we train LSTM networks with a moderate number of keystrokes per identity and evaluate them under different scenarios including: i) three learning approaches depending on the loss function (softmax, contrastive, and triplet loss); ii) different numbers of training samples and lengths of keystroke sequences; iii) four databases based on two device types (physical vs touchscreen keyboard); and iv) comparison with existing approaches based on both traditional statistical methods and deep learning architectures. Our approach, called TypeNet, achieves state-of-the-art keystroke biometric authentication performance with an Equal Error Rate of 2.2% and 9.2% for physical and touchscreen keyboards, respectively, significantly outperforming previous approaches. Our experiments show only a moderate increase in error with up to 100,000 subjects, demonstrating the potential of TypeNet to operate at an Internet scale. To the best of our knowledge, the databases used in this work are the largest existing free-text keystroke databases available for research, with more than 136 million keystrokes from 168,000 subjects on physical keyboards, and more than 63 million keystrokes from 60,000 subjects acquired on mobile touchscreens.
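The Equal Error Rate quoted above is the operating point at which the false acceptance rate equals the false rejection rate. The sketch below is a minimal illustration and not the authors' evaluation code. It assumes distance-like scores (lower means more similar) and a simple threshold sweep; the function name `equal_error_rate` is hypothetical.

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Approximate the EER from distance scores (lower = more similar).

    genuine:  scores from comparisons of a user against their own samples
    impostor: scores from comparisons of a user against other users
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    # Every observed score is a candidate decision threshold.
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        frr = np.mean(genuine > t)     # genuine attempts rejected
        far = np.mean(impostor <= t)   # impostor attempts accepted
        gap = abs(far - frr)
        if gap < best_gap:
            # Report the midpoint where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return float(eer)

if __name__ == "__main__":
    print(equal_error_rate([0.1, 0.2, 0.3, 0.6], [0.5, 0.7, 0.8, 0.9]))
```

With finite score sets the two error curves rarely cross exactly, so the midpoint at the closest threshold is only an approximation; interpolating between thresholds is a common refinement.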
CR May 15, 2020
Keystroke Biometrics in Response to Fake News Propagation in a Global Pandemic
Aythami Morales, Alejandro Acien, Julian Fierrez et al.
This work proposes and analyzes the use of keystroke biometrics for content de-anonymization. Fake news has become a powerful tool to manipulate public opinion, especially during major events. In particular, the massive spread of fake news during the COVID-19 pandemic has forced governments and companies to fight against misinformation. In this context, the ability to link multiple accounts or profiles that spread such malicious content on the Internet while hiding in anonymity would enable proactive identification and blacklisting. Behavioral biometrics can be powerful tools in this fight. In this work, we analyze how the latest advances in keystroke biometric recognition can help to link behavioral typing patterns in experiments involving 100,000 users and more than 1 million typed sequences. Our proposed system is based on Recurrent Neural Networks adapted to the context of content de-anonymization. Given the challenge of linking the typed content of a target user to a pool of candidate profiles, our results show that keystroke recognition can be used to reduce the list of candidate profiles by more than 90%. In addition, when keystroke recognition is combined with auxiliary data (such as location), our system achieves a Rank-1 identification performance of 52.6% and 10.9% for a background candidate list composed of 1K and 100K profiles, respectively.
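Rank-1 identification, as reported above, counts how often the true profile is the closest candidate to the query. The following sketch shows one way to compute rank-k accuracy from a query-by-candidate distance matrix. It is an illustration under assumed inputs, not the authors' pipeline; `rank_k_accuracy` and its arguments are hypothetical names.

```python
import numpy as np

def rank_k_accuracy(dist, true_idx, k=1):
    """Fraction of queries whose true profile is among the k closest candidates.

    dist:     (num_queries, num_candidates) matrix of distances
    true_idx: for each query, the column index of its true profile
    """
    dist = np.asarray(dist, dtype=float)
    true_idx = np.asarray(true_idx)
    # Distance from each query to its own true profile.
    true_d = dist[np.arange(len(true_idx)), true_idx]
    # Zero-based rank = number of candidates strictly closer than the true one.
    ranks = np.sum(dist < true_d[:, None], axis=1)
    return float(np.mean(ranks < k))

if __name__ == "__main__":
    d = [[0.1, 0.5, 0.9],
         [0.4, 0.3, 0.2],
         [0.7, 0.8, 0.1]]
    print(rank_k_accuracy(d, [0, 1, 2], k=1))
    print(rank_k_accuracy(d, [0, 1, 2], k=2))
```

Keeping only the top-k candidates per query is also how a candidate list can be shortened: the larger k is relative to the pool, the more likely the true profile survives the cut.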
CV Apr 7, 2020
TypeNet: Scaling up Keystroke Biometrics
Alejandro Acien, John V. Monaco, Aythami Morales et al.
We study the suitability of keystroke dynamics to authenticate 100K users typing free-text. For this, we first analyze to what extent our method based on a Siamese Recurrent Neural Network (RNN) is able to authenticate users when the amount of data per user is scarce, a common scenario in free-text keystroke authentication. With 1K users for testing the network, a population size comparable to previous works, TypeNet obtains an equal error rate of 4.8% using only 5 enrollment sequences and 1 test sequence per user with 50 keystrokes per sequence. Using the same amount of data per user, as the number of test users is scaled up to 100K, the performance in comparison to 1K decays relatively by less than 5%, demonstrating the potential of TypeNet to scale well to large numbers of users. Our experiments are conducted with the Aalto University keystroke database. To the best of our knowledge, this is the largest free-text keystroke database captured, with more than 136M keystrokes from 168K users.
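To illustrate how a handful of enrollment sequences and one test sequence can yield a single authentication score, the sketch below averages the Euclidean distances between a test embedding and each enrollment embedding. This scoring rule is an assumption made for illustration, and the embeddings are taken as given (the network that produces them is not shown); `verification_score` is a hypothetical name.

```python
import numpy as np

def verification_score(enrollment, test):
    """Mean Euclidean distance between a test embedding and enrollment embeddings.

    enrollment: (G, D) array, one embedding per enrollment sequence
    test:       (D,) embedding of the test sequence
    Lower scores suggest that the same user typed both.
    """
    enrollment = np.asarray(enrollment, dtype=float)
    test = np.asarray(test, dtype=float)
    distances = np.linalg.norm(enrollment - test, axis=1)
    return float(np.mean(distances))

if __name__ == "__main__":
    gallery = np.zeros((5, 3))          # 5 enrollment embeddings (toy values)
    probe = np.array([3.0, 4.0, 0.0])   # 1 test embedding
    print(verification_score(gallery, probe))  # 5.0
```

The score is then compared with a threshold to accept or reject the claimed identity.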
NE Mar 10, 2017
Integer Factorization with a Neuromorphic Sieve
John V. Monaco, Manuel M. Vindiola
The bound to factor large integers is dominated by the computational effort to discover numbers that are smooth, typically performed by sieving a polynomial sequence. On a von Neumann architecture, sieving has log-log amortized time complexity to check each value for smoothness. This work presents a neuromorphic sieve that achieves a constant time check for smoothness by exploiting two characteristic properties of neuromorphic architectures: constant time synaptic integration and massively parallel computation. The approach is validated by modifying msieve, one of the fastest publicly available integer factorization implementations, to use the IBM Neurosynaptic System (NS1e) as a coprocessor for the sieving stage.
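For readers unfamiliar with sieving for smooth numbers, the sketch below shows a conventional software sieve over a plain integer interval. It is not the neuromorphic method and not msieve's implementation: production sieves work on polynomial values and accumulate approximate logarithms, whereas this toy version divides out primes exactly. The names `primes_up_to` and `smooth_numbers` are hypothetical.

```python
def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n."""
    is_prime = [True] * (n + 1)
    is_prime[:2] = [False] * min(2, n + 1)
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            for m in range(p * p, n + 1, p):
                is_prime[m] = False
    return [p for p, flag in enumerate(is_prime) if flag]

def smooth_numbers(start, stop, bound):
    """Integers in [start, stop) whose prime factors are all <= bound."""
    if start < 1:
        raise ValueError("start must be at least 1")
    residual = list(range(start, stop))
    for p in primes_up_to(bound):
        first = (-start) % p  # offset of the first multiple of p in the interval
        # Visit only the multiples of p instead of trial-dividing every value.
        for i in range(first, len(residual), p):
            while residual[i] % p == 0:
                residual[i] //= p
    # A value is smooth if nothing is left after removing the small primes.
    return [start + i for i, r in enumerate(residual) if r == 1]

if __name__ == "__main__":
    print(smooth_numbers(1, 31, 5))
```

Each prime touches only its own multiples, which is what keeps the amortized cost per value low on a conventional processor.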
CR Sep 24, 2016
Obfuscating Keystroke Time Intervals to Avoid Identification and Impersonation
John V. Monaco, Charles C. Tappert
There are numerous opportunities for adversaries to observe user behavior remotely on the web. Additionally, keystroke biometric algorithms have advanced to the point where user identification and soft biometric trait recognition rates are commercially viable. This presents a privacy concern because masking spatial information, such as IP address, is not sufficient as users become more identifiable by their behavior. In this work, the well-known Chaum mix is generalized to a scenario in which users are separated by both space and time with the goal of preventing an observing adversary from identifying or impersonating the user. The criteria of a behavior obfuscation strategy are defined and two strategies are introduced for obfuscating typing behavior. Experimental results are obtained using publicly available keystroke data for three different types of input, including short fixed-text, long fixed-text, and long free-text. Identification accuracy is reduced by 20% with a 25 ms random keystroke delay not noticeable to the user.
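A random keystroke delay of the kind mentioned above can be sketched as follows. This is a simplified illustration, not either of the two strategies defined in the paper. It assumes sorted event timestamps in milliseconds, a uniform delay bounded by `max_delay_ms`, and that events must be released in their original order; `delay_events` is a hypothetical name.

```python
import random

def delay_events(timestamps_ms, max_delay_ms=25.0, seed=None):
    """Delay each event by a random amount while preserving event order.

    timestamps_ms: non-decreasing event times in milliseconds
    Returns the release times; each event is delayed by at most max_delay_ms.
    """
    rng = random.Random(seed)
    released = []
    last = float("-inf")
    for t in timestamps_ms:
        out = t + rng.uniform(0.0, max_delay_ms)
        # Never release an event before the previously released one.
        out = max(out, last)
        released.append(out)
        last = out
    return released

if __name__ == "__main__":
    keys = [0.0, 110.0, 118.0, 240.0, 395.0]
    print(delay_events(keys, max_delay_ms=25.0, seed=0))
```

Because the input is sorted and the previous release time never exceeds the previous event time plus the maximum delay, the order-preserving clamp cannot push any event past its own delay bound.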
IT Jul 13, 2016
The Partially Observable Hidden Markov Model and its Application to Keystroke Dynamics
John V. Monaco, Charles C. Tappert
The partially observable hidden Markov model (POHMM) is an extension of the hidden Markov model (HMM) in which the hidden state is conditioned on an independent Markov chain. This structure is motivated by the presence of discrete metadata, such as an event type, that may partially reveal the hidden state but itself emanates from a separate process. Such a scenario is encountered in keystroke dynamics, whereby a user's typing behavior is dependent on the text that is typed. Under the assumption that the user can be in either an active or passive state of typing, the keyboard key names are event types that partially reveal the hidden state due to the presence of relatively longer time intervals between words and sentences than between letters of a word. Using five public datasets, the proposed model is shown to consistently outperform other anomaly detectors, including the standard HMM, in biometric identification and verification tasks and is generally preferred over the HMM in a Monte Carlo goodness of fit test.
CR Jun 29, 2016
Robust Keystroke Biometric Anomaly Detection
John V. Monaco
The Keystroke Biometrics Ongoing Competition (KBOC) presented an anomaly detection challenge with a public keystroke dataset containing a large number of subjects and real-world aspects. Over 300 subjects typed case-insensitive repetitions of their first and last name, and as a result, keystroke sequences could vary in length and order depending on the usage of modifier keys. To deal with this, a keystroke alignment preprocessing algorithm was developed to establish a semantic correspondence between keystrokes in mismatched sequences. The method is robust in the sense that query keystroke sequences need only approximately match a target sequence, and alignment is agnostic to the particular anomaly detector used. This paper describes the fifteen best-performing anomaly detection systems submitted to the KBOC, which ranged from auto-encoding neural networks to ensemble methods. Manhattan distance achieved the lowest equal error rate of 5.32%, while all fifteen systems performed better than any other submission. Performance gains are shown to be due in large part not to the particular anomaly detector, but to preprocessing and score normalization techniques.
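A Manhattan distance anomaly detector can be sketched in a few lines: the template is the mean of the enrollment feature vectors, and a query is scored by its L1 distance to that template. The sketch below assumes fixed-length feature vectors (for example, timing features in milliseconds) and omits the alignment preprocessing and score normalization described above. The class name `ManhattanDetector` is hypothetical and this is not the submitted system.

```python
import numpy as np

class ManhattanDetector:
    """Score a query by its L1 distance to the mean enrollment vector."""

    def fit(self, samples):
        # samples: (num_samples, num_features) enrollment feature vectors
        self.mean_ = np.mean(np.asarray(samples, dtype=float), axis=0)
        return self

    def score(self, query):
        # Higher scores indicate a more anomalous (less user-like) sample.
        query = np.asarray(query, dtype=float)
        return float(np.sum(np.abs(query - self.mean_)))

if __name__ == "__main__":
    detector = ManhattanDetector().fit([[100.0, 200.0], [120.0, 180.0]])
    print(detector.score([110.0, 190.0]))  # 0.0
    print(detector.score([130.0, 150.0]))  # 60.0
```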