CR AI CL | Jul 19, 2024

PassTSL: Modeling Human-Created Passwords through Two-Stage Learning

Yangde Wang, Haozhang Li, Weidong Qiu, Shujun Li, Peng Tang

arXiv:2407.14145v1 | 5.86 citations | h-index: 9

Originality: Incremental advance

AI Analysis

This work addresses password security for cybersecurity applications by enhancing cracking methods to inform better defenses, though it is incremental as it adapts existing NLP frameworks to a specific domain.

The paper models human-created passwords to improve password cracking and strength estimation. It proposes PassTSL, a two-stage learning method inspired by NLP pretraining-finetuning, which outperformed five state-of-the-art password cracking methods by margins of 4.11% to 64.69% at the maximum point. In password strength estimation, it produced fewer unsafe errors (overestimates of strength) than two other state-of-the-art meters.

Textual passwords are still the most widely used user authentication mechanism. Because textual passwords are closely connected to natural languages, advanced techniques from natural language processing (NLP) and machine learning (ML) can be used to model passwords for purposes such as studying human password-creation behaviors and developing more advanced password cracking methods that inform better defence mechanisms. In this paper, we propose PassTSL (modeling human-created Passwords through Two-Stage Learning), inspired by the popular pretraining-finetuning framework in NLP and deep learning (DL). We report how different pretraining settings affected PassTSL and demonstrated its effectiveness by applying it to six large leaked password databases. Experimental results showed that it outperformed five state-of-the-art (SOTA) password cracking methods on password guessing by a significant margin, ranging from 4.11% to 64.69% at the maximum point. Based on PassTSL, we also implemented a password strength meter (PSM). Our experiments showed that it estimated password strength more accurately than two other SOTA PSMs, a neural-network-based method and zxcvbn: at the same rate of safe errors (underestimating the password strength), it caused fewer unsafe errors (overestimating the password strength). Furthermore, we explored multiple finetuning settings, and our evaluations showed that even a small amount of additional training data, e.g., only 0.1% of the pretraining data, can lead to over 3% improvement in password guessing on average. We also proposed a heuristic approach to selecting finetuning passwords based on JS (Jensen-Shannon) divergence, and experimental results validated its usefulness. In summary, our contributions demonstrate the potential and feasibility of applying advanced NLP and ML methods to password modeling and cracking.
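To make the JS-divergence heuristic mentioned in the abstract more concrete, here is a minimal sketch of how candidate finetuning sets could be ranked by their Jensen-Shannon divergence from a sample of target passwords. The abstract does not say which distributions the authors compare, so the character-frequency distribution used here is an assumption, and the helper names `char_distribution`, `js_divergence`, and `rank_candidates` are hypothetical rather than taken from the paper.

```python
import math
from collections import Counter


def char_distribution(passwords):
    """Empirical character-frequency distribution of a password list.

    Assumption: characters are the unit of comparison; the paper may use
    a different feature (e.g. length or structure distributions).
    """
    counts = Counter(ch for pw in passwords for ch in pw)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("password list contains no characters")
    return {ch: n / total for ch, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) of two discrete distributions.

    p and q are dicts mapping symbols to probabilities. With base-2 logs
    the result lies in [0, 1]: 0 for identical distributions, 1 for
    distributions with disjoint support.
    """
    support = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2
    m = {x: 0.5 * (p.get(x, 0.0) + q.get(x, 0.0)) for x in support}

    def kl_to_mixture(a):
        # KL(a || m); terms with a[x] == 0 contribute nothing
        return sum(a[x] * math.log2(a[x] / m[x]) for x in a if a[x] > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


def rank_candidates(target_sample, candidate_sets):
    """Sort candidate finetuning sets by divergence from the target sample.

    candidate_sets maps a hypothetical dataset name to a list of passwords.
    Lower divergence suggests a closer match to the target distribution.
    """
    target = char_distribution(target_sample)
    scored = [
        (name, js_divergence(target, char_distribution(pws)))
        for name, pws in candidate_sets.items()
    ]
    return sorted(scored, key=lambda item: item[1])
```

Calling `rank_candidates(sample, {"set_a": [...], "set_b": [...]})` returns `(name, divergence)` pairs in ascending order, so the first entry is the candidate set whose character distribution is closest to the sample under this assumed feature.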
