PassGPT: Password Modeling and (Guided) Generation with Large Language Models
This work addresses password security by improving password guessing and strength estimation; its contribution is incremental, since it applies existing LLM techniques to a new domain.
The paper tackles password modeling by training a large language model (PassGPT) on password leaks. PassGPT guesses twice as many previously unseen passwords as GAN-based methods and enables guided generation under arbitrary constraints.
Large language models (LLMs) successfully model natural language from vast amounts of text without the need for explicit supervision. In this paper, we investigate the efficacy of LLMs in modeling passwords. We present PassGPT, an LLM trained on password leaks for password generation. PassGPT outperforms existing methods based on generative adversarial networks (GAN) by guessing twice as many previously unseen passwords. Furthermore, we introduce the concept of guided password generation, where we leverage PassGPT's sampling procedure to generate passwords matching arbitrary constraints, a feat lacking in current GAN-based strategies. Lastly, we conduct an in-depth analysis of the entropy and probability distribution that PassGPT defines over passwords and discuss their use in enhancing existing password strength estimators.
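The two ideas in the abstract, guided generation and probability-based strength analysis, both follow from having an autoregressive next-token distribution. Below is a minimal sketch, not the authors' implementation. It assumes a character-level vocabulary and a hypothetical per-position template syntax (`l` = lowercase, `u` = uppercase, `d` = digit, `*` = any letter or digit). A tiny add-alpha bigram model trained on a made-up toy corpus stands in for PassGPT. The names `TOY_CORPUS`, `train_bigram`, `guided_sample`, and `log_prob` are illustrative only. Guided sampling restricts each step's distribution to the characters the constraint allows and renormalizes. The password's log-probability is the sum of per-character conditional log-probabilities plus the end-of-sequence term; a lower value suggests a harder-to-guess password under that model.

```python
import math
import random
import string
from collections import defaultdict

# Hypothetical toy corpus standing in for a password leak (illustrative only).
TOY_CORPUS = ["love12", "abc123", "pass99", "lover1", "abcd12"]

BOS, EOS = "<s>", "</s>"
# Assumed character-level vocabulary: letters and digits plus an end token.
VOCAB = list(string.ascii_letters + string.digits) + [EOS]

# Character classes for the hypothetical template syntax used below.
CLASSES = {
    "l": set(string.ascii_lowercase),
    "u": set(string.ascii_uppercase),
    "d": set(string.digits),
    "*": set(string.ascii_letters + string.digits),
}


def train_bigram(corpus, alpha=0.1):
    """Add-alpha smoothed bigram model, a stand-in for an LLM's next-token distribution."""
    counts = defaultdict(lambda: defaultdict(float))
    for pw in corpus:
        prev = BOS
        for ch in list(pw) + [EOS]:
            counts[prev][ch] += 1.0
            prev = ch

    def next_probs(prev):
        # Distribution over VOCAB given the previous token (the bigram context).
        row = counts[prev]
        total = sum(row.values()) + alpha * len(VOCAB)
        return {tok: (row[tok] + alpha) / total for tok in VOCAB}

    return next_probs


def guided_sample(next_probs, template, rng):
    """Sample a password whose i-th character belongs to class template[i].

    Tokens outside the allowed class are dropped at each step; rng.choices
    renormalizes the remaining weights. The template fixes the length.
    """
    prev, out = BOS, []
    for symbol in template:
        probs = next_probs(prev)
        allowed = [t for t in VOCAB if t in CLASSES[symbol]]
        weights = [probs[t] for t in allowed]
        ch = rng.choices(allowed, weights=weights, k=1)[0]
        out.append(ch)
        prev = ch
    return "".join(out)


def log_prob(next_probs, password):
    """Natural-log probability: sum_i log p(c_i | context) + log p(EOS | password).

    Every character of `password` must be in VOCAB.
    """
    prev, total = BOS, 0.0
    for ch in list(password) + [EOS]:
        total += math.log(next_probs(prev)[ch])
        prev = ch
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    model = train_bigram(TOY_CORPUS)
    for _ in range(3):
        pw = guided_sample(model, "lllldd", rng)  # four lowercase letters, two digits
        print(pw, round(log_prob(model, pw), 2))
```

Swapping the bigram for a real character-level LLM would only change `next_probs`. The masking step and the log-probability sum stay the same, which is why constrained sampling is straightforward for autoregressive models but has no direct analogue in a GAN that maps a noise vector to a whole password in one shot.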