SE#PCFG: Semantically Enhanced PCFG for Password Analysis and Cracking
This work addresses password security analysis and cracking for users of different languages. It is incremental in that it extends existing PCFG methods with semantic enhancements.
The paper tackles the under-investigated problem of semantic information in user-generated passwords by proposing SE#PCFG, a semantically enhanced probabilistic context-free grammar framework, and SEPCA, a password cracking architecture built on it. SEPCA outperforms three state-of-the-art benchmarks by up to 94.11% in password coverage rate across multiple languages and test cases.
Much research has been done on user-generated textual passwords. Surprisingly, semantic information in such passwords remains under-investigated: existing studies focus mainly on passwords created by English- and/or Chinese-speaking users and consider only limited semantics. This paper fills this gap by proposing a general framework based on semantically enhanced PCFG (probabilistic context-free grammars), named SE#PCFG. It allows us to consider 43 types of semantic information, the richest set considered so far, for password analysis. Applying SE#PCFG to 17 large leaked password databases of users speaking four languages (English, Chinese, German and French), we demonstrate its usefulness and report a wide range of new insights about password semantics at different levels, such as cross-website password correlations. Furthermore, based on SE#PCFG and a new systematic smoothing method, we propose the Semantically Enhanced Password Cracking Architecture (SEPCA) and compare its performance against three SOTA (state-of-the-art) benchmarks in terms of the password coverage rate: two other PCFG variants and a neural network. Our experimental results show that SEPCA outperforms all three benchmarks consistently and significantly across 52 test cases, by up to 21.53%, 52.55% and 7.86%, respectively, at the user level (with duplicate passwords). At the level of unique passwords, SEPCA also beats the three counterparts by up to 43.83%, 94.11% and 11.16%, respectively.
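The two evaluation levels mentioned above differ only in whether duplicate passwords in the test set are counted. The following minimal Python sketch illustrates one plausible reading of the metric: the user-level rate counts every occurrence of a password, while the unique-level rate counts each distinct password once. The function name `coverage_rates` and the toy data are hypothetical, and the paper's exact definition of the coverage rate may differ.

```python
from collections import Counter


def coverage_rates(guesses, test_passwords):
    """Return (user_level_rate, unique_level_rate) for a list of guesses.

    Assumed definitions (not taken from the paper):
      user level   = covered password occurrences / all occurrences
      unique level = covered distinct passwords / all distinct passwords
    """
    guessed = set(guesses)
    counts = Counter(test_passwords)  # password -> number of users choosing it
    total_users = sum(counts.values())
    if total_users == 0:
        raise ValueError("test_passwords must not be empty")

    # Each covered password contributes once per user who chose it.
    covered_users = sum(n for pw, n in counts.items() if pw in guessed)
    # Each covered password contributes exactly once, however popular it is.
    covered_unique = sum(1 for pw in counts if pw in guessed)

    return covered_users / total_users, covered_unique / len(counts)


# Toy data, invented for illustration only.
test_set = ["123456", "123456", "123456", "iloveyou", "Berlin2019", "qwerty"]
guess_list = ["123456", "password", "qwerty"]

user_rate, unique_rate = coverage_rates(guess_list, test_set)
print(f"user level: {user_rate:.2%}, unique level: {unique_rate:.2%}")
```

In this toy case the user-level rate is higher than the unique-level rate because one popular password accounts for half of the accounts. This is why the two levels are reported separately.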