Chinese Journal of Network and Information Security, 2024, Vol. 10, Issue (1): 112-122. doi: 10.11959/j.issn.2096-109x.2024008

• Papers •

Research on multi-granularity password analysis based on LLM

Meng HONG, Weidong QIU, Yangde WANG   

  1. School of Cyber Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
  • Revised: 2024-01-01 Online: 2024-02-01 Published: 2024-02-01
  • Supported by:
    The National Natural Science Foundation of China (61972249); The National Key R&D Program of China (2023YFB3106501)

Abstract:

Password-based authentication has been widely used as the primary authentication mechanism. However, occasional large-scale password leaks have highlighted the vulnerability of passwords to guessing and theft. In recent years, research on password analysis using natural language processing techniques has progressed by treating passwords as a special form of natural language. Nevertheless, few studies have investigated how the granularity of password text segmentation affects password analysis with large language models. This paper proposes a multi-granularity password analysis framework based on a large language model, which follows the pre-training paradigm and autonomously learns prior knowledge of the password distribution from large unlabelled datasets. The framework comprises three modules: the synchronization network, the backbone network, and the tail network. The synchronization network module implements char-level, template-level, and chunk-level password segmentation, extracting knowledge of character distribution, structure, word chunk composition, and other password features. The backbone network module constructs a generic password model to learn the rules governing password composition. The tail network module generates candidate passwords for guessing and analyzing target databases. Experiments were conducted on eight password databases, including Tianya and Twitter, to analyze and summarize the effectiveness of the framework under different language environments and segmentation granularities. The results indicate that in Chinese user scenarios, the framework performs comparably with char-level and chunk-level segmentation, and both significantly outperform template-level segmentation. In English user scenarios, chunk-level segmentation yields the best password analysis performance.

Key words: large language model, password analysis, natural language processing, word segmentation
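
The three segmentation granularities named in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify how each granularity is computed, so everything below is an assumption made for illustration: the template level is approximated with PCFG-style character-class runs (L for letters, D for digits, S for other symbols), the chunk level with greedy longest-match against a toy vocabulary, and the function names and `toy_vocab` are hypothetical rather than taken from the paper.

```python
from __future__ import annotations

from itertools import groupby


def char_level(password: str) -> list[str]:
    """Split a password into individual characters."""
    return list(password)


def _char_class(ch: str) -> str:
    """Map a character to L (letter), D (digit), or S (any other symbol)."""
    if ch.isalpha():
        return "L"
    if ch.isdigit():
        return "D"
    return "S"


def template_level(password: str) -> list[str]:
    """Collapse runs of one character class into tokens such as 'L8' or 'D3'.

    Assumption: a PCFG-style structure template, not necessarily the paper's.
    """
    return [
        f"{cls}{len(list(run))}"
        for cls, run in groupby(password, key=_char_class)
    ]


def chunk_level(password: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match segmentation against a chunk vocabulary.

    Assumption: characters not covered by the vocabulary become
    single-character chunks, so the chunks always rebuild the password.
    """
    chunks = []
    i = 0
    while i < len(password):
        # Try the longest candidate first, falling back to one character.
        for j in range(len(password), i, -1):
            if password[i:j] in vocab or j == i + 1:
                chunks.append(password[i:j])
                i = j
                break
    return chunks


if __name__ == "__main__":
    toy_vocab = {"love", "pass", "word", "123", "2024"}  # hypothetical vocabulary
    pw = "password123!"
    print(char_level(pw))              # one token per character
    print(template_level(pw))          # ['L8', 'D3', 'S1']
    print(chunk_level(pw, toy_vocab))  # ['pass', 'word', '123', '!']
```

In this sketch the same password yields twelve char-level tokens, three template-level tokens, and four chunk-level tokens, which shows how the choice of granularity changes the sequence a language model would be trained on.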
