Chinese Journal of Network and Information Security ›› 2023, Vol. 9 ›› Issue (6): 166-175. doi: 10.11959/j.issn.2096-109x.2023091

• Papers

Autonomous security analysis and penetration testing model based on attack graph and deep Q-learning network

Cheng FAN, Guoqing HU, Taojie DING, Zhanhua ZHANG   

  1. The 31681 Unit of PLA, Tianshui 741000, China
  • Revised: 2023-10-28 Online: 2023-12-01 Published: 2023-12-01
  • Supported by:
    The National Natural Science Foundation of China (61902426)

Abstract:

With the continuous development and widespread application of network technology, network security issues have become increasingly prominent. Penetration testing has emerged as an important method for assessing and enhancing network security. However, traditional manual penetration testing is inefficient, prone to human error, and dependent on the skill of the tester, which leads to high uncertainty and poor evaluation results. To address these challenges, an autonomous security analysis and penetration testing framework called ASAPT was proposed, based on attack graphs and deep Q-learning networks (DQN). The ASAPT framework consists of two main components: training data construction and model training. In the training data construction phase, attack graphs were used to model the threats in the target network, representing vulnerabilities and possible attack paths as nodes and edges. By integrating the common vulnerability scoring system (CVSS) vulnerability database, a “state-action” transition matrix was constructed, which describes the attacker’s behavior and transition probabilities in different states. This matrix comprehensively captures the attacker’s capabilities and the security status of the network. To reduce computational complexity, a depth-first search (DFS) algorithm was applied to simplify the transition matrix, identifying and preserving all attack paths that lead to the final goal for subsequent model training. In the model training phase, a deep reinforcement learning algorithm based on DQN was employed to determine the optimal attack path during penetration testing. The algorithm interacts continuously with the environment, updating the Q-value function to progressively optimize the selection of attack paths. Simulation results show that ASAPT achieves an accuracy of 84% in identifying the optimal path and converges quickly. Compared with traditional Q-learning, ASAPT adapts better to large-scale network environments and could provide guidance for practical penetration testing.
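
The two phases described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy attack graph, the host names, the per-edge scores (loosely standing in for CVSS-derived values), the goal bonus, and all hyperparameters are assumptions made up for illustration. In addition, a tabular Q-learning table is swapped in for the DQN, since the toy state space is tiny; a DQN would replace the table with a neural network that approximates the same Q-values.

```python
import random

# Hypothetical attack graph: node -> hosts reachable by exploiting one vulnerability.
GRAPH = {
    "internet": ["web", "mail"],
    "web": ["app", "db"],
    "mail": ["workstation"],
    "workstation": ["db", "printer"],
    "app": ["db"],
    "printer": [],
    "db": [],
}
# Hypothetical per-edge scores (stand-ins for CVSS-derived values).
SCORE = {
    ("internet", "web"): 6.0, ("internet", "mail"): 8.0,
    ("web", "app"): 5.0, ("web", "db"): 3.0, ("app", "db"): 7.0,
    ("mail", "workstation"): 4.0, ("workstation", "db"): 9.0,
    ("workstation", "printer"): 2.0,
}
GOAL_BONUS = 10.0  # assumed extra reward for reaching the target


def attack_paths(graph, start, goal):
    """Depth-first search for every simple path from start to goal."""
    paths = []

    def dfs(node, path):
        if node == goal:
            paths.append(path)
            return
        for nxt in graph.get(node, []):
            if nxt not in path:  # skip nodes already on the path (no cycles)
                dfs(nxt, path + [nxt])

    dfs(start, [start])
    return paths


def prune(paths):
    """Keep only the nodes and edges that lie on some path to the goal."""
    kept = {node: [] for path in paths for node in path}
    for path in paths:
        for a, b in zip(path, path[1:]):
            if b not in kept[a]:
                kept[a].append(b)
    return kept


def q_learning(graph, start, goal, episodes=2000, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning with a uniformly random behavior policy."""
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in nbrs} for s, nbrs in graph.items()}
    for _ in range(episodes):
        s = start
        while s != goal:
            a = rng.choice(graph[s])  # the action is the next node to compromise
            reward = SCORE[(s, a)] + (GOAL_BONUS if a == goal else 0.0)
            target = reward + gamma * max(q[a].values(), default=0.0)
            q[s][a] += alpha * (target - q[s][a])
            s = a
    return q


def greedy_path(q, start, goal):
    """Follow the highest-valued action from start until the goal is reached."""
    path = [start]
    while path[-1] != goal:
        actions = q[path[-1]]
        path.append(max(actions, key=actions.get))
    return path


paths = attack_paths(GRAPH, "internet", "db")
pruned = prune(paths)
q_table = q_learning(pruned, "internet", "db")
best = greedy_path(q_table, "internet", "db")
print(len(paths), "attack paths kept;", "best path:", " -> ".join(best))
```

Pruning before training means the agent never explores dead ends such as the `printer` node, which is the complexity reduction the abstract attributes to the DFS step. The reward design here simply sums edge scores, so it is only one of many possible choices.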

Key words: autonomous penetration testing, reinforcement learning, attack graph, deep Q-learning network

CLC Number: 