Journal on Communications, 2022, Vol. 43, Issue (3): 76-87. doi: 10.11959/j.issn.1000-436x.2022056


Composite Tor traffic features extraction method of webpage in actual network flow based on SDN

Hongping YAN, Qiang ZHOU, Shihao WANG, Wang YAO, Liukun HE, Liangmin WANG   

  1. School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013, China
  • Revised: 2022-02-17; Online: 2022-03-25; Published: 2022-03-01
  • Supported by: The National Natural Science Foundation of China (U1736216)

Abstract:

Website fingerprinting (WF) methods for Tor webpage traffic usually assume that Tor traffic, or even Tor webpage traffic, has already been separated out. However, separating Tor traffic from the raw traffic of an actual network, and Tor webpage traffic from that Tor traffic, requires a large amount of computation and is more difficult than the WF attack itself. Given the current architecture of the Internet, in which network traffic converges on regional central nodes, a bi-directional statistical feature (BSF) was proposed to distinguish Tor traffic, using the intra-domain global view provided by the SDN structure of the central node together with the node information publicly disclosed by the Tor network. Furthermore, a hidden feature extraction method for Web traffic based on lifted structure fingerprinting (LSF) was proposed, and a composite Tor-webpage-identification traffic feature (CTTF) was built from BSF and the deep features of LSF. To address the scarcity of traffic training data, a translation-based traffic data augmentation method was proposed, which keeps the augmented traffic data as consistent as possible with Tor traffic captured in a real working environment. The experimental results show that CTTF improves the identification rate by about 4% compared with using only the original data features. When training data are scarce, the augmentation method improves classification accuracy more markedly and effectively reduces the false positive rate.
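The abstract names its techniques without defining them, so the following is only a rough Python illustration under explicit assumptions, not the authors' method. It assumes that (a) a flow is flagged as Tor when its remote endpoint appears in a set of relay addresses, standing in for the relay list the Tor network publishes; (b) a "bi-directional statistical feature" can be approximated by per-direction packet and byte statistics (a toy feature set, not the paper's BSF definition); (c) "translation" augmentation is a zero-padded shift of a packet-direction sequence; and (d) the composite feature is a plain concatenation of statistics with a deep embedding. The names `Packet`, `is_tor_flow`, `bidirectional_stats`, `translate` and `composite_feature` are hypothetical, and the IP addresses are placeholders from documentation ranges.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Packet:
    """Hypothetical packet record: timestamp (s), source IP, destination IP, size (bytes)."""
    ts: float
    src: str
    dst: str
    size: int


def is_tor_flow(peer_ip: str, relay_ips: set[str]) -> bool:
    """Flag a flow as Tor if its remote peer is a known relay.

    ASSUMPTION: relay_ips stands in for relay addresses taken from the public
    Tor consensus; fetching and parsing that list is out of scope here.
    """
    return peer_ip in relay_ips


def bidirectional_stats(packets: list[Packet], client_ip: str) -> dict[str, float]:
    """Per-direction counts, byte totals and mean sizes for one flow.

    ASSUMPTION: a toy feature set for illustration, not the paper's BSF.
    Packets are assumed to be sorted by timestamp.
    """
    out_sizes = [p.size for p in packets if p.src == client_ip]
    in_sizes = [p.size for p in packets if p.dst == client_ip]
    out_bytes, in_bytes = sum(out_sizes), sum(in_sizes)
    return {
        "out_pkts": len(out_sizes),
        "in_pkts": len(in_sizes),
        "out_bytes": out_bytes,
        "in_bytes": in_bytes,
        "out_mean": mean(out_sizes) if out_sizes else 0.0,
        "in_mean": mean(in_sizes) if in_sizes else 0.0,
        "in_out_byte_ratio": in_bytes / out_bytes if out_bytes else 0.0,
        "duration": packets[-1].ts - packets[0].ts if packets else 0.0,
    }


def translate(seq: list[int], shift: int) -> list[int]:
    """Shift a fixed-length direction sequence (+1 out, -1 in) by `shift` slots.

    ASSUMPTION: "translation" is read as a zero-padded shift. A positive shift
    delays the trace (zeros prepended); a negative shift drops the first
    |shift| packets. The output always has the same length as the input.
    """
    n = len(seq)
    if shift >= 0:
        return ([0] * shift + seq)[:n]
    return (seq[-shift:] + [0] * n)[:n]


def composite_feature(stats: dict[str, float], embedding: list[float]) -> list[float]:
    """Concatenate statistics (in sorted key order) with a deep embedding.

    ASSUMPTION: simple concatenation; the paper's fusion of BSF and LSF may differ.
    """
    return [float(stats[k]) for k in sorted(stats)] + [float(x) for x in embedding]


if __name__ == "__main__":
    relays = {"198.51.100.7"}  # placeholder address, not a real relay
    client = "10.0.0.2"
    flow = [
        Packet(0.00, client, "198.51.100.7", 600),
        Packet(0.05, "198.51.100.7", client, 1500),
        Packet(0.06, "198.51.100.7", client, 1500),
    ]
    if is_tor_flow("198.51.100.7", relays):
        stats = bidirectional_stats(flow, client)
        print(stats["in_out_byte_ratio"])  # 5.0
        print(composite_feature(stats, [0.1, 0.2]))
    print(translate([1, -1, -1, 1], 1))   # [0, 1, -1, -1]
    print(translate([1, -1, -1, 1], -1))  # [-1, -1, 1, 0]
```

A zero-padded shift is used here rather than a cyclic one because wrapping the end of a trace around to its start would create packet orders that never occur in a real capture; the abstract does not say whether the authors pad, wrap, or shift timestamps instead.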

Key words: traffic discovery, traffic classification, statistical feature, data augmentation
