Composite Tor traffic features extraction method of webpage in actual network flow based on SDN

doi:10.11959/j.issn.1000-436x.2022056

Abstract

Website fingerprinting (WF) methods for identifying Tor webpage traffic are usually built on traffic that has already been separated as Tor traffic, or even as Tor webpage traffic. In an actual network, however, first separating Tor traffic from the raw flows and then separating Tor webpage traffic from that Tor traffic requires far more computation, and is far more difficult, than the WF attack itself. In the current Internet architecture, network traffic converges at regional central nodes. Exploiting this, a bi-directional statistical feature (BSF) for distinguishing Tor traffic was proposed. It combines the intra-domain global view provided by the SDN structure of the central node with the node information publicly disclosed by the Tor network, and it separates Tor traffic effectively. Furthermore, a hidden-feature extraction method for webpage traffic based on lifted structure fingerprinting (LSF) was proposed, and the BSF and LSF deep features were combined into a composite Tor-webpage-identification traffic feature (CTTF). To address the scarcity of Tor traffic training data, a translation-based traffic data augmentation method was proposed, which keeps the distribution of the augmented traffic data as close as possible to that of Tor traffic captured in a real working environment. Experimental results show that CTTF improves the identification rate by about 4% compared with using only the original data features. When training data are scarce, the augmentation method yields a more pronounced improvement in classification and effectively reduces the false positive rate.

Key words: traffic discovery, traffic classification, statistical feature, data augmentation
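The abstract names three components (BSF-based Tor flow separation, LSF deep features combined into CTTF, and translation-based augmentation) without defining them. The Python sketch below illustrates one possible reading of each under stated assumptions, not the authors' method: the flow representation, the particular per-direction statistics, the use of the lifted structured loss of Song et al. (2016) as the basis of LSF, the zero-padded shift used as "translation", and plain concatenation as the way CTTF is formed are all assumptions. Every function name (`is_tor_candidate`, `bidirectional_stats`, `lifted_structured_loss`, `translate_trace`, `augment`, `composite_features`) is hypothetical.

```python
"""Hypothetical sketch of the three components named in the abstract.

Assumptions (not taken from the paper): a flow is a list of
(timestamp, size, direction) tuples, direction +1 = client-to-server and
-1 = server-to-client; the Tor node list is a set of IP strings; the
statistics, the shift rule and the concatenation are illustrative only.
"""
import math
import random
from statistics import mean, pstdev


def is_tor_candidate(src_ip, dst_ip, tor_node_ips):
    # Flag a flow if either endpoint appears in the public Tor node list.
    return src_ip in tor_node_ips or dst_ip in tor_node_ips


def bidirectional_stats(packets):
    """Illustrative per-direction statistics (an assumed stand-in for BSF)."""
    feats = {}
    for name, d in (("out", 1), ("in", -1)):
        sizes = [s for _, s, direction in packets if direction == d]
        times = [t for t, _, direction in packets if direction == d]
        gaps = [b - a for a, b in zip(times, times[1:])]
        feats[name + "_count"] = len(sizes)
        feats[name + "_bytes"] = sum(sizes)
        feats[name + "_size_mean"] = mean(sizes) if sizes else 0.0
        feats[name + "_size_std"] = pstdev(sizes) if sizes else 0.0
        feats[name + "_gap_mean"] = mean(gaps) if gaps else 0.0
    total = feats["out_count"] + feats["in_count"]
    feats["out_ratio"] = feats["out_count"] / total if total else 0.0
    return feats


def lifted_structured_loss(emb, labels, margin=1.0):
    """Lifted structured loss (Song et al., 2016) over one batch of embeddings."""
    n = len(emb)
    dist = [[math.dist(emb[i], emb[j]) for j in range(n)] for i in range(n)]
    total, num_pos = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] != labels[j]:
                continue  # only positive pairs (same label) contribute a term
            neg = sum(math.exp(margin - dist[i][k]) for k in range(n) if labels[k] != labels[i])
            neg += sum(math.exp(margin - dist[j][l]) for l in range(n) if labels[l] != labels[j])
            if neg == 0:
                continue  # no negatives in the batch: the log term is undefined, skip the pair
            num_pos += 1
            total += max(0.0, math.log(neg) + dist[i][j]) ** 2
    return total / (2 * num_pos) if num_pos else 0.0


def translate_trace(seq, shift, length):
    """Shift a direction sequence by `shift` slots, then zero-pad/truncate to `length`."""
    shifted = ([0] * shift + list(seq)) if shift >= 0 else list(seq)[-shift:]
    shifted = shifted[:length]
    return shifted + [0] * (length - len(shifted))


def augment(seq, length, max_shift, copies, rng):
    # Hypothetical translation-based augmentation: random shifts in [-max_shift, max_shift].
    return [translate_trace(seq, rng.randint(-max_shift, max_shift), length) for _ in range(copies)]


def composite_features(bsf, embedding):
    # Assumption: CTTF as a plain concatenation of BSF values and a deep embedding.
    return [float(bsf[k]) for k in sorted(bsf)] + [float(x) for x in embedding]


if __name__ == "__main__":
    flow = [(0.0, 600, 1), (0.1, 1500, -1), (0.2, 1500, -1), (0.5, 600, 1)]
    if is_tor_candidate("10.0.0.1", "198.51.100.7", {"198.51.100.7"}):
        print(composite_features(bidirectional_stats(flow), [0.3, -0.2]))
    print(augment([1, -1, -1, 1], 6, 2, 3, random.Random(0)))
```

Zero padding mirrors the fixed-length direction sequences that deep WF classifiers commonly take as input; whether the paper pads, wraps around, or shifts timestamps instead is not stated in the abstract, so `translate_trace` should be read as one plausible variant only.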

CLC number: TP393

Hongping YAN, Qiang ZHOU, Shihao WANG, Wang YAO, Liukun HE, Liangmin WANG. Composite Tor traffic features extraction method of webpage in actual network flow based on SDN[J]. Journal on Communications, 2022, 43(3): 76-87.

Figures/Tables: 9 (Figures 1 to 9)
