Journal on Communications ›› 2022, Vol. 43 ›› Issue (3): 76-87. doi: 10.11959/j.issn.1000-436x.2022056

• Academic Papers •

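Composite Tor traffic features extraction method of webpage in actual network flow based on SDN

Hongping YAN, Qiang ZHOU, Shihao WANG, Wang YAO, Liukun HE, Liangmin WANG

  1. School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013, China
  • Revised: 2022-02-17 Online: 2022-03-25 Published: 2022-03-01
  • About the authors: Hongping YAN (1985- ), male, born in Changzhou, Jiangsu, is a Ph.D. candidate at Jiangsu University. His main research interests are machine learning and anonymous traffic analysis.
    Qiang ZHOU (1992- ), male, born in Anqing, Anhui, holds a Ph.D. and is a lecturer at Jiangsu University. His main research interests are machine learning and anonymous traffic analysis.
    Shihao WANG (1996- ), male, born in Jinan, Shandong, is a master's student at Jiangsu University. His main research interests are machine learning and anonymous traffic analysis.
    Wang YAO (1997- ), male, born in Binhai, Jiangsu, is a master's student at Jiangsu University. His main research interests are machine learning and anonymous traffic analysis.
    Liukun HE (1996- ), male, born in Qianshan, Anhui, is a master's student at Jiangsu University. His main research interests are machine learning and anonymous traffic analysis.
    Liangmin WANG (1977- ), male, born in Qianshan, Anhui, holds a Ph.D. and is a professor and doctoral supervisor at Jiangsu University. His main research interests are cryptography and security protocols, Internet of Things security, and big data security.
  • Supported by:
    The National Natural Science Foundation of China (U1736216)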

Abstract:

Website fingerprinting (WF) methods for identifying Tor webpage traffic usually assume that Tor traffic, or even Tor webpage traffic, has already been separated out. However, separating Tor traffic from the raw flows of a real network, and then separating Tor webpage traffic from that Tor traffic, requires far more computation and is far more difficult than the WF attack itself. Based on the current architecture of the Internet, in which network traffic converges at regional central nodes, a bi-directional statistical feature (BSF) for distinguishing Tor traffic was proposed. It uses the intra-domain global view provided by the SDN structure of the central node together with the node information published by the Tor network, and can effectively separate Tor traffic. Furthermore, a hidden feature extraction method for webpage traffic based on lifted structure fingerprinting (LSF) was proposed, yielding a composite Tor-webpage-identification traffic feature (CTTF) built from BSF and LSF deep features. To address the scarcity of Tor traffic training data, a translation-based traffic data augmentation method was proposed, which keeps the distribution of the augmented traffic data as consistent as possible with that of Tor traffic captured in a real working environment. Experimental results show that, compared with using only the raw data features, CTTF improves the identification rate by about 4%. When training data is scarce, the improvement in classification performance from the traffic data augmentation method is more pronounced, and the false positive rate is effectively reduced.

Key words: traffic discovery, traffic classification, statistical feature, data augmentation
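
The abstract names a translation-based traffic data augmentation method but does not say how the translation is carried out. As a minimal sketch only, the Python below assumes that a trace is a fixed-length sequence of packet directions (+1 outgoing, -1 incoming, 0 padding), a representation common in WF work, and that "translation" means shifting the sequence by a few positions while zero-padding the vacated slots. The names `translate_trace` and `augment`, and the parameters `copies` and `max_shift`, are hypothetical and do not come from the paper.

```python
import random
from typing import List, Optional, Sequence


def translate_trace(trace: Sequence[int], shift: int, pad: int = 0) -> List[int]:
    """Shift a direction sequence by `shift` positions, keeping its length.

    A positive shift delays the packets (padding is added at the front);
    a negative shift advances them (padding is added at the end).
    This is an assumed reading of "translation", not the paper's definition.
    """
    n = len(trace)
    if shift == 0:
        return list(trace)
    if abs(shift) >= n:
        # Everything has been shifted out of the window.
        return [pad] * n
    if shift > 0:
        return [pad] * shift + list(trace[: n - shift])
    return list(trace[-shift:]) + [pad] * (-shift)


def augment(
    trace: Sequence[int],
    copies: int,
    max_shift: int,
    seed: Optional[int] = None,
) -> List[List[int]]:
    """Produce `copies` shifted variants of one trace.

    Each variant uses a shift drawn uniformly from [-max_shift, max_shift].
    """
    rng = random.Random(seed)
    return [
        translate_trace(trace, rng.randint(-max_shift, max_shift))
        for _ in range(copies)
    ]


if __name__ == "__main__":
    sample = [1, -1, -1, 1, 0]
    for variant in augment(sample, copies=3, max_shift=2, seed=0):
        print(variant)
```

Small shifts keep the overall burst structure of a trace while varying where it starts, which is one plausible way to keep augmented data close to the distribution of captured traffic. Whether the paper shifts in packet index, in time, or in some other domain is not stated in the abstract.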
