Chinese Journal of Network and Information Security, 2022, Vol. 8, Issue (5): 158-166. doi: 10.11959/j.issn.2096-109x.2022065

• Academic Papers •

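CAT-RFE: ensemble detection framework for click fraud

Yixiang LU1, Guanggang GENG1, Zhiwei YAN2, Xiaomin ZHU3, Xinchang ZHANG4

  • 1 College of Cyber Security, Jinan University, Guangzhou 510632, China
    2 China Internet Network Information Center, Beijing 100190, China
    3 Shandong Institute of Big Data, Jinan 250001, China
    4 Shandong Academy of Sciences, Jinan 250001, China
  • Revised: 2022-01-05 Online: 2022-10-15 Published: 2022-10-01
  • About the authors:
    Yixiang LU (1995- ), male, born in Chaozhou, Guangdong, is a master's student at Jinan University. His research interests include statistical machine learning and cyberspace security.
    Guanggang GENG (1980- ), male, born in Tai'an, Shandong, Ph.D., is a professor at Jinan University. His research interests include machine learning, big data analysis, and the security of Internet infrastructure resources.
    Zhiwei YAN (1985- ), male, born in Xing County, Shanxi, Ph.D., is a research fellow at the China Internet Network Information Center. His research interests include IPv6 mobility management, BGP security mechanisms, and information-centric networking architecture.
    Xiaomin ZHU (1982- ), male, born in Laiwu, Shandong, Ph.D., is an associate research fellow at the Shandong Institute of Big Data. His research interests include high-performance computing and big data analysis.
    Xinchang ZHANG (1975- ), male, born in Xintai, Shandong, Ph.D., is a professor at the Shandong Academy of Sciences. His research interests include intelligent networks, network architecture and protocols, and the industrial Internet.
  • Supported by:
    The National Natural Science Foundation of China (92067108); The Natural Science Foundation of Guangdong Province (2021A1515011314)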

Abstract:

Click fraud has become one of the most common forms of cybercrime in recent years, and the Internet advertising industry suffers heavy losses from it every year. To detect fraudulent clicks effectively among massive volumes of clicks, a variety of features that fully exploit the relationship between advertising clicks and time attributes are constructed, and an ensemble learning framework for click fraud detection, the CAT-RFE ensemble learning framework, is proposed. The framework consists of three parts: a base classifier, recursive feature elimination (RFE), and voting ensemble learning. CatBoost (categorical boosting), a gradient boosting model suited to categorical features, serves as the base classifier. RFE is a feature selection method based on a greedy strategy that can select a better feature combination from multiple sets of features. Voting ensemble learning combines the results of multiple base classifiers by voting. The framework uses CatBoost and RFE to obtain several relatively good feature combinations in the feature space, and then integrates, through voting, the results of the models trained on these feature combinations to produce the final click fraud detection result. Because the framework uses the same base classifier throughout together with a single ensemble learning method, it overcomes the unsatisfactory ensemble results that arise when markedly different classifiers constrain one another, and it also overcomes the tendency of RFE to fall into a local optimum when selecting features, which gives it better detection ability. Performance evaluation and comparative experiments on a real-world Internet click fraud dataset show that the click fraud detection ability of the CAT-RFE ensemble learning framework exceeds that of the CatBoost model, the combination of CatBoost and RFE, and other machine learning models, indicating that the framework is competitive. The proposed framework provides a feasible solution for Internet advertising click fraud detection.

Key words: click fraud detection, CatBoost, recursive feature elimination, ensemble learning
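
The three parts named in the abstract (a base classifier, RFE, and voting) can be illustrated with a minimal, pure-Python sketch. It is not the authors' implementation: `StumpVoter` is a toy stand-in for CatBoost, the feature names and data are hypothetical, and producing several feature subsets by running RFE to different target sizes is an assumption, since the abstract does not say how the multiple feature combinations are obtained.

```python
from collections import Counter


class StumpVoter:
    """Toy stand-in for the base classifier (the paper uses CatBoost)."""

    def fit(self, rows, labels, features):
        self.features = list(features)
        self.thresholds, self.signs, self.importances = {}, {}, {}
        for f in self.features:
            pos = [r[f] for r, y in zip(rows, labels) if y == 1]
            neg = [r[f] for r, y in zip(rows, labels) if y == 0]
            mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
            self.thresholds[f] = (mean_pos + mean_neg) / 2
            self.signs[f] = 1 if mean_pos >= mean_neg else -1
            # Toy importance: separation of the class means. Unlike a real
            # model's importances, it does not depend on the other features.
            self.importances[f] = abs(mean_pos - mean_neg)
        return self

    def predict(self, rows):
        out = []
        for r in rows:
            # Importance-weighted vote of the per-feature threshold rules.
            score = sum(
                self.importances[f] * self.signs[f]
                * (1 if r[f] > self.thresholds[f] else -1)
                for f in self.features
            )
            out.append(1 if score > 0 else 0)
        return out


def rfe(rows, labels, features, n_keep):
    """Greedy RFE: refit, drop the least important feature, repeat."""
    selected = list(features)
    while len(selected) > n_keep:
        model = StumpVoter().fit(rows, labels, selected)
        weakest = min(selected, key=lambda f: model.importances[f])
        selected.remove(weakest)
    return selected


def train_ensemble(rows, labels, features, keep_sizes):
    """Train one base model per RFE-selected feature subset (assumed scheme)."""
    models = []
    for n_keep in keep_sizes:
        subset = rfe(rows, labels, features, n_keep)
        models.append(StumpVoter().fit(rows, labels, subset))
    return models


def vote(models, rows):
    """Hard voting: each model casts one label per row; the majority wins."""
    per_model = [m.predict(rows) for m in models]
    return [Counter(col).most_common(1)[0][0] for col in zip(*per_model)]


# Hypothetical click/time features; label 1 marks a fraudulent click.
FEATURES = ["clicks_per_minute", "seconds_since_last_click", "noise"]
rows = [
    {"clicks_per_minute": 30, "seconds_since_last_click": 1, "noise": 5},
    {"clicks_per_minute": 25, "seconds_since_last_click": 2, "noise": 3},
    {"clicks_per_minute": 2, "seconds_since_last_click": 40, "noise": 4},
    {"clicks_per_minute": 1, "seconds_since_last_click": 60, "noise": 4},
]
labels = [1, 1, 0, 0]

models = train_ensemble(rows, labels, FEATURES, keep_sizes=[3, 2, 1])
print(vote(models, rows))
```

Using an odd number of models with binary labels avoids tied votes. In practice the `fit` step would wrap a real gradient boosting model whose feature importances change as features are removed, which is what makes the repeated refitting in `rfe` meaningful.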
