CAT-RFE: ensemble detection framework for click fraud

doi:10.11959/j.issn.2096-109x.2022065

Abstract

Click fraud has been one of the most common forms of cybercrime in recent years, and the Internet advertising industry suffers huge losses from it every year. To detect fraudulent clicks effectively among massive click volumes, a variety of features that combine the relationship between advertising clicks and time attributes were constructed. In addition, an ensemble learning framework for click fraud detection, CAT-RFE, was proposed. The framework consists of three parts: a base classifier, recursive feature elimination (RFE), and voting ensemble learning. CatBoost, a gradient boosting model suited to categorical features, serves as the base classifier. RFE is a greedy feature selection method that selects a better feature combination from multiple candidate feature sets. Voting ensemble learning combines the results of multiple base classifiers by voting. The framework obtains multiple sets of optimal feature combinations in the feature space through CatBoost and RFE, then integrates the models trained on these feature combinations through voting to produce the final click fraud detection result. Because the framework uses the same base classifier and ensemble learning method throughout, it avoids the unsatisfactory ensemble results caused by mutual constraints among different classifiers, and it also overcomes the tendency of RFE to fall into a local optimum when selecting features, which gives it better detection ability. Performance evaluation and comparative experiments on a real Internet click fraud dataset show that the click fraud detection ability of the CAT-RFE ensemble learning framework exceeds that of CatBoost alone, of CatBoost combined with RFE, and of other machine learning methods, demonstrating that the framework is competitive. The proposed framework provides a feasible solution for Internet advertising click fraud detection.
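
The greedy RFE step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `importance_fn` and `score_fn` are hypothetical callables standing in for a trained base classifier's feature importances and a validation score, and the toy importance table exists only to make the example runnable. The sketch records the score of every subset visited so that several good feature combinations, not just one, can be kept.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Scored = Tuple[float, List[str]]


def rfe_candidates(
    features: Sequence[str],
    importance_fn: Callable[[List[str]], Dict[str, float]],
    score_fn: Callable[[List[str]], float],
    min_features: int = 1,
) -> List[Scored]:
    """Greedy RFE: drop the least important feature each round and
    record the score of every subset visited along the way."""
    current = list(features)
    history: List[Scored] = []
    while True:
        history.append((score_fn(current), list(current)))
        if len(current) <= min_features:
            break
        importances = importance_fn(current)
        weakest = min(current, key=lambda f: importances[f])
        current.remove(weakest)
    return history


def top_k_subsets(history: List[Scored], k: int) -> List[Scored]:
    """Keep the k best-scoring feature subsets seen during elimination."""
    return sorted(history, key=lambda item: item[0], reverse=True)[:k]


# Hypothetical toy stand-ins (assumptions, not from the source): in the
# framework these would be backed by the base classifier and a validation score.
TOY_IMPORTANCE = {"a": 0.30, "b": 0.20, "c": 0.01, "d": 0.00}


def toy_importance(features: List[str]) -> Dict[str, float]:
    return {f: TOY_IMPORTANCE[f] for f in features}


def toy_score(features: List[str]) -> float:
    # Reward informative features, penalise subset size slightly.
    return 0.5 + sum(TOY_IMPORTANCE[f] for f in features) - 0.02 * len(features)


history = rfe_candidates(["a", "b", "c", "d"], toy_importance, toy_score)
for score, subset in top_k_subsets(history, k=2):
    print(f"{score:.2f}", subset)
```

Keeping the whole elimination history, rather than only the final subset, is one way to obtain several candidate feature combinations from a single greedy pass.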

Key words: click fraud detection, CatBoost, recursive feature elimination, ensemble learning

CLC Number: TP393

Yixiang LU, Guanggang GENG, Zhiwei YAN, Xiaomin ZHU, Xinchang ZHANG. CAT-RFE: ensemble detection framework for click fraud[J]. Chinese Journal of Network and Information Security, 2022, 8(5): 158-166.



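Attribute	Description	Attribute	Description
android_id	External ad slot ID	dev_height	Device height
media_id	External media ID	dev_width	Device width
apptype	App category	dev_ppi	Device screen resolution
package	Package name	lan	Device language
version	App version number	location	User geographic location code
ntt	Network type	fea_hash	User feature code
carrier	Carrier used by the device	fea1_hash	User feature code
os	Operating system	cus_type	User feature code
osv	Operating system version	timestamp	Time the request reached the service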

Attribute	New feature names
android_id	(1) android_id_count_filter_20 (2) android_id_count_bi_1
media_id	(3) media_id_count_bi_1000
package	(4) package_count_filter_20 (5) package_count_bi_150
version	(6) version_count_bi_13000
osv	(7) osv_count_bi_2000
dev_height	(8) dev_height_count_bi_360
dev_width	(9) dev_width_count_bi_850
fea_hash	(10) fea_hash_count_filter_20 (11) fea_hash_count_bi_1
fea1_hash	(12) fea1_hash_count_filter_20 (13) fea1_hash_count_bi_60
timestamp	(14)-(19) time_minute_X (X = 5, 10, 30, 60, 120, 360)
	(20)-(24) time_minute_count_X (X = 1, 2, 5, 10, 30)
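
The table gives only the names of the constructed features, so the following sketch rests on assumed semantics that the source does not confirm: `*_count_filter_N` is read as merging values seen fewer than N times into one bucket, `*_count_bi_N` as a binary flag for values seen more than N times, `time_minute_X` as the index of the X-minute window containing the click, and `time_minute_count_X` as the number of clicks in that window. Timestamps are assumed to be in seconds, and all function names are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def count_filter(values: Sequence[Hashable], min_count: int,
                 other: Hashable = "__rare__") -> List[Hashable]:
    """Assumed meaning of *_count_filter_N: keep values seen at least
    min_count times and merge all rarer values into a single bucket."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]


def count_bi(values: Sequence[Hashable], threshold: int) -> List[int]:
    """Assumed meaning of *_count_bi_N: 1 if the value occurs more than
    threshold times in the data, otherwise 0."""
    counts = Counter(values)
    return [1 if counts[v] > threshold else 0 for v in values]


def time_bucket(timestamps_s: Sequence[float], minutes: int) -> List[int]:
    """Assumed meaning of time_minute_X: index of the X-minute window
    that each click falls into (timestamps assumed to be in seconds)."""
    width = minutes * 60
    return [int(t // width) for t in timestamps_s]


def time_bucket_count(timestamps_s: Sequence[float], minutes: int) -> List[int]:
    """Assumed meaning of time_minute_count_X: number of clicks sharing
    the same X-minute window as each click."""
    buckets = time_bucket(timestamps_s, minutes)
    counts = Counter(buckets)
    return [counts[b] for b in buckets]


# Toy data (hypothetical), not taken from the dataset.
ids = ["x", "x", "y", "x", "z"]
stamps = [0, 59, 60, 299, 300]
print(count_filter(ids, min_count=2))
print(count_bi(ids, threshold=1))
print(time_bucket(stamps, minutes=5))
print(time_bucket_count(stamps, minutes=5))
```

Under this reading, the frequency features capture how often a device or package appears, while the window features tie each click to the click volume around it in time.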

Classifier	10-fold cross-validation accuracy
K-nearest neighbors	62.2442%
Logistic regression	69.2550%
Decision tree	82.6524%
Random forest	88.4804%
LightGBM	88.2740%
XGBoost	88.8116%
CatBoost	89.3220%
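
The accuracies in this table are ten-fold cross-validation results. As a rough illustration of how such a figure is computed, the sketch below splits the data into contiguous folds without shuffling (a simplification) and averages the per-fold accuracy. `fit_predict` is a hypothetical callable that trains on the training folds and returns predictions for the held-out fold; the source does not state how the folds were drawn.

```python
from typing import Callable, List, Sequence


def k_fold_indices(n_samples: int, n_folds: int = 10) -> List[List[int]]:
    """Split range(n_samples) into n_folds contiguous folds of near-equal size."""
    if n_samples < n_folds:
        raise ValueError("need at least one sample per fold")
    base, extra = divmod(n_samples, n_folds)
    folds, start = [], 0
    for i in range(n_folds):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_val_accuracy(
    X: Sequence,
    y: Sequence[int],
    fit_predict: Callable[[list, list, list], List[int]],
    n_folds: int = 10,
) -> float:
    """Mean accuracy over the folds; each fold is held out exactly once."""
    accuracies = []
    for test_idx in k_fold_indices(len(y), n_folds):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        preds = fit_predict(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in test_idx],
        )
        correct = sum(p == y[i] for p, i in zip(preds, test_idx))
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


# Toy example (hypothetical): a fixed threshold rule that ignores training data.
def threshold_rule(X_train, y_train, X_test):
    return [int(x >= 10) for x in X_test]


X = list(range(20))
y = [int(x >= 10) for x in X]
y[0] = 1  # one deliberately mislabelled sample
print(cross_val_accuracy(X, y, threshold_rule))
```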

No.	Number of features	10-fold cross-validation accuracy
1	26	89.3964%
2	37	89.3916%
3	35	89.3860%
4	30	89.3852%
5	39	89.3822%

k	Experiment a test-set accuracy	Experiment b test-set accuracy
2	89.3620%	89.3813%
3	89.3680%	89.3473%
4	89.3687%	89.3560%
5	89.3740%	89.3740%
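
The voting stage can be sketched as follows, under several assumptions: k is taken to be the number of models (one per selected feature combination) whose predictions are combined, voting is treated as hard majority voting, label 1 denotes a fraudulent click, and ties, which can occur for even k, fall back to a configurable `tie_label`. The source does not specify hard versus soft voting or a tie rule, so this is one plausible reading only.

```python
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]], tie_label: int = 1) -> List[int]:
    """Combine binary predictions from several models by hard voting.

    predictions holds one row per model and one column per sample.
    tie_label is an assumed tie-breaking rule for an even number of models.
    """
    n_models = len(predictions)
    combined = []
    for votes in zip(*predictions):
        positives = sum(votes)
        if 2 * positives > n_models:
            combined.append(1)
        elif 2 * positives < n_models:
            combined.append(0)
        else:
            combined.append(tie_label)
    return combined


# Toy predictions (hypothetical) from k = 3 models on four clicks.
model_outputs = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
]
print(majority_vote(model_outputs))
```

With an odd k every sample has a strict majority, so the tie rule matters only for even k.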


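No.	Model	Features	Cross-validation accuracy	Test-set accuracy
1	Baseline model	First-category features	88.0400%	88.0253%
2	CatBoost	First-category features	89.3220%	89.3313%
3	CatBoost	All features	89.3646%	89.3587%
4	CatBoost+RFE	All features	89.3964%	89.3287%
5	Proposed framework	All features	89.4252%	89.3740%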