Chinese Journal of Network and Information Security, 2022, Vol. 8, Issue (5): 158-166. doi: 10.11959/j.issn.2096-109x.2022065


CAT-RFE: ensemble detection framework for click fraud

Yixiang LU¹, Guanggang GENG¹, Zhiwei YAN², Xiaomin ZHU³, Xinchang ZHANG⁴

  1 College of Cyber Security, Jinan University, Guangzhou 510632, China
  2 China Internet Network Information Center, Beijing 100190, China
  3 Shandong Institute of Big Data, Jinan 250001, China
  4 Shandong Academy of Sciences, Jinan 250001, China
  • Revised: 2022-01-05 · Online: 2022-10-15 · Published: 2022-10-01
  • Supported by: the National Natural Science Foundation of China (92067108); the Natural Science Foundation of Guangdong Province (2021A1515011314)

Abstract:

Click fraud has become one of the most common forms of cybercrime in recent years, and the Internet advertising industry suffers huge losses from it every year. To detect fraudulent clicks effectively among massive volumes of clicks, a variety of features that capture the relationship between advertising clicks and their time attributes were constructed. In addition, an ensemble learning framework for click fraud detection, named CAT-RFE, was proposed. The CAT-RFE framework consists of three parts: a base classifier, recursive feature elimination (RFE), and voting ensemble learning. CatBoost, a gradient boosting model suited to categorical features, serves as the base classifier. RFE is a greedy feature selection method that can select a better feature combination from multiple sets of features. Voting ensemble learning combines the results of multiple base classifiers by voting. The framework uses CatBoost and RFE to obtain multiple optimal feature combinations in the feature space, then integrates the models trained on these feature combinations by voting to produce the final click fraud detection result. Because the framework uses the same base classifier and ensemble method throughout, it avoids the unsatisfactory ensemble results caused by mutual constraints among different classifiers; and because it combines several feature combinations rather than relying on one, it also mitigates RFE's tendency to fall into a locally optimal solution during feature selection, giving it better detection ability. Performance evaluation and comparative experiments on a real-world Internet click fraud dataset show that CAT-RFE outperforms CatBoost alone, the combination of CatBoost and RFE, and other machine learning methods, demonstrating that the framework is competitive. The proposed framework offers a feasible solution for detecting click fraud in Internet advertising.
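The abstract does not spell out the constructed time features. Purely as an illustration, the sketch below derives three typical click/time features from raw click records: the hour of the click, the number of clicks from the same IP in the same calendar hour, and the seconds since the previous click from the same IP on the same app. The record keys `ip`, `app`, and `click_time`, the timestamp format, and the function name `time_features` are hypothetical assumptions, not the paper's actual feature set.

```python
from collections import defaultdict
from datetime import datetime


def time_features(clicks):
    """Derive illustrative time features for each click, aligned with input order.

    clicks: list of dicts with hypothetical keys 'ip', 'app', 'click_time'
    ('click_time' is an ISO-style string such as '2017-11-07 09:00:00').
    """
    ts = [datetime.fromisoformat(c["click_time"]) for c in clicks]

    # Number of clicks from the same IP within the same calendar hour.
    per_ip_hour = defaultdict(int)
    for c, t in zip(clicks, ts):
        per_ip_hour[(c["ip"], t.date(), t.hour)] += 1

    # Seconds since the previous click from the same (ip, app) pair; -1.0 if none.
    out = [None] * len(clicks)
    last_seen = {}
    for i in sorted(range(len(clicks)), key=lambda j: ts[j]):
        c, t = clicks[i], ts[i]
        key = (c["ip"], c["app"])
        prev = last_seen.get(key)
        last_seen[key] = t
        out[i] = {
            "hour": t.hour,
            "ip_hour_count": per_ip_hour[(c["ip"], t.date(), t.hour)],
            "secs_since_prev_ip_app": (t - prev).total_seconds() if prev is not None else -1.0,
        }
    return out
```

Bursts of clicks from one IP within seconds of each other are a common fraud signal, which is why gap and per-window count features like these are a natural starting point; the real feature list would come from the paper itself.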
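A minimal sketch of the CAT-RFE flow described in the abstract (base classifier, then RFE, then voting), written in pure Python so it runs without third-party packages. Several pieces are assumptions: a nearest-centroid classifier (hypothetical `fit`/`predict`/`importance` helpers) stands in for CatBoost; feature importance is taken to be the scaled distance between class centroids; and the "multiple sets of optimal feature combinations" are assumed to be the `k` subsets on the RFE elimination path with the best validation accuracy. Labels are assumed binary (0 = normal, 1 = fraudulent).

```python
from collections import Counter
from statistics import mean, pstdev


# Hypothetical stand-in for CatBoost: a nearest-centroid classifier on labels 0/1.
def fit(X, y, feats):
    """X: list of dicts (feature -> float); returns per-class centroids over feats."""
    return {lab: {f: mean(x[f] for x, t in zip(X, y) if t == lab) for f in feats}
            for lab in set(y)}


def predict(model, x):
    return min(model, key=lambda lab: sum((x[f] - c) ** 2 for f, c in model[lab].items()))


def importance(model, X, feats):
    # Assumed importance: gap between class centroids, scaled by the feature's spread.
    return {f: abs(model[1][f] - model[0][f]) / (pstdev(x[f] for x in X) + 1e-9)
            for f in feats}


def accuracy(model, X, y):
    return sum(predict(model, x) == t for x, t in zip(X, y)) / len(y)


def rfe_path(X, y, feats):
    """Greedy RFE: retrain, drop the least important feature, repeat until one is left."""
    feats, path = list(feats), []
    while feats:
        path.append(tuple(feats))
        if len(feats) == 1:
            break
        imp = importance(fit(X, y, feats), X, feats)
        feats.remove(min(feats, key=imp.get))
    return path


def cat_rfe(X_tr, y_tr, X_val, y_val, feats, k=3):
    """Assumption: keep the k subsets on the RFE path with the best validation accuracy."""
    path = rfe_path(X_tr, y_tr, feats)
    ranked = sorted(path, key=lambda s: accuracy(fit(X_tr, y_tr, s), X_val, y_val),
                    reverse=True)
    return [fit(X_tr, y_tr, s) for s in ranked[:k]]


def vote(models, x):
    """Hard majority vote over the per-subset models (an odd k avoids binary ties)."""
    return Counter(predict(m, x) for m in models).most_common(1)[0][0]
```

Every model in the ensemble shares the same base learner and differs only in its feature subset, which mirrors the design choice the abstract credits for avoiding conflicts between heterogeneous classifiers. In a real pipeline one would presumably replace the stand-in with `catboost.CatBoostClassifier` and its feature importances.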

Key words: click fraud detection, CatBoost, recursive feature elimination, ensemble learning

