Chinese Journal of Network and Information Security ›› 2023, Vol. 9 ›› Issue (3): 73-78. doi: 10.11959/j.issn.2096-109x.2023039

• Academic Papers •

Research on credit card transaction security supervision based on PU learning

Renfeng CHEN (陈任峰), Hongbin ZHU (朱鸿斌)

  1. School of Computer Science, Fudan University, Shanghai 200433, China
  • Revised: 2023-03-27 Online: 2023-06-25 Published: 2023-06-01
  • About the authors: Renfeng CHEN (1974- ), male, born in Zhuji, Zhejiang, is a Ph.D. candidate at Fudan University. His main research interest is the application of privacy computing in the financial industry.
    Hongbin ZHU (1991- ), male, born in Shaoxing, Zhejiang, Ph.D., is an associate researcher at Fudan University. His main research interests are financial technology and security, edge artificial intelligence, and signal processing.
  • Supported by:
    Strategic Research and Consulting Project of the Chinese Academy of Engineering (2023-33-14)

Abstract:

Credit card cash-out schemes are complex and constantly changing, and new forms of fake transactions keep emerging. Because cash-out labels exist only at the account level, credit card cash-out management faces a practical difficulty: the true nature of an individual transaction is hard to establish through interaction with the customer. To explore an accurate method for supervising fake credit card transactions, a security identification model for single credit card transactions was established based on PU learning (positive-unlabeled learning), using long-term transaction label data from cash-out accounts in a commercial bank's credit card system. A Spy mechanism was introduced into sample labeling: 1 million highly reliable cash-out transactions were randomly drawn as positive samples, together with 1.3 million transactions to be labeled, and the distribution of a learner's predictions was used to label the hard-to-identify non-cash-out transactions as negative samples, yielding 1.2 million relatively reliable negative labels. From these positive samples and the labeled negative samples, 120 candidate variables were constructed, covering credit card customer attributes, credit limit usage, and transaction preference characteristics. Variable importance screening retained nearly 50 of them as model inputs, and the XGBoost binary classification algorithm was used for model development and prediction. The results show that the proposed model achieves an identification accuracy of 94.20% on fake cash-out transactions, with a population stability index (PSI) of 0.10%, indicating that the single-transaction security identification model based on PU learning can effectively monitor fake credit card transactions. This study improves the discrimination performance of machine learning binary classification algorithms in scenarios where high-precision sample labels are difficult to obtain, and provides a new method for transaction security monitoring in commercial bank credit card systems.

Key words: cash-out transaction data monitoring, credit card system security supervision, semi-supervised learning, PU learning
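
The Spy mechanism named in the abstract can be illustrated with a minimal sketch. The abstract does not give the learner, the spy ratio, or the threshold rule, so everything below is an assumption made for illustration: a tiny one-feature logistic regression stands in for the learner, 15% of the positives are hidden as spies, and the threshold is a low quantile of the spy scores. The function names (`train_logistic`, `spy_reliable_negatives`) and the toy data are hypothetical and not taken from the paper.

```python
import math
import random


def train_logistic(xs, ys, lr=0.05, epochs=300):
    """Fit a one-feature logistic regression by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def score(x, w, b):
    """Predicted probability that x belongs to the positive class."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


def spy_reliable_negatives(positives, unlabeled, spy_ratio=0.15, noise=0.05, seed=0):
    """Return (threshold, indices of unlabeled samples kept as reliable negatives)."""
    rng = random.Random(seed)
    n_spies = max(1, int(len(positives) * spy_ratio))
    spy_idx = set(rng.sample(range(len(positives)), n_spies))
    spies = [x for i, x in enumerate(positives) if i in spy_idx]
    kept = [x for i, x in enumerate(positives) if i not in spy_idx]

    # Step 1: train kept positives (label 1) against unlabeled + spies (label 0).
    xs = kept + unlabeled + spies
    ys = [1] * len(kept) + [0] * (len(unlabeled) + len(spies))
    w, b = train_logistic(xs, ys)

    # Step 2: spies are known positives, so their scores indicate how hidden
    # positives inside the unlabeled set are expected to score.
    spy_scores = sorted(score(x, w, b) for x in spies)
    threshold = spy_scores[int(noise * len(spy_scores))]

    # Step 3: unlabeled samples scoring below the threshold become reliable negatives.
    reliable = [i for i, x in enumerate(unlabeled) if score(x, w, b) < threshold]
    return threshold, reliable


# Toy data (assumed): positives cluster near +5, true negatives near -5.
data_rng = random.Random(42)
positives = [5 + data_rng.uniform(-1, 1) for _ in range(100)]
hidden_positives = [5 + data_rng.uniform(-1, 1) for _ in range(30)]
true_negatives = [-5 + data_rng.uniform(-1, 1) for _ in range(100)]
unlabeled = true_negatives + hidden_positives  # indices 0..99 are true negatives

threshold, reliable = spy_reliable_negatives(positives, unlabeled)
print(f"threshold={threshold:.3f}, reliable negatives={len(reliable)}")
```

In this toy setting the two clusters do not overlap, so every true negative scores below every spy and is recovered. On real transaction data the separation is far weaker, and the spy ratio and quantile would need tuning.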

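
The abstract reports a population stability index (PSI) but not how it was computed. The sketch below uses the commonly cited definition, PSI = Σ (aᵢ − eᵢ) · ln(aᵢ / eᵢ), where eᵢ and aᵢ are the shares of a reference sample and a comparison sample falling in score bin i. The equal-width bins over [0, 1], the bin count, and the `eps` floor for empty bins are assumptions for illustration, not details from the paper.

```python
import math


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population stability index between two lists of scores in [0, 1]."""

    def proportions(scores):
        counts = [0] * n_bins
        for s in scores:
            idx = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
            counts[idx] += 1
        # eps keeps empty bins from causing log(0) or division by zero
        return [max(c / len(scores), eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


# Two-bin example: the reference sample splits 50/50, the comparison sample 25/75.
reference_scores = [0.25, 0.25, 0.75, 0.75]
comparison_scores = [0.25, 0.75, 0.75, 0.75]
example_psi = psi(reference_scores, comparison_scores, n_bins=2)
print(f"PSI = {example_psi:.4f}")
```

Identical score distributions give a PSI of exactly zero, and the value grows as the comparison sample drifts away from the reference sample.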