Research on credit card transaction security supervision based on PU learning

doi:10.11959/j.issn.2096-109x.2023039

Abstract:

Credit card cash-out schemes are complex and constantly changing, and new forms of fake transactions keep emerging. With only account-level cash-out labels available, cash-out management faces a practical difficulty: a customer's true transaction behavior is hard to establish through interactions with the customer. To develop an accurate method for supervising fake credit card transactions, a security identification model for single credit card transactions based on PU learning (positive-unlabeled learning) was established, using long-term transaction label data from cash-out accounts in commercial banks' credit card systems. A Spy mechanism was introduced into sample annotation: 1 million highly reliable positive samples of cash-out transactions and 1.3 million transaction samples to be labeled were randomly drawn, and the distribution of a learner's predictions was used to label the hard-to-identify negative samples of non-cash-out transactions, yielding 1.2 million relatively reliable negative sample labels. Based on the positive samples and the labeled negative samples, 120 candidate variables were constructed, covering credit card customer attributes, credit limit usage, and transaction preference characteristics. Variable importance screening reduced these to nearly 50 model variables, and the XGBoost binary classification algorithm was used for model development and prediction. The results show that the proposed model identifies fake cash-out transactions with an accuracy of 94.20% and a population stability index (PSI) of 0.10%, indicating that the PU learning based single-transaction security identification model can effectively monitor fake credit card transactions. This study improves the discrimination performance of machine learning binary classification algorithms in scenarios where high-precision sample labels are difficult to obtain, and provides a new method for transaction security monitoring in commercial banks' credit card systems.

Key words: cash-out transaction data monitoring, credit card system security supervision, semi-supervised learning, PU learning
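
The Spy-based labeling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-centroid scorer is a stand-in for the learner (the paper reports XGBoost for the final model), the data are synthetic two-dimensional points, and the names `spy_reliable_negatives`, `spy_ratio`, and `spy_recall` as well as their default values are assumptions made for illustration.

```python
import math
import random

def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(col) / len(points) for col in zip(*points))

def score(x, pos_c, unl_c):
    """Stand-in learner: pseudo-probability that x is a positive sample."""
    return 1.0 / (1.0 + math.exp(math.dist(x, pos_c) - math.dist(x, unl_c)))

def spy_reliable_negatives(positives, unlabeled, spy_ratio=0.15,
                           spy_recall=0.95, seed=0):
    """Return (reliable_negatives, threshold) using a Spy-style procedure."""
    rng = random.Random(seed)
    shuffled = positives[:]
    rng.shuffle(shuffled)
    n_spies = max(1, int(len(shuffled) * spy_ratio))
    spies, kept_pos = shuffled[:n_spies], shuffled[n_spies:]

    # Step 1: fit "remaining positives" against "unlabeled plus hidden spies".
    pos_c = centroid(kept_pos)
    unl_c = centroid(unlabeled + spies)

    # Step 2: pick a threshold from the spies' score distribution so that
    # at least about spy_recall of the spies score at or above it.
    spy_scores = sorted(score(s, pos_c, unl_c) for s in spies)
    threshold = spy_scores[int(len(spy_scores) * (1.0 - spy_recall))]

    # Step 3: unlabeled samples scoring below the threshold are treated
    # as relatively reliable negatives.
    reliable = [u for u in unlabeled if score(u, pos_c, unl_c) < threshold]
    return reliable, threshold

# Synthetic demo (assumed data): positives cluster near (4, 4), negatives near (0, 0).
demo_rng = random.Random(42)

def cloud(center, n):
    return [(demo_rng.gauss(center, 1.0), demo_rng.gauss(center, 1.0))
            for _ in range(n)]

known_pos = cloud(4.0, 200)    # labeled positives
hidden_pos = cloud(4.0, 100)   # positives hidden in the unlabeled pool
true_neg = cloud(0.0, 300)     # negatives hidden in the unlabeled pool
unlabeled = hidden_pos + true_neg

reliable_neg, threshold = spy_reliable_negatives(known_pos, unlabeled)
true_neg_set = set(true_neg)
purity = sum(p in true_neg_set for p in reliable_neg) / len(reliable_neg)
print(len(reliable_neg), round(threshold, 3), round(purity, 3))
```

The spies are known positives deliberately mixed into the unlabeled pool, so their scores show how a genuine positive behaves when it is treated as unlabeled. Anything scoring below nearly all spies is unlikely to be a positive, which is what makes the resulting negative labels relatively reliable.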

CLC number: TP309

Renfeng CHEN, Hongbin ZHU. Research on credit card transaction security supervision based on PU learning[J]. Chinese Journal of Network and Information Security, 2023, 9(3): 73-78.

Figures/Tables: 5

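Table 1

No.	Cash-out transaction characteristic	Positive-sample screening condition
1	Single-merchant transactions	All transactions in the observation month
2	Merchant-concentrated transactions	All transactions in the observation month
3	High credit-limit-usage transactions	All transactions in the observation month
4	Time-concentrated transactions	All transactions in the observation month
5	Abnormal large-amount transactions	All transactions in the observation month
6	Transactions made immediately after repayment	All transactions within 5 days of the largest repayment
7	Multiple small-amount transactions with merchant-code switching	All transactions on the day of the code switching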
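
As an illustration of how one screening condition from the table above might be applied, the sketch below selects the transactions that fall within 5 days of an account's largest repayment (row 6). The record layout `(date, amount)`, the function name `after_repayment_window`, and the choice to count the repayment day itself and the fifth day as inside the window are all assumptions; the source does not specify these details.

```python
from datetime import date, timedelta

def after_repayment_window(transactions, repayments, days=5):
    """Select transactions dated within `days` days after the largest repayment.

    transactions, repayments: lists of (date, amount) tuples (assumed layout).
    The window is inclusive on both ends, which is an assumption.
    """
    if not repayments:
        return []
    # Date of the single largest repayment in the period.
    largest_date, _ = max(repayments, key=lambda r: r[1])
    window_end = largest_date + timedelta(days=days)
    return [t for t in transactions if largest_date <= t[0] <= window_end]

# Hypothetical account history.
repayments = [(date(2023, 3, 2), 500.0), (date(2023, 3, 10), 8000.0)]
transactions = [
    (date(2023, 3, 5), 120.0),
    (date(2023, 3, 10), 7900.0),
    (date(2023, 3, 12), 60.0),
    (date(2023, 3, 15), 45.0),
    (date(2023, 3, 16), 30.0),
]

selected = after_repayment_window(transactions, repayments)
print([t[0].isoformat() for t in selected])
```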

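Table 2

Feature dimension	Variable description	Number of variables
Attribute information	Gender/age/education/marital status/…	13
Credit limit usage	Credit limit/credit limit utilization rate/…	2
Transaction preference	Transaction amount/count/merchant/time/city/…	105
Total		120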

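Table 3

No.	Variable name	Variable importance
1	Total amount of the last 15 transactions	0.0782
2	Average amount of the last 7 transactions	0.0631
3	Maximum single-day transaction amount in the last 7 days	0.0522
4	Average transaction amount in the last 10 days	0.0498
5	Credit limit	0.0466
6	Average transaction amount in the last 3 days	0.0432
7	Transaction amount in the last 2 days	0.0399
8	Average transaction amount in the last 7 days	0.0381
9	Total amount of the last 5 transactions	0.0354
10	Maximum single-day transaction amount in the last 3 days	0.0312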

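Table 4

Cash-out label	Sampled accounts	Sampled transactions	Correct predictions	Prediction accuracy
Cash-out transactions	50	518	499	96.33%
Non-cash-out transactions	50	482	443	91.91%
Total	100	1,000	942	94.20%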
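
The accuracy figures in the table above can be reproduced from the transaction counts. The short check below uses only the numbers in the table; the variable names are illustrative. If the label column is read as the ground truth (an assumption), the two per-class rates correspond to the recall on cash-out transactions and the recall on non-cash-out transactions.

```python
# (sampled transactions, correct predictions) per label, taken from the table.
rows = {
    "cash-out": (518, 499),
    "non-cash-out": (482, 443),
}

# Per-class accuracy: correct predictions divided by sampled transactions.
per_class = {label: correct / n for label, (n, correct) in rows.items()}

# Overall accuracy pools both classes.
total_n = sum(n for n, _ in rows.values())
total_correct = sum(correct for _, correct in rows.values())
overall = total_correct / total_n

for label, acc in per_class.items():
    print(f"{label}: {acc:.2%}")
print(f"overall: {overall:.2%}")
```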

Table 5

Predicted probability interval	Account share at observation point	Account share in cross-period sample	Distribution difference	Distribution ratio	Index
[0.9, 1.0]	14.34%	15.40%	-1.06%	0.93	0.08%
[0.8, 0.9)	1.79%	1.85%	-0.06%	0.97	0.00%
[0.7, 0.8)	1.29%	1.35%	-0.06%	0.96	0.00%
[0.6, 0.7)	1.19%	1.22%	-0.03%	0.98	0.00%
[0.5, 0.6)	1.15%	1.16%	-0.01%	0.99	0.00%
[0.4, 0.5)	1.23%	1.24%	-0.01%	0.99	0.00%
[0.3, 0.4)	1.49%	1.49%	0.00%	1.00	0.00%
[0.2, 0.3)	2.09%	2.08%	0.01%	1.00	0.00%
[0.1, 0.2)	3.86%	3.80%	0.06%	1.02	0.00%
[0.0, 0.1)	71.57%	70.41%	1.16%	1.02	0.02%
PSI					0.10%
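
The Index column and the PSI total in the table above are consistent with the standard population stability index: for each probability interval, the difference between the two account shares is multiplied by the natural logarithm of their ratio, and the per-interval values are summed. The sketch below recomputes it from the shares in the table. The function name `psi` and the list names are illustrative, and the source does not state the formula explicitly, so treating it as the standard definition is an assumption.

```python
import math

# Account shares per predicted-probability interval, from [0.9, 1.0] down to [0.0, 0.1).
observation = [0.1434, 0.0179, 0.0129, 0.0119, 0.0115,
               0.0123, 0.0149, 0.0209, 0.0386, 0.7157]
cross_period = [0.1540, 0.0185, 0.0135, 0.0122, 0.0116,
                0.0124, 0.0149, 0.0208, 0.0380, 0.7041]

def psi(actual, expected):
    """Population stability index: sum of (a - e) * ln(a / e) over all bins."""
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))

# Per-interval contributions (the Index column) and their sum.
index = [(a - e) * math.log(a / e) for a, e in zip(observation, cross_period)]
psi_value = psi(observation, cross_period)

print(f"first interval index: {index[0]:.2%}")
print(f"PSI: {psi_value:.2%}")
```

Each term is non-negative because the difference and the logarithm always share the same sign, so the PSI grows only when the two distributions diverge. A value this small indicates that the score distribution barely shifts between the two samples.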