Telecommunications Science ›› 2020, Vol. 36 ›› Issue (3): 83-94. doi: 10.11959/j.issn.1000-0801.2020061
• Research and Development •
Yang SUN, Li SU, Xing ZHANG, Fengsheng WANG, Haitao DU
Revised: 2020-03-06
Online: 2020-03-20
Published: 2020-03-26
Yang SUN, Li SU, Xing ZHANG, Fengsheng WANG, Haitao DU. Method of short text strategy mining based on sub-semantic space[J]. Telecommunications Science, 2020, 36(3): 83-94.
Category | Vocabulary |
Promotion | 好礼|豪礼、大放送、赔钱|亏本|贱价、火热|火爆、喜悦、赔钱|贱卖|赔本|亏本、相思|思念、传达|传送|传递、便民、主流、粉丝 |
Fraud | 房租|租金|房费、卡上|卡里|卡内|账户|账上|账号、房东、幸运、评选|选中|抽选|抽中|中选、旺肖|平肖|特肖、给料|发料、办证、刻章、发票、积分 |
Illegal | 遇难、法轮、谎言、机缘、彷徨、劫难|灾难、佛法、罪恶|罪行、镇压、功学、保佑、中共|共产党|中央 |
Organized crime | 流连忘返|乐不思蜀、排列、输钱、百家乐|六合采|六合彩|赌球|打麻将、随时随地|足不出户|不出门、赌博|博弈、间断、点子|运到|运气、先知、赌王、分金、储直|充值、http|www.|.com|.c0m|.cn|.cc|.hk|.net|.apk|m.|.lt |
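In the vocabulary table, variants joined by "|" form one synonym group, and groups are separated by "、". The following is a minimal sketch of how a message could be matched against such groups. The `VOCAB` subset, the function name `matched_groups`, the plain substring test, and the sample message are all assumptions made for illustration and are not taken from the paper.

```python
# Hypothetical sketch: find which synonym groups of each category a message hits.
# Each list entry is one synonym group written as "a|b|c", as in the table above.
VOCAB = {
    "fraud": ["房租|租金|房费", "房东", "办证", "刻章", "发票"],
    "organized_crime": ["百家乐|六合采|六合彩|赌球|打麻将", "赌博|博弈", "输钱"],
}


def matched_groups(text, vocab=VOCAB):
    """Return {category: [groups with at least one variant occurring in text]}."""
    hits = {}
    for category, groups in vocab.items():
        # A group counts as matched if any of its variants is a substring of the text.
        found = [g for g in groups if any(v in text for v in g.split("|"))]
        if found:
            hits[category] = found
    return hits


# Invented sample message, used only to exercise the function.
sample = "足不出户玩六合彩,赌球赢大奖"
print(matched_groups(sample))
```

Treating a whole group as one feature means that spelling variants such as 六合采 and 六合彩 contribute to the same feature rather than to separate ones.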
"
SMS ID | Strategy-word representation | Feature-ID representation | Support |
M1 | 百家乐|六合采|六合彩|赌球|打麻将,赌博|博弈,http|www.|.com|.c0m|.cn|.cc|.hk|.net|.apk|m.|.lt,随时随地|足不出户|不出门 | T1,T2,T3,T4 | 40 |
M2 | 百家乐|六合采|六合彩|赌球|打麻将,分金,排列 | T1,T5,T6 | 10 |
M3 | 百家乐|六合采|六合彩|赌球|打麻将,http|www.|.com|.c0m|.cn|.cc|.hk|.net|.apk|m.|.lt,分金,排列 | T1,T3,T5,T6 | 10 |
M4 | 赌博|博弈,输钱,先知,随时随地|足不出户|不出门 | T2,T7,T8,T9 | 10 |
M5 | 赌博|博弈,http|www.|.com|.c0m|.cn|.cc|.hk|.net|.apk|m.|.lt,输钱,先知 | T2,T3,T7,T8 | 10 |
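Each row of the table above can be read as a weighted transaction: a set of feature IDs plus a support count. The sketch below sums those weights for every itemset up to a given size. It uses brute-force enumeration rather than FP-growth, and the names `TRANSACTIONS`, `itemset_support`, and the minimum support of 20 are assumptions chosen for illustration.

```python
from collections import Counter
from itertools import combinations

# Feature-ID sets and supports copied from the table above.
TRANSACTIONS = [
    ({"T1", "T2", "T3", "T4"}, 40),  # M1
    ({"T1", "T5", "T6"}, 10),        # M2
    ({"T1", "T3", "T5", "T6"}, 10),  # M3
    ({"T2", "T7", "T8", "T9"}, 10),  # M4
    ({"T2", "T3", "T7", "T8"}, 10),  # M5
]


def itemset_support(transactions, max_size=2):
    """Sum the weights of all transactions that contain each itemset."""
    support = Counter()
    for items, weight in transactions:
        for size in range(1, max_size + 1):
            # Sorting gives every itemset one canonical tuple key.
            for combo in combinations(sorted(items), size):
                support[combo] += weight
    return support


support = itemset_support(TRANSACTIONS)
# The threshold of 20 is an assumed value, not one stated in the source.
frequent = {k: v for k, v in support.items() if v >= 20}
print(support[("T1", "T4")], support[("T2", "T7")])
```

Brute force is adequate for five transactions; FP-growth avoids this enumeration on larger data by compressing the transactions into a prefix tree first.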
"
Item ID | Conditional pattern base | Conditional FP-tree | Association rules |
T9 | {(T2 T7 T8:10)} | <T2:10, T7:10, T8:10> | T2T9:10, T7T9:10, T8T9:10, T2T7T9:10, T2T8T9:10, T2T7T8T9:10 |
T8 | {(T2 T7:10), (T2 T3 T7:10)} | <T2:10, T7:10>, <T2:10, T3:10, T7:10> | T2T8:20, T7T8:10, T2T7T8:20, T3T8:10, T2T3T7T8:10 |
T7 | {(T2:20)} | <T2:20> | T2T7:20 |
T6 | {(T1 T2 T5:60), (T1 T2 T3 T5:10)} | <T1:60, T2:60, T5:60>, <T1:10, T2:10, T3:10, T5:10> | T1T6:70, T2T6:70, T5T6:70, T1T2T6:70, T1T5T6:70, T2T5T6:70, T1T2T5T6:70, T1T2T3T6:10, T2T3T5:10, T1T3T5:10, T1T2T3T5T6:10 |
T5 | {(T1 T2:10), (T1 T2 T3:10)} | <T1:20, T2:20, T3:10> | T1T5:20, T2T5:20, T3T5:10, T1T2T5:20, T1T3T5:10, T2T3T5:10, T1T2T3T5:10 |
T4 | {(T1 T2 T3:40)} | <T1:40, T2:40, T3:40> | T1T4:40, T2T4:40, T3T4:40, T1T2T4:40, T1T3T4:40, T2T3T4:40, T1T2T3T4:40 |
T3 | {(T1 T2:40), (T1 T2:10), (T2:20)} | <T1:50, T2:50>, <T2:20> | T1T3:50, T2T3:70, T1T2T3:50 |
T2 | {(T1:60)} | <T1:60> | T2T1:60 |
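When a conditional FP-tree consists of a single path, as in the T4 row above, standard FP-growth generates one pattern for every non-empty subset of the path items joined with the conditioning item. The sketch below reproduces the seven T4 patterns under that reading. The function name `patterns_from_single_path` is hypothetical, and treating the table as an instance of this single-path shortcut is an assumption.

```python
from itertools import combinations


def patterns_from_single_path(path, suffix):
    """Enumerate patterns from a single-path conditional FP-tree.

    path:   list of (item, count) pairs along the only branch.
    suffix: the item the tree is conditioned on.
    Every non-empty subset of path items, joined with the suffix, is a pattern
    whose support is the smallest count among the chosen path items.
    """
    patterns = {}
    for size in range(1, len(path) + 1):
        for combo in combinations(path, size):
            items = tuple(item for item, _ in combo) + (suffix,)
            patterns[items] = min(count for _, count in combo)
    return patterns


# Conditional FP-tree of T4 from the table: <T1:40, T2:40, T3:40>.
t4 = patterns_from_single_path([("T1", 40), ("T2", 40), ("T3", 40)], "T4")
for items, count in t4.items():
    print("".join(items), count)
```

A path of n items yields 2^n - 1 patterns, which is why the T4 row, with three path items, lists seven rules.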
"
Category | Fraud | Pornographic | Organized crime | Illegal | Malicious URL | Promotion | Normal | Mean |
Training data | 60 000 | 35 000 | 45 000 | 40 000 | 50 000 | 50 000 | 280 000 | |
Test data | 5 000 | 4 000 | 4 000 | 5 000 | 5 000 | 5 000 | 26 800 | |
Correctly classified | 4 328 | 3 598 | 3 079 | 4 275 | 3 243 | 4 309 | 26 509 | |
Misclassified | 443 | 458 | 125 | 627 | 367 | 2 032 | 4 973 | |
Precision | 0.907 | 0.887 | 0.961 | 0.872 | 0.898 | 0.680 | 0.842 | 0.864 |
Recall | 0.866 | 0.900 | 0.770 | 0.855 | 0.649 | 0.862 | 0.989 | 0.841 |
Normal-class messages among misclassified | 330 | 26 | 42 | 243 | 300 | 1 750 | | |
Misjudgment rate | 0.069 | 0.006 | 0.013 | 0.050 | 0.083 | 0.276 | | 0.083 |
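The per-class figures in the table above are consistent with the following definitions: precision is the correct count divided by everything assigned to the class, recall is the correct count divided by the class's test-set size, and the misjudgment rate is the number of normal messages assigned to the class divided by everything assigned to it. These definitions are inferred from the table values rather than stated in the source, and the function name `class_metrics` is hypothetical. The sketch checks them against the fraud column.

```python
def class_metrics(correct, misclassified, test_total, normal_in_misclassified):
    """Compute per-class metrics under the definitions inferred from the table."""
    predicted = correct + misclassified  # everything assigned to this class
    return {
        "precision": round(correct / predicted, 3),
        "recall": round(correct / test_total, 3),
        "misjudgment_rate": round(normal_in_misclassified / predicted, 3),
    }


# Fraud column: 4 328 correct, 443 misclassified, 5 000 test items,
# 330 normal messages among the misclassified.
fraud = class_metrics(4328, 443, 5000, 330)
print(fraud)
```

Under these definitions the promotion class stands out: 1 750 of the 6 341 messages assigned to it are normal messages, which gives the 0.276 misjudgment rate in the table.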
"
Category | Fraud | Pornographic | Organized crime | Illegal | Malicious URL | Promotion | Normal | Mean |
Training data | 60 000 | 35 000 | 45 000 | 40 000 | 50 000 | 50 000 | 280 000 | |
Open test data | 5 000 | 4 000 | 4 000 | 5 000 | 5 000 | 5 000 | 26 800 | |
Correct | 4 128 | 3 729 | 2 902 | 4 345 | 3 446 | 4 209 | 24 935 | |
Incorrect | 265 | 429 | 139 | 593 | 207 | 1 132 | 6 082 | |
Classified as this class | 4 393 | 4 158 | 3 041 | 4 938 | 3 653 | 5 341 | 31 017 | |
Precision | 0.940 | 0.897 | 0.954 | 0.880 | 0.943 | 0.788 | 0.804 | 0.887 |
Recall | 0.826 | 0.932 | 0.726 | 0.869 | 0.689 | 0.842 | 0.930 | 0.831 |
Normal-class messages among incorrect | 189 | 32 | 78 | 329 | 287 | 950 | | |
Misjudgment rate | 0.043 | 0.008 | 0.026 | 0.066 | 0.079 | 0.178 | | 0.066 |