Big data classification method of non relational distributed submission information under differentiated requirements

doi:10.11959/j.issn.1000-0801.2023122

Telecommunications Science, 2023, Vol. 39, Issue (6): 114-121.

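Lu HAN¹, Weiyu CHEN¹, Fei ZHANG², Jianfeng HE¹, Huaizhen SU³

¹ State Grid Gansu Electric Power Company, Lanzhou 730030, China
² State Grid Lanzhou Siji Feitian Cloud Date Science Technology Co., Ltd., Lanzhou 730020, China
³ State Grid Gansu Electric Power Company Dingxi Power Supply Company, Dingxi 743000, China

Revised: 2023-06-01 Online: 2023-06-20 Published: 2023-06-01
About the authors: Lu HAN (1983- ), female, senior political work officer at State Grid Gansu Electric Power Company. Her main research interests are big data quality analysis, data mining, data classification management, and algorithm analysis.
Weiyu CHEN (1985- ), male, economist and political work officer at State Grid Gansu Electric Power Company. His main research interests are big data analysis, database construction, and big data case analysis.
Fei ZHANG (1992- ), male, engineer at State Grid Lanzhou Siji Feitian Cloud Date Science Technology Co., Ltd. His main research interests are medium and large-scale software architecture, big data algorithm analysis, and data modeling.
Jianfeng HE (1988- ), male, engineer at State Grid Gansu Electric Power Company. His main research interests are big data quality analysis, data mining, data classification management, and algorithm analysis.
Huaizhen SU (1994- ), male, assistant engineer at State Grid Gansu Electric Power Company Dingxi Power Supply Company. His main research interests are big data quality analysis, data mining, data classification management, and algorithm analysis.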

Abstract

Multi-source, heterogeneous, and widely distributed submission information is subject to many differentiated application requirements, which makes usable information hard to distinguish. To address this problem, a big data classification method for non-relational distributed submission information under differentiated requirements was studied. First, the usability, openness, and extensibility of the non-relational distributed submission information database were analyzed and, in line with the basic requirements on field types, the unstructured database text retrieval information processing (TRIP) was used to store the non-relational distributed submission information. Then, the hashing process within the Hamming hash family was analyzed. Under the constraint of linearity level requirements, multiple attractors were used to optimize the cellular automata, and a genetic algorithm was used to improve the optimal parameters of the multiple attractor cellular automata classifier, thereby improving the big data classification method. Experimental results show that the method can effectively identify and classify structured and unstructured data in non-relational distributed submission information, and that it achieves high classification accuracy.

Key words: differentiated demand, non-relational, distributed, submission information, big data classification, cellular automata
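The abstract names a multiple attractor cellular automata classifier but this page gives no implementation details. The following is a minimal sketch of the general idea only, under stated assumptions: a 3-cell additive cellular automaton with null boundary, the hand-picked rule vector `[90, 150, 90]`, and the helper names `step`, `attractor`, `fit`, and `predict` are all hypothetical and are not taken from the article. Each bit pattern is evolved until it reaches an attractor, and the attractor reached decides the class.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence, Tuple

State = Tuple[int, ...]


def step(state: State, rules: Sequence[int]) -> State:
    """One synchronous update of a 1-D additive CA with null boundary.

    Rule 90:  next = left XOR right
    Rule 150: next = left XOR self XOR right
    """
    if len(state) != len(rules):
        raise ValueError("state and rule vector must have the same length")
    n = len(state)
    nxt = []
    for i, rule in enumerate(rules):
        left = state[i - 1] if i > 0 else 0
        right = state[i + 1] if i < n - 1 else 0
        bit = left ^ right
        if rule == 150:
            bit ^= state[i]
        elif rule != 90:
            raise ValueError("only rules 90 and 150 are supported in this sketch")
        nxt.append(bit)
    return tuple(nxt)


def attractor(state: State, rules: Sequence[int]) -> State:
    """Evolve until a state repeats, then return a canonical label for the
    attractor: the smallest state on the cycle that was reached."""
    seen: Dict[State, int] = {}
    path = []
    while state not in seen:
        seen[state] = len(path)
        path.append(state)
        state = step(state, rules)
    cycle = path[seen[state]:]
    return min(cycle)


def fit(patterns, labels, rules) -> Dict[State, Hashable]:
    """Assign each attractor the majority label of the training patterns
    that fall into its basin."""
    votes = defaultdict(Counter)
    for pattern, label in zip(patterns, labels):
        votes[attractor(tuple(pattern), rules)][label] += 1
    return {a: c.most_common(1)[0][0] for a, c in votes.items()}


def predict(pattern, rules, table, default=None):
    """Classify a pattern by the label of the attractor it converges to."""
    return table.get(attractor(tuple(pattern), rules), default)


# Assumed toy setup: with this rule vector every 3-bit state converges to
# one of two fixed points, (0, 0, 0) or (1, 1, 1).
RULES = [90, 150, 90]
table = fit([(1, 0, 1), (0, 1, 0)], ["structured", "unstructured"], RULES)
print(predict((1, 1, 0), RULES, table))
print(predict((0, 0, 1), RULES, table))
```

In this sketch the rule vector is fixed by hand. The abstract states that a genetic algorithm improves the classifier's parameters, so a fuller version would presumably search over such parameters rather than hard-code them.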

CLC number: TP311

Lu HAN, Weiyu CHEN, Fei ZHANG, Jianfeng HE, Huaizhen SU. Big data classification method of non relational distributed submission information under differentiated requirements[J]. Telecommunications Science, 2023, 39(6): 114-121.

Figures and tables (7): Figures 1-6, Table 1


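Parameter optimization	Submitted information volume/byte	Input-to-hidden-layer bit dimension ratio	Classification accuracy	Memory consumption/byte
Before	5 000	100:5	86.81%	279
Before	5 000	300:10	87.24%	608
Before	10 000	100:5	85.93%	279
Before	10 000	300:10	86.10%	608
After	5 000	100:5	96.73%	186
After	5 000	300:10	95.56%	391
After	10 000	100:5	96.01%	186
After	10 000	300:10	96.22%	391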
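
The effect of parameter optimization can be summarized from the table values with a few lines of arithmetic. The sketch below only restates the reported figures; the variable names and the choice of summary statistics (mean accuracy gain in percentage points, relative memory reduction per dimension ratio) are illustrative assumptions and do not come from the article.

```python
# Values copied from the table: (volume in bytes, dimension ratio) -> (accuracy %, memory in bytes)
before = {
    (5000, "100:5"): (86.81, 279),
    (5000, "300:10"): (87.24, 608),
    (10000, "100:5"): (85.93, 279),
    (10000, "300:10"): (86.10, 608),
}
after = {
    (5000, "100:5"): (96.73, 186),
    (5000, "300:10"): (95.56, 391),
    (10000, "100:5"): (96.01, 186),
    (10000, "300:10"): (96.22, 391),
}


def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Mean classification accuracy over the four configurations, before and after.
mean_acc_before = mean(acc for acc, _ in before.values())
mean_acc_after = mean(acc for acc, _ in after.values())
acc_gain_points = mean_acc_after - mean_acc_before

# Relative memory reduction for each dimension ratio (the reported memory
# does not vary with the information volume, so one volume is enough).
mem_reduction = {
    ratio: (before[(5000, ratio)][1] - after[(5000, ratio)][1]) / before[(5000, ratio)][1]
    for ratio in ("100:5", "300:10")
}

print(f"mean accuracy before: {mean_acc_before:.2f}%")
print(f"mean accuracy after:  {mean_acc_after:.2f}%")
print(f"gain: {acc_gain_points:.2f} percentage points")
for ratio, reduction in mem_reduction.items():
    print(f"memory reduction at {ratio}: {reduction:.1%}")
```

On these figures, mean accuracy rises from about 86.52% to about 96.13%, and reported memory consumption falls by roughly one third at the 100:5 ratio and by roughly 36% at the 300:10 ratio.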
