Telecommunications Science ›› 2023, Vol. 39 ›› Issue (6): 114-121. doi: 10.11959/j.issn.1000-0801.2023122

• Research and Development •

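Big data classification method for non-relational distributed submission information under differentiated requirements

Lu HAN1, Weiyu CHEN1, Fei ZHANG2, Jianfeng HE1, Huaizhen SU3

  1 State Grid Gansu Electric Power Company, Lanzhou 730030, China
  2 State Grid Lanzhou Siji Feitian Cloud Data Science Technology Co., Ltd., Lanzhou 730020, China
  3 State Grid Gansu Electric Power Company Dingxi Power Supply Company, Dingxi 743000, China
  • Revised: 2023-06-01 Online: 2023-06-20 Published: 2023-06-01
  • About the authors: Lu HAN (1983- ), female, is a senior political work specialist at State Grid Gansu Electric Power Company. Her main research interests include big data quality analysis, data mining, data classification management, and algorithm analysis.
    Weiyu CHEN (1985- ), male, is an economist and political work specialist at State Grid Gansu Electric Power Company. His main research interests include big data analysis, database construction, and big data case analysis.
    Fei ZHANG (1992- ), male, is an engineer at State Grid Lanzhou Siji Feitian Cloud Data Science Technology Co., Ltd. His main research interests include medium- and large-scale software architecture, big data algorithm analysis, and data modeling.
    Jianfeng HE (1988- ), male, is an engineer at State Grid Gansu Electric Power Company. His main research interests include big data quality analysis, data mining, data classification management, and algorithm analysis.
    Huaizhen SU (1994- ), male, is an assistant engineer at State Grid Gansu Electric Power Company Dingxi Power Supply Company. His main research interests include big data quality analysis, data mining, data classification management, and algorithm analysis.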

Abstract:

Submission information is multi-source, heterogeneous, and widely distributed; it is subject to many differentiated application requirements, and usable information is hard to distinguish. To address this problem, a big data classification method for non-relational distributed submission information under differentiated requirements was studied. First, the usability, openness, and extensibility of the non-relational distributed submission information database were analyzed, and, in line with the basic requirements on field types, the unstructured database TRIP (text retrieval information processing) was used to store the non-relational distributed submission information. Then, the hashing process within the Hamming hash family was analyzed. Under the constraint of linearity-level requirements, the cellular automaton was optimized with multiple attractors, and a genetic algorithm was used to tune the parameters of the multiple-attractor cellular automata classifier, thereby improving the big data classification method. Experimental results show that the method effectively identifies and classifies the structured and unstructured data in non-relational distributed submission information, with high classification accuracy.

Key words: differentiated demand, non-relational, distributed, submission information, big data classification, cellular automata
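
As a rough illustration of the kind of hashing the abstract refers to, the sketch below shows one common hash family for Hamming space, bit sampling, in Python. This is an assumption made for illustration only: the abstract does not say which construction the authors use, and the function names (`make_bit_sampling_hash`, `bucket_records`) and the toy bit vectors are hypothetical. The sketch does not cover the TRIP storage layer or the multiple-attractor cellular automata classifier.

```python
import random


def make_bit_sampling_hash(n_bits, k, seed=0):
    """Build one hash function from the bit-sampling family.

    The function keeps k fixed, randomly chosen positions of an
    n_bits-long binary vector and returns them as a tuple.
    (Hypothetical helper, not taken from the paper.)
    """
    rng = random.Random(seed)
    positions = sorted(rng.sample(range(n_bits), k))

    def h(bits):
        return tuple(bits[i] for i in positions)

    return h


def hamming_distance(a, b):
    """Number of positions at which two equal-length bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def bucket_records(records, h):
    """Group record names by the hash value of their bit vectors."""
    buckets = {}
    for name, bits in records.items():
        buckets.setdefault(h(bits), []).append(name)
    return buckets


if __name__ == "__main__":
    # Toy 8-bit feature vectors; the values are made up for the example.
    records = {
        "r1": [1, 0, 1, 1, 0, 0, 1, 0],
        "r2": [1, 0, 1, 1, 0, 0, 1, 1],
        "r3": [0, 1, 0, 0, 1, 1, 0, 1],
    }
    h = make_bit_sampling_hash(n_bits=8, k=3, seed=1)
    print(bucket_records(records, h))
    print(hamming_distance(records["r1"], records["r2"]))
```

Records whose bit vectors are close in Hamming distance tend to share a bucket more often than distant ones, and a larger `k` makes the buckets more selective.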

CLC number:
