Big Data Research ›› 2024, Vol. 10 ›› Issue (1): 185-194. doi: 10.11959/j.issn.2096-0271.2024019

• Column: Applications in the Big Data Field •

Data augmentation method for materials genome engineering of special materials with small-sample data

Tao YANG1, Zhaobo ZHANG2, Tianyi ZHENG3, Bao PENG3,4

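  1. 1 Shenzhen Koron Soft Co., Ltd., Shenzhen 518063, Guangdong, China
    2 GD Holdings Pearl River Delta Water Supply Co., Ltd., Guangzhou 511455, Guangdong, China
    3 South China Academy of Advanced Optoelectronics, South China Normal University, Guangzhou 510006, Guangdong, China
    4 School of Information and Communication, Shenzhen Institute of Information Technology, Shenzhen 518172, Guangdong, China
  • Publication date: 2024-01-01 Release date: 2024-01-01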
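  • About the authors: Tao YANG (1984- ), male, is an engineer and chairman of Shenzhen Koron Soft Co., Ltd. His main research interests are water utility informatization and artificial intelligence technology.
    Zhaobo ZHANG (1984- ), male, is a senior engineer and manager at GD Holdings Pearl River Delta Water Supply Co., Ltd. His main research interests are smart water utilities and artificial intelligence technology.
    Tianyi ZHENG (1998- ), male, is a master's student at the South China Academy of Advanced Optoelectronics, South China Normal University. His main research interests are industrial big data and materials genome technology.
    Bao PENG (1979- ), male, Ph.D., is a professor and researcher at Shenzhen Institute of Information Technology. His main research interests are industrial big data and materials genome technology.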
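  • Funding:
    Shenzhen University Stability Support Plan (20200829114939001); Shenzhen Institute of Information Technology School-level Innovative Scientific Research Team Project (TD2020E001); Pearl River Delta Water Resources Allocation Engineering Scientific Research Project (CD88-QT01-2022-0068)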

Data expansion method for genetic engineering of special materials with small sample data

Tao YANG1, Zhaobo ZHANG2, Tianyi ZHENG3, Bao PENG3,4   

  1. 1 Shenzhen Koron Soft Co., Ltd., Shenzhen 518063, China
    2 GD Holdings Pearl River Delta Water Supply Co., Ltd., Guangzhou 511455, China
    3 South China Academy of Advanced Optoelectronics, South China Normal University, Guangzhou 510006, China
    4 School of Information and Communication, Shenzhen Institute of Information Technology, Shenzhen 518172, China
  • Online: 2024-01-01 Published: 2024-01-01
  • Supported by:
    Shenzhen University Stability Support Plan (20200829114939001); Project of Shenzhen Institute of Information Technology School-level Innovative Scientific Research Team (TD2020E001); The Pearl River Delta Water Resources Allocation Engineering Scientific Research Project (CD88-QT01-2022-0068)

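Abstract:

As the materials required by underground water conservancy works and water supply pipe networks become increasingly diverse and complex, using machine learning to design special materials that meet individualized requirements efficiently and conveniently has attracted wide attention. Traditional supervised learning methods all rely on training models with large amounts of data, but obtaining large datasets for the special materials needed in fields such as deeply buried water pipe networks and high-end military equipment, for example rare and precious high-entropy alloys, is extremely costly and time-consuming. To address this problem, a small-sample augmentation model, RX-SMOGN, is proposed. It uses the extreme gradient boosting (XGBoost) model and the recursive feature elimination with cross-validation (RFECV) algorithm for feature selection, and the SMOGN algorithm to expand the dataset. Taking the phase structure of high-entropy alloys as the research object, traditional machine learning models are trained to predict it in order to verify the effectiveness of RX-SMOGN. The results of five-fold cross-validation and four evaluation metrics show that RX-SMOGN markedly improves the performance of the machine learning models, offering a more convenient approach to alloy material design and improving its efficiency.

Keywords: small-sample augmentation, feature engineering, machine learning, high-entropy alloy, rare and precious metals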

Abstract:

With the increasing diversity and complexity of material requirements for underground water conservancy and water pipeline networks, the efficient and convenient design of special materials that meet individual needs through machine learning has become a topic of wide interest. Traditional supervised learning methods all rely on large datasets to train models, but obtaining large datasets for the special materials required in deeply buried underground water pipeline networks and high-end military equipment, such as rare and precious high-entropy alloys, is extremely costly and slow. To solve this problem, we propose a small-sample expansion model, RX-SMOGN, which uses the XGBoost and RFECV algorithms for feature screening and the SMOGN algorithm to enrich the dataset. The phase structure of high-entropy alloys is taken as the research object, and traditional machine learning models are trained to predict it in order to verify the effectiveness of the RX-SMOGN model. The results of 5-fold cross-validation and 4 evaluation indicators show that the RX-SMOGN model markedly improves the performance of the machine learning models, provides a more convenient method for alloy material design, and improves the efficiency of alloy material design.

Key words: small sample expansion, feature engineering, machine learning, high-entropy alloy, rare precious metal
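
The data-expansion step named in the abstract can be illustrated with a minimal sketch. This is not the authors' RX-SMOGN implementation: it is a simplified, standard-library-only illustration of the general SMOGN idea (interpolate between a seed sample and a close neighbour, otherwise perturb the seed with Gaussian noise). It omits the relevance-based selection of rare cases that the full SMOGN algorithm uses, as well as the XGBoost/RFECV feature-screening stage. The function name `smogn_like_oversample`, its parameters, and the toy data are hypothetical.

```python
import math
import random
import statistics


def smogn_like_oversample(X, y, n_new, k=3, noise_scale=0.05, seed=0):
    """Create n_new synthetic (features, target) pairs from a small dataset.

    Simplified SMOGN-style rule (an assumption of this sketch):
    - pick a random seed sample and one of its k nearest neighbours;
    - if the neighbour is closer than half the median neighbour distance,
      interpolate between the two (SmoteR-style);
    - otherwise add small Gaussian noise to the seed sample.
    """
    rng = random.Random(seed)
    n_features = len(X[0])
    # Per-feature and target spread, used to scale the Gaussian noise.
    feat_std = [statistics.pstdev([row[j] for row in X]) for j in range(n_features)]
    y_std = statistics.pstdev(y)

    new_X, new_y = [], []
    for _ in range(n_new):
        i = rng.randrange(len(X))
        # k nearest neighbours of the seed sample, excluding the seed itself.
        neighbours = sorted(
            (math.dist(X[i], X[j]), j) for j in range(len(X)) if j != i
        )[:k]
        safe_radius = statistics.median([d for d, _ in neighbours]) / 2
        dist, j = rng.choice(neighbours)

        if dist < safe_radius:
            # Close neighbour: interpolate the features along the connecting line.
            t = rng.random()
            x = [a + t * (b - a) for a, b in zip(X[i], X[j])]
            # Target is a distance-weighted average of the two parent targets.
            d_seed, d_nb = math.dist(x, X[i]), math.dist(x, X[j])
            total = d_seed + d_nb
            target = y[i] if total == 0 else (d_nb * y[i] + d_seed * y[j]) / total
        else:
            # Distant neighbour: perturb the seed sample with Gaussian noise.
            x = [a + rng.gauss(0, noise_scale * s) for a, s in zip(X[i], feat_std)]
            target = y[i] + rng.gauss(0, noise_scale * y_std)

        new_X.append(x)
        new_y.append(target)
    return new_X, new_y


# Toy, made-up data: two descriptors per sample and one numeric target.
X_small = [[1.0, 2.0], [1.1, 2.1], [3.0, 0.5], [3.2, 0.4], [5.0, 5.0]]
y_small = [0.1, 0.2, 1.0, 1.1, 3.0]
X_aug, y_aug = smogn_like_oversample(X_small, y_small, n_new=10, seed=1)
print(len(X_small) + len(X_aug), "samples after expansion")
```

In a pipeline like the one the abstract describes, the synthetic samples would be appended to the training folds only, so that cross-validation scores are still computed on original data; that ordering is an assumption here, not a detail stated in the abstract.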

