Big Data Research, 2024, Vol. 10, Issue (1): 185-194. doi: 10.11959/j.issn.2096-0271.2024019

• BIG DATA DOMAIN APPLICATION •

Data expansion method for genetic engineering of special materials with small sample data

Tao YANG¹, Zhaobo ZHANG², Tianyi ZHENG³, Bao PENG³,⁴

    1 Shenzhen Koron Soft Co., Ltd., Shenzhen 518063, China
    2 GD Holdings Pearl River Delta Water Supply Co., Ltd., Guangzhou 511455, China
    3 South China Academy of Advanced Optoelectronics, South China Normal University, Guangzhou 510006, China
    4 School of Information and Communication, Shenzhen Institute of Information Technology, Shenzhen 518172, China
  • Online: 2024-01-01  Published: 2024-01-01
  • Supported by:
    Shenzhen University Stability Support Plan (20200829114939001);
    School-level Innovative Scientific Research Team Project of Shenzhen Institute of Information Technology (TD2020E001);
    Pearl River Delta Water Resources Allocation Engineering Scientific Research Project (CD88-QT01-2022-0068)

Abstract:

As material requirements for underground water conservancy projects and water pipeline networks become more diverse and complex, using machine learning to design special materials efficiently for individual needs has become a topic of wide interest. Traditional supervised learning methods train models on large datasets, but for the special materials required in deeply buried underground water pipeline networks and high-end military equipment, such as rare alloys and high-entropy alloys, obtaining large datasets is extremely costly and time-consuming. To address this problem, we propose RX-SMOGN, a small-sample expansion model that uses the XGBoost and RFECV (recursive feature elimination with cross-validation) algorithms for feature screening and the SMOGN algorithm to enrich the dataset. Taking the phase structure of high-entropy alloys as the research object, we train traditional machine learning models to predict it and thereby verify the effectiveness of RX-SMOGN. The results of 5-fold cross-validation on four evaluation metrics show that RX-SMOGN effectively improves the performance of the machine learning models, offering a more convenient approach to alloy material design and improving its efficiency.
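The abstract names SMOGN as the data-enrichment step but does not describe how it works. The Python sketch below is loosely modeled on the general SMOGN idea of oversampling rare target values by combining SMOTER-style interpolation with Gaussian-noise perturbation; it is not the authors' RX-SMOGN implementation. Several simplifications are assumptions of this sketch: `smogn_like_oversample` is a hypothetical name; rare cases are picked with a plain quantile threshold (`rare_quantile`) instead of SMOGN's relevance function; common cases are not undersampled; the target is numeric (the abstract does not say how phase labels are encoded); and the XGBoost/RFECV feature-screening step is assumed to have already reduced `X` to the selected features.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple

Row = List[float]


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def smogn_like_oversample(
    X: List[Row],
    y: List[float],
    rare_quantile: float = 0.8,  # assumption: stand-in for SMOGN's relevance function
    k: int = 5,
    per_seed: int = 2,
    pert: float = 0.02,
    seed: int = 0,
) -> Tuple[List[Row], List[float]]:
    """Hypothetical, simplified SMOGN-style oversampler for a numeric target.

    Returns the original samples followed by the synthetic ones.
    Common cases are kept as-is (no undersampling in this sketch).
    """
    rng = random.Random(seed)
    threshold = sorted(y)[int(rare_quantile * (len(y) - 1))]
    rare = [i for i, t in enumerate(y) if t >= threshold]
    # Per-feature and target spreads, used to scale the Gaussian noise.
    x_std = [statistics.pstdev(col) for col in zip(*X)]
    y_std = statistics.pstdev(y)

    new_X: List[Row] = []
    new_y: List[float] = []
    for i in rare:
        # k nearest neighbours of the seed, searched among the rare cases only.
        nbrs = sorted((j for j in rare if j != i), key=lambda j: _dist(X[i], X[j]))[:k]
        if not nbrs:
            continue
        # "Safe" distance: half the median distance to the seed's neighbours.
        max_d = statistics.median(_dist(X[i], X[j]) for j in nbrs) / 2
        for _ in range(per_seed):
            j = rng.choice(nbrs)
            if _dist(X[i], X[j]) < max_d:
                # SMOTER-style interpolation between seed and neighbour.
                gap = rng.random()
                x_new = [a + gap * (b - a) for a, b in zip(X[i], X[j])]
                d_i, d_j = _dist(x_new, X[i]), _dist(x_new, X[j])
                # Target: inverse-distance weighted average of the two targets.
                y_new = (d_j * y[i] + d_i * y[j]) / (d_i + d_j) if d_i + d_j > 0 else y[i]
            else:
                # Neighbour too far: perturb the seed with Gaussian noise instead.
                x_new = [a + rng.gauss(0.0, pert * s) for a, s in zip(X[i], x_std)]
                y_new = y[i] + rng.gauss(0.0, pert * y_std)
            new_X.append(x_new)
            new_y.append(y_new)
    return X + new_X, y + new_y


if __name__ == "__main__":
    # Toy data: 10 samples, 2 features, target equal to the sample index.
    X = [[float(v), 2.0 * v] for v in range(10)]
    y = [float(v) for v in range(10)]
    X_aug, y_aug = smogn_like_oversample(X, y, k=3)
    print(len(X_aug), len(y_aug))  # prints: 16 16
```

In the toy run, the three samples with the highest targets are treated as rare and each yields two synthetic samples. Interpolation is used only when the chosen neighbour lies within the seed's "safe" distance; otherwise the seed is jittered with noise, which avoids interpolating across sparse regions of feature space. For real experiments, a maintained implementation such as the open-source `smogn` Python package is likely a better choice than this sketch.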

Key words: small sample expansion, feature engineering, machine learning, high-entropy alloy, rare precious metal
