Journal on Communications ›› 2022, Vol. 43 ›› Issue (3): 211-224. doi: 10.11959/j.issn.1000-436x.2022057

• Correspondences

Generate medical synthetic data based on generative adversarial network

Xiayu XIANG1, Jiahui WANG2, Zirui WANG3, Shaoming DUAN3, Hezhong PAN1, Rongfei ZHUANG3, Peiyi HAN3,4, Chuanyi LIU3,4   

  • 1 School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing 100876, China
    2 Department of Information and Security, The State Information Center, Beijing 100045, China
    3 School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen 518055, China
    4 Cyberspace Security Research Center, Peng Cheng Laboratory, Shenzhen 518066, China
  • Revised: 2022-02-17 Online: 2022-03-25 Published: 2022-03-01
  • Supported by:
    The National Key Research and Development Program of China (2016YFB0800803); The National Key Research and Development Program of China (2018YFB1004005); The National Natural Science Foundation of China (61872110)

Abstract:

Modeling the probability distribution of rows in structured electronic health records and generating realistic synthetic data is a non-trivial task. Tabular data usually contain discrete columns, and traditional encoding approaches may suffer from the curse of dimensionality in the feature space. The Poincaré ball model was used to capture the hierarchical structure of nominal variables, and a Gaussian copula-based generative adversarial network was employed to generate synthetic structured electronic health records. Experiments show that the generated training data differ in utility from the original data by only 2% while still ensuring privacy.
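
As a rough illustration of why the Poincaré ball suits hierarchical nominal variables, the sketch below computes the standard geodesic distance in the unit ball. The abstract does not describe how the embeddings are trained, so the function name `poincare_distance` and the example coordinates are assumptions made for illustration only, not the authors' implementation.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball.

    Uses the standard formula
        d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2))).
    """
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v)))


# Hypothetical 2-D embeddings: a general category near the origin and
# two specific categories (leaves) close to the boundary.
root = (0.0, 0.0)
leaf_a = (0.9, 0.0)
leaf_b = (0.0, 0.9)

d_root_leaf = poincare_distance(root, leaf_a)
d_leaf_leaf = poincare_distance(leaf_a, leaf_b)
```

In this toy setting the two leaves are farther from each other than either is from the root, which mirrors a tree where siblings are connected only through their parent. Distances grow quickly near the boundary, so a low-dimensional ball can hold many well-separated leaf categories, in contrast to encodings whose width grows with the number of categories.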

Key words: generative adversarial network, representation learning, privacy-utility analysis, electronic health record

