Journal on Communications, 2022, Vol. 43, Issue (3): 211-224. doi: 10.11959/j.issn.1000-436x.2022057
• Correspondences •
Xiayu XIANG1, Jiahui WANG2, Zirui WANG3, Shaoming DUAN3, Hezhong PAN1, Rongfei ZHUANG3, Peiyi HAN3,4, Chuanyi LIU3,4
Revised: 2022-02-17
Online: 2022-03-25
Published: 2022-03-01
Xiayu XIANG, Jiahui WANG, Zirui WANG, Shaoming DUAN, Hezhong PAN, Rongfei ZHUANG, Peiyi HAN, Chuanyi LIU. Generate medical synthetic data based on generative adversarial network[J]. Journal on Communications, 2022, 43(3): 211-224.
Dataset | Divergence, real vs. synthetic training set | Divergence, real vs. synthetic test set | Difference score, real training set vs. real test set | Difference score, real training set vs. synthetic training set | Difference score, real test set vs. synthetic test set | Difference score, synthetic dataset |
1,000,000 | 0.186 | 0.189 | - | 2.181 | 2.186 | 1.356 |
750,000 | 0.200 | 0.205 | - | 2.186 | 2.212 | 1.312 |
500,000 | 0.203 | 0.205 | - | 2.171 | 2.189 | 1.288 |
250,000 | 0.209 | 0.208 | 2.412 | 2.176 | 2.190 | 1.347 |
100,000 | 0.183 | 0.177 | - | 2.188 | 2.194 | 1.344 |
50,000 | 0.183 | 0.178 | - | 2.218 | 2.224 | 1.385 |
20,000 | 0.240 | 0.234 | - | 2.190 | 2.194 | 1.234 |
No Embedding | 5.429 | 5.649 | 3.142 | 3.590 | 5.540 | 3.128 |
CTGAN | 4.839 | 5.069 | 3.142 | 3.300 | 5.350 | 3.584 |
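The table above reports divergence values between real and synthetic sets but does not say how they are computed. As a minimal sketch, assuming a Kullback-Leibler divergence over the category frequencies of a single column (the function name `kl_divergence`, the smoothing constant `eps`, and the per-column treatment are all illustrative assumptions, not the authors' procedure), such a value could be obtained like this:

```python
import math
from collections import Counter


def kl_divergence(real_values, synthetic_values, eps=1e-9):
    """Return KL(P_real || P_synthetic) for one categorical column.

    Hypothetical helper: the source does not specify its divergence formula.
    """
    if not real_values or not synthetic_values:
        raise ValueError("both samples must be non-empty")

    real_counts = Counter(real_values)
    synth_counts = Counter(synthetic_values)
    categories = set(real_counts) | set(synth_counts)
    k = len(categories)

    total = 0.0
    for category in categories:
        # Additive smoothing keeps both distributions normalised and avoids
        # log(0) when a category is missing from one of the samples.
        p = (real_counts[category] + eps) / (len(real_values) + eps * k)
        q = (synth_counts[category] + eps) / (len(synthetic_values) + eps * k)
        total += p * math.log(p / q)
    return total


if __name__ == "__main__":
    real = ["A", "A", "B", "C"]
    synthetic = ["A", "B", "B", "C"]
    print(kl_divergence(real, synthetic))
```

Under this assumption, a smaller value means the synthetic column's category frequencies are closer to the real ones, and two identical samples give exactly zero.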
Algorithm | Original dataset: accuracy | Original dataset: F1 score | Original dataset: time/s | Synthetic dataset (100,000 records): accuracy | Synthetic dataset (100,000 records): F1 score | Synthetic dataset (100,000 records): time/s |
NearestCentroid | 0.63 | 0.71 | 0.23 | 0.72 | 0.77 | 0.53 |
DecisionTreeClassifier | 0.82 | 0.83 | 0.89 | 0.84 | 0.84 | 18.49 |
ExtraTreeClassifier | 0.84 | 0.84 | 0.28 | 0.76 | 0.80 | 21.98 |
LabelPropagation | 0.85 | 0.84 | 1941.31 | 0.91 | 0.87 | 2578.94 |
LabelSpreading | 0.85 | 0.84 | 2361.60 | 0.91 | 0.87 | 2709.98 |
PassiveAggressiveClassifier | 0.87 | 0.86 | 0.29 | 0.91 | 0.87 | 0.74 |
BaggingClassifier | 0.91 | 0.87 | 4.69 | 0.79 | 0.81 | 121.11 |
XGBClassifier | 0.91 | 0.87 | 7.02 | 0.77 | 0.80 | 41.74 |
LinearDiscriminantAnalysis | 0.91 | 0.87 | 0.77 | 0.90 | 0.87 | 2.12 |
KNeighborsClassifier | 0.91 | 0.87 | 78.30 | 0.90 | 0.87 | 675.94 |
QuadraticDiscriminantAnalysis | 0.11 | 0.05 | 0.32 | 0.91 | 0.87 | 0.81 |
CalibratedClassifierCV | 0.91 | 0.87 | 65.22 | 0.91 | 0.87 | 107.71 |
LogisticRegression | 0.91 | 0.87 | 0.62 | 0.91 | 0.87 | 1.24 |
LinearSVC | 0.91 | 0.87 | 17.02 | 0.91 | 0.87 | 25.61 |
RidgeClassifier | 0.91 | 0.87 | 0.29 | 0.90 | 0.87 | 0.66 |
RidgeClassifierCV | 0.91 | 0.87 | 0.40 | 0.90 | 0.87 | 1.10 |
DummyClassifier | 0.84 | 0.84 | 0.21 | 0.64 | 0.72 | 0.58 |
GaussianNB | 0.09 | 0.02 | 0.28 | 0.83 | 0.83 | 0.75 |
BernoulliNB | 0.91 | 0.87 | 0.27 | 0.80 | 0.82 | 0.78 |
LGBMClassifier | 0.91 | 0.87 | 0.77 | 0.68 | 0.75 | 3.32 |
SGDClassifier | 0.91 | 0.87 | 0.60 | 0.91 | 0.87 | 1.46 |
ExtraTreesClassifier | 0.91 | 0.87 | 8.54 | 0.75 | 0.79 | 0.79 |
AdaBoostClassifier | 0.91 | 0.87 | 3.59 | 0.87 | 0.86 | 68.82 |
SVC | 0.91 | 0.87 | 864.79 | 0.91 | 0.87 | 578.69 |
CheckingClassifier | 0.91 | 0.87 | 0.17 | 0.91 | 0.87 | 0.48 |
RandomForestClassifier | 0.91 | 0.87 | 7.47 | 0.28 | 0.35 | 81.50 |
Perceptron | 0.81 | 0.82 | 0.34 | 0.91 | 0.87 | 0.62 |
Mean of utility statistics | 0.821 | 0.794 | 198.751 | 0.827 | 0.822 | 260.981 |
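Each row of the table above records an accuracy, an F1 score, and a running time for one classifier. The sketch below shows one way those three figures could be collected for a single model. It rests on assumptions that the page does not confirm: the F1 score is taken to be the support-weighted average over classes, the timing covers both fitting and prediction, and the names `evaluate`, `weighted_f1`, and `majority_baseline` are hypothetical stand-ins rather than the authors' code.

```python
import time
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with class support as weights (an assumption)."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += f1 * n
    return total / len(y_true)


def evaluate(fit_predict, X_train, y_train, X_test, y_test):
    """Time one fit-and-predict call and score its predictions."""
    start = time.perf_counter()
    y_pred = fit_predict(X_train, y_train, X_test)
    elapsed = time.perf_counter() - start
    return {
        "accuracy": accuracy(y_test, y_pred),
        "f1": weighted_f1(y_test, y_pred),
        "time_s": elapsed,
    }


def majority_baseline(X_train, y_train, X_test):
    """Illustrative stand-in model: always predict the most common training label."""
    most_common = Counter(y_train).most_common(1)[0][0]
    return [most_common] * len(X_test)


if __name__ == "__main__":
    # Toy data for illustration only; these are not the paper's records.
    X_train, y_train = [[0]] * 10, [0] * 9 + [1]
    X_test, y_test = [[0]] * 10, [0] * 9 + [1]
    print(evaluate(majority_baseline, X_train, y_train, X_test, y_test))
```

Any model exposing the same `fit_predict(X_train, y_train, X_test)` shape could be passed to `evaluate`, so the same loop could be run once on the original data and once on a synthetic set to fill the two halves of a row.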