Big Data Research ›› 2023, Vol. 9 ›› Issue (4): 159-171. doi: 10.11959/j.issn.2096-0271.2023008
• STUDY •
Cong LIU1, Xuefeng LYU1, Honglin WANG1, Xiaowei WANG2, Jin LU2, Shun SUN1, Songqi HU1
Online: 2023-07-01
Published: 2023-07-01
Cong LIU, Xuefeng LYU, Honglin WANG, Xiaowei WANG, Jin LU, Shun SUN, Songqi HU. Medical named entity recognition algorithm based on probability distribution difference[J]. Big Data Research, 2023, 9(4): 159-171.
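Algorithm 1: Active learning procedure based on probability distribution difference
Input: unlabeled sample set U
Step 1: Randomly draw a subset of samples L from U and annotate them on annotation platform A;
Step 2: Build the BERT-BiLSTM-CRF entity recognition model M and train it on the currently labeled samples;
Step 3: Use sampling strategy P to select from the unlabeled set U the samples with larger difference values;
Step 4: Annotate the selected samples on annotation platform A to obtain a newly labeled sample set;
Step 5: Update the labeled sample set L;
Step 6: Update the sampling strategy function P based on the updated sample set L;
Step 7: Retrain model M on the updated sample set L;
Step 8: Validate the updated model M on the test set;
if the convergence condition is met: stop iterating;
else: repeat Steps 3-8;
Output: the augmented sample set L and the final trained model M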
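
The loop in Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The listing does not define the difference value used by strategy P, so the sketch assumes it is the KL divergence between the model's predicted distribution for a sample and the empirical label distribution of the labeled set L. It also treats each sample as a single classification rather than a BERT-BiLSTM-CRF tag sequence. All function and parameter names (`active_learning`, `select_by_difference`, `annotate`, `train`, `evaluate`, `tol`) are hypothetical.

```python
import math
import random


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def label_distribution(labeled, num_classes):
    """Empirical class frequencies of the labeled set L."""
    counts = [0] * num_classes
    for _, y in labeled:
        counts[y] += 1
    return [c / len(labeled) for c in counts]


def select_by_difference(model, pool, reference, k):
    """Strategy P (assumed form): indices of the k pool samples whose
    predicted distribution differs most from `reference`."""
    ranked = sorted(
        range(len(pool)),
        key=lambda i: kl_divergence(model(pool[i]), reference),
        reverse=True,
    )
    return ranked[:k]


def active_learning(U, annotate, train, evaluate, num_classes,
                    seed_size=10, batch_size=5, max_rounds=20,
                    tol=1e-3, seed=0):
    pool = list(U)
    random.Random(seed).shuffle(pool)
    # Step 1: annotate a random seed subset L of U.
    labeled = [(x, annotate(x)) for x in pool[:seed_size]]
    pool = pool[seed_size:]
    # Step 2: train the initial model M; `train` returns a callable x -> probabilities.
    model = train(labeled)
    reference = label_distribution(labeled, num_classes)
    prev_score = evaluate(model)
    for _ in range(max_rounds):
        if not pool:
            break
        # Step 3: pick the samples with the largest difference values.
        picked = set(select_by_difference(model, pool, reference, batch_size))
        # Steps 4-5: annotate them and add them to L.
        labeled.extend((pool[i], annotate(pool[i])) for i in picked)
        pool = [x for i, x in enumerate(pool) if i not in picked]
        # Step 6: update the sampling strategy from the updated L.
        reference = label_distribution(labeled, num_classes)
        # Step 7: retrain M on the updated L.
        model = train(labeled)
        # Step 8: validate; stop once the score change falls below `tol`
        # (an assumed convergence condition).
        score = evaluate(model)
        if abs(score - prev_score) < tol:
            break
        prev_score = score
    return labeled, model
```

Here `annotate` stands in for annotation platform A, and `train` and `evaluate` stand in for model training and test-set validation. In a sequence-labeling setting the per-sample distribution would presumably be aggregated over tokens, a detail this sketch leaves out.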