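Exploration of the continual learning ability that supports the application ecological evolution of the large-scale pretraining Peng Cheng series open source models

Chinese Journal of Intelligent Science and Technology, 2022, Vol. 4, Issue (1): 97-108. doi: 10.11959/j.issn.2096-6652.202212

Yue YU¹,², Xin LIU¹, Fangqing JIANG¹, Han ZHANG¹, Hui WANG¹, Wei ZENG³

¹ Open Source Institution, Network Intelligence Department, Peng Cheng Laboratory, Shenzhen 518055, China
² National University of Defense Technology, Changsha 410073, China
³ Peking University, Beijing 100091, China

Revised: 2022-01-13 Online: 2022-03-15 Published: 2022-03-01
About the authors:
Yue YU (born 1988), male, Ph.D., associate researcher at National University of Defense Technology and chief technologist of the Open Source Institution, Network Intelligence Department, Peng Cheng Laboratory. His research interests include intelligent software engineering, crowd intelligence, and open source ecosystems.
Xin LIU (born 1989), male, Ph.D., postdoctoral researcher at the Open Source Institution, Network Intelligence Department, Peng Cheng Laboratory. His research interests include the continual evolution of large models in natural language processing, semantic matching, word sense learning, and applications of capsule networks.
Fangqing JIANG (born 1989), male, engineer at the Open Source Institution, Network Intelligence Department, Peng Cheng Laboratory. His research interests include natural language processing, large pre-trained models, few-shot learning, and continual learning.
Han ZHANG (born 1993), male, jointly trained Ph.D. student at the Open Source Institution, Network Intelligence Department, Peng Cheng Laboratory. His research interests include continual learning in natural language processing, few-shot learning, multilingual machine translation, and pre-trained language models.
Hui WANG (born 1968), male, Ph.D., researcher at the Open Source Institution, Network Intelligence Department, Peng Cheng Laboratory. His research interests include natural language processing, distributed machine learning, and federated learning.
Wei ZENG (born 1973), male, Ph.D., associate researcher at Peking University. His research interests include computer architecture, distributed systems, and computer vision.
Supported by:
Research on Chinese Technological Open Source Strategy Under New Situation (GHZX2020ZCQ013)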

Abstract

Large-scale pre-trained models have achieved great success in natural language processing by exploiting large-scale corpora and diverse pre-training tasks. As large models have developed, their continual learning ability has become a new research focus. This paper introduces the technical framework, application practice, and open challenges of continual learning for the Peng Cheng series of large models. It covers three parts: the Peng Cheng series continual learning framework, built on task expansion, data increment, and knowledge reasoning; multi-task continual learning with the open source Peng Cheng PANGU model and continual learning practice with the Peng Cheng TONGYAN model; and the challenges that large models face during continual learning, namely vocabulary update, semantic mapping, and knowledge conflict.

Key words: Peng Cheng series large model, continual learning, Peng Cheng PANGU, Peng Cheng TONGYAN, open source large model

CLC number: TP391.1

Yue YU, Xin LIU, Fangqing JIANG, et al. Exploration of the continual learning ability that supports the application ecological evolution of the large-scale pretraining Peng Cheng series open source models[J]. Chinese Journal of Intelligent Science and Technology, 2022, 4(1): 97-108.

Table 5. BLEU scores for selected languages (xx denotes the listed language; each gain is the Peng Cheng TONGYAN score minus the Transformer (multi-model) score in the same direction)

Language	Transformer (multi-model)		M2M-100 (official)		M2M-1.2B (fine-tuned)		Peng Cheng TONGYAN (single model)			
Language	Chinese-xx	xx-Chinese	Chinese-xx	xx-Chinese	Chinese-xx	xx-Chinese	Chinese-xx	Gain	xx-Chinese	Gain
Italian	17.10	27.87	15.00	31.12	26.10	32.64	29.90	12.80	38.67	10.80
Czech	14.88	31.50	10.90	29.18	16.50	32.90	18.20	3.32	35.43	3.93
Dutch	15.21	27.46	14.20	32.92	21.40	37.44	24.20	8.99	42.13	14.67
Portuguese	17.46	27.79	15.80	32.99	27.40	36.89	28.40	10.94	40.84	13.05
Indonesian	13.06	18.77	15.60	31.14	25.10	35.29	27.20	14.14	38.89	20.12
Hebrew	16.21	16.89	8.20	12.23	16.10	18.51	18.00	1.79	20.32	3.43
Bosnian	11.38	16.23	4.40	11.80	11.90	17.46	13.50	2.12	19.39	3.16
Greek	8.20	13.98	6.50	11.49	14.80	16.81	17.10	8.90	18.72	4.74
Croatian	14.32	16.86	5.20	12.77	14.60	17.74	15.70	1.38	19.98	3.12
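
The two gain columns of Table 5 can be reproduced from the other columns: for every row, the gain equals the Peng Cheng TONGYAN score minus the Transformer (multi-model) score in the same translation direction. The short check below is only an illustration on three rows copied from the table; the dictionary layout and the function name `gains` are choices made for this sketch, not part of the paper.

```python
# BLEU scores copied from Table 5, per language:
# (Transformer Chinese-xx, Transformer xx-Chinese,
#  TONGYAN Chinese-xx,     TONGYAN xx-Chinese)
bleu = {
    "Italian": (17.10, 27.87, 29.90, 38.67),
    "Czech": (14.88, 31.50, 18.20, 35.43),
    "Indonesian": (13.06, 18.77, 27.20, 38.89),
}


def gains(row):
    """Return the (Chinese-xx, xx-Chinese) gains over the Transformer baseline."""
    base_out, base_in, tongyan_out, tongyan_in = row
    # Round to two decimals, the precision used in the table.
    return round(tongyan_out - base_out, 2), round(tongyan_in - base_in, 2)


table_gains = {lang: gains(row) for lang, row in bleu.items()}

for lang, (gain_out, gain_in) in table_gains.items():
    print(f"{lang}: Chinese-xx +{gain_out:.2f}, xx-Chinese +{gain_in:.2f}")
```

The same subtraction holds for the remaining six rows of the table.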


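Task	Dataset	Metric	Evaluation split	Training set	Validation set	Test set
Cloze	CMRC2017	EM	Test	300 000	2 000	3 000
Reading comprehension	CMRC2018	EM	Validation	10 142	1 002	3 219
NLI	CMNLI	ACC	Validation	391 782	12 426	13 880
Text matching	AFQMC	ACC	Validation	34 334	4 316	3 861
Entity recognition	CLUENER	ACC	Validation	10 748	1 343	1 345
Text classification	TNEWS	ACC	Validation	53 360	10 000	10 000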

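Task	Dataset	Prompt template	English gloss
Cloze	CMRC2017	填空：\n{text}\n{question}{ans}	Fill in the blank:\n{text}\n{question}{ans}
Reading comprehension	CMRC2018	阅读理解：\n{text}\n问：{question}\n答：{ans}	Reading comprehension:\n{text}\nQ: {question}\nA: {ans}
NLI	CMNLI	关系判断：\n{text1}\n{text2}\n选项：{options}\n答案：{ans}	Relation judgment:\n{text1}\n{text2}\nOptions: {options}\nAnswer: {ans}
Text matching	AFQMC	语义相似度判断：\n{text1}\n{text2}\n选项：{options}\n答案：{ans}	Semantic similarity judgment:\n{text1}\n{text2}\nOptions: {options}\nAnswer: {ans}
Entity recognition	CLUENER	实体识别：\n{text}\n{entity_type}：{ans}	Entity recognition:\n{text}\n{entity_type}: {ans}
Text classification	TNEWS	文本分类：\n{text}\n选项：{options}\n答案：{ans}	Text classification:\n{text}\nOptions: {options}\nAnswer: {ans}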
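
The prompt templates above use named placeholders, so each task sample can be turned into a single text sequence by plain string substitution. The following is a minimal sketch of that step, assuming that `\n` in the table stands for a newline and that `{ans}` is left empty when the model is asked to generate the answer. The helper `build_prompt` and the sample sentence are hypothetical and serve only as illustration.

```python
# Prompt templates copied from the table; "\n" is assumed to mean a newline.
PROMPTS = {
    "CMRC2017": "填空：\n{text}\n{question}{ans}",
    "CMRC2018": "阅读理解：\n{text}\n问：{question}\n答：{ans}",
    "CMNLI": "关系判断：\n{text1}\n{text2}\n选项：{options}\n答案：{ans}",
    "AFQMC": "语义相似度判断：\n{text1}\n{text2}\n选项：{options}\n答案：{ans}",
    "CLUENER": "实体识别：\n{text}\n{entity_type}：{ans}",
    "TNEWS": "文本分类：\n{text}\n选项：{options}\n答案：{ans}",
}


def build_prompt(dataset, **fields):
    """Fill the template of `dataset` with the given field values (hypothetical helper)."""
    return PROMPTS[dataset].format(**fields)


# Hypothetical TNEWS sample: with the answer filled in it forms a training
# sequence, with an empty answer it forms the inference-time prompt.
sample = {"text": "国足今晚迎战日本队", "options": "体育，财经，科技"}
train_text = build_prompt("TNEWS", ans="体育", **sample)
infer_text = build_prompt("TNEWS", ans="", **sample)

print(train_text)
```

Because all six tasks share one text-to-text format, a single generative model can be trained on them without task-specific output heads, which is consistent with the zero added parameters reported in the hyperparameter table.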

Base model	Parameters	Added parameters	Weight decay	Dropout	Optimizer	Learning rate	Replay ratio
Peng Cheng PANGU (2.6B)	2.6 billion	0	0.1	0.1	Adam	10^-6 to 10^-5	1% to 5%
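
The table gives a replay ratio of 1% to 5% but does not spell out how the replayed data is drawn. One common reading, used here purely as an assumption, is that a small number of samples from earlier tasks is mixed into the training data of each new task, with the number of replayed samples set to the ratio times the size of the new task's training set. The function `mix_with_replay` and its parameters are hypothetical names for this sketch.

```python
import random


def mix_with_replay(new_task, earlier_tasks, replay_ratio=0.05, seed=0):
    """Return the new-task samples plus a small replayed slice of earlier tasks.

    Assumption (not confirmed by the table): the replay ratio is measured
    relative to the size of the new task's training set.
    """
    rng = random.Random(seed)
    # Pool every sample seen in earlier tasks.
    pool = [sample for task in earlier_tasks for sample in task]
    # Never request more replay samples than the pool holds.
    n_replay = min(len(pool), round(replay_ratio * len(new_task)))
    mixed = list(new_task) + rng.sample(pool, n_replay)
    rng.shuffle(mixed)
    return mixed


# Toy data: 100 new-task samples, 50 samples from one earlier task.
new_task = [("new", i) for i in range(100)]
earlier = [[("old", i) for i in range(50)]]
mixed = mix_with_replay(new_task, earlier, replay_ratio=0.05)

n_old = sum(1 for tag, _ in mixed if tag == "old")
print(len(mixed), n_old)
```

A small ratio keeps the cost of revisiting old tasks low while still exposing the model to earlier data during each new stage.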

Task	Original	Mixed	Independent	Continual
CMRC2017	37.83	63.2	62.033	58.566
CMRC2018	1.21	46.039	47.934	43.46
CMNLI	50.2	71.96	71.72	68.96
AFQMC	59.29	71.825	71.362	70.25
CLUENER	0	60.522	60.567	60.748
TNEWS	60.95	84.04	83.94	83.84
Average	34.913	66.264	66.259	64.304
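
The last row of the results table is the unweighted mean of the six per-task scores in each column. The check below recomputes it from the rows above and also reports how far the continual setting falls behind mixed training on average. The setting names follow the column headers; the variable names are choices made for this sketch.

```python
# Per-task scores copied from the table, in column order:
# (original, mixed, independent, continual)
scores = {
    "CMRC2017": (37.83, 63.2, 62.033, 58.566),
    "CMRC2018": (1.21, 46.039, 47.934, 43.46),
    "CMNLI": (50.2, 71.96, 71.72, 68.96),
    "AFQMC": (59.29, 71.825, 71.362, 70.25),
    "CLUENER": (0, 60.522, 60.567, 60.748),
    "TNEWS": (60.95, 84.04, 83.94, 83.84),
}
SETTINGS = ("original", "mixed", "independent", "continual")

# Unweighted mean over the six tasks, rounded to the table's three decimals.
averages = {
    name: round(sum(row[i] for row in scores.values()) / len(scores), 3)
    for i, name in enumerate(SETTINGS)
}

# Average gap between mixed training and continual learning.
gap = round(averages["mixed"] - averages["continual"], 3)

print(averages)
print("mixed minus continual:", gap)
```

On these numbers the continual setting trails mixed training by about two points on average, while all three trained settings are far above the original model.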
