[1] TANG J, LIU Y, LIU Z Y, et al. Cognition and sustainable learning: technical prospects of pre-training models[J]. Communications of the CCF, 2021, 17(5): 1-8.

[2] ZENG W, REN X Z, SU T, et al. PanGu-α: large-scale autoregressive pretrained Chinese language models with auto-parallel computation[J]. arXiv preprint, 2021, arXiv:2104.12369.

[3] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[C]// Proceedings of 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg: Association for Computational Linguistics, 2019: 4171-4186.

[4] LAMPLE G, CONNEAU A. Cross-lingual language model pretraining[J]. Advances in Neural Information Processing Systems, 2019, 32: 7059-7069.

[5] LIU Y H, OTT M, GOYAL N, et al. RoBERTa: a robustly optimized BERT pretraining approach[J]. arXiv preprint, 2019, arXiv:1907.11692.

[6] XUE L T, CONSTANT N, ROBERTS A, et al. mT5: a massively multilingual pre-trained text-to-text transformer[C]// Proceedings of 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg: Association for Computational Linguistics, 2021: 483-498.

[7] BROWN T B, MANN B, RYDER N, et al. Language models are few-shot learners[J]. arXiv preprint, 2020, arXiv:2005.14165.

[8] FEDUS W, ZOPH B, SHAZEER N. Switch transformers: scaling to trillion parameter models with simple and efficient sparsity[J]. arXiv preprint, 2021, arXiv:2101.03961.

[9] YANG A, LIN J Y, MEN R, et al. Exploring sparse expert models and beyond[J]. arXiv preprint, 2021, arXiv:2105.15082.

[10] PETERS M, NEUMANN M, IYYER M, et al. Deep contextualized word representations[C]// Proceedings of 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg: Association for Computational Linguistics, 2018: 2227-2237.

[11] SONG K, TAN X, QIN T, et al. MASS: masked sequence to sequence pre-training for language generation[C]// Proceedings of the International Conference on Machine Learning. [S.l.:s.n.], 2019: 5926-5936.

[12] LIU Y H, GU J T, GOYAL N, et al. Multilingual denoising pre-training for neural machine translation[J]. Transactions of the Association for Computational Linguistics, 2020, 8: 726-742.

[13] CHEN Z Y, MA N Z, LIU B. Lifelong learning for sentiment classification[C]// Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). Stroudsburg: Association for Computational Linguistics, 2015: 750-756.

[14] MERMILLOD M, BUGAISKA A, BONIN P. The stability-plasticity dilemma: investigating the continuum from catastrophic forgetting to age-limited learning effects[J]. Frontiers in Psychology, 2013, 4: 504.

[15] MAI Z D, LI R W, JEONG J, et al. Online continual learning in image classification: an empirical survey[J]. Neurocomputing, 2022, 469: 28-51.

[16] DE LANGE M, ALJUNDI R, MASANA M, et al. A continual learning survey: defying forgetting in classification tasks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021: 1.

[17] REBUFFI S A, KOLESNIKOV A, SPERL G, et al. iCaRL: incremental classifier and representation learning[C]// Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 5533-5542.

[18] DE LANGE M, TUYTELAARS T. Continual prototype evolution: learning online from non-stationary data streams[C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. [S.l.:s.n.], 2021: 8250-8259.

[19] ROBINS A. Catastrophic forgetting, rehearsal and pseudorehearsal[J]. Connection Science, 1995, 7(2): 123-146.

[20] GOODFELLOW I, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[J]. Advances in Neural Information Processing Systems, 2014, 27: 2672-2680.

[21] LOPEZ-PAZ D, RANZATO M A. Gradient episodic memory for continual learning[J]. Advances in Neural Information Processing Systems, 2017, 30: 6467-6476.

[22] CHAUDHRY A, RANZATO M, ROHRBACH M, et al. Efficient lifelong learning with A-GEM[C]// Proceedings of the International Conference on Learning Representations. [S.l.:s.n.], 2019.

[23] ALJUNDI R, LIN M, GOUJAUD B, et al. Online continual learning with no task boundaries[J]. arXiv preprint, 2019, arXiv:1903.08671.

[24] SILVER D L, MERCER R E. The task rehearsal method of life-long learning: overcoming impoverished data[M]// Advances in artificial intelligence. Heidelberg: Springer Berlin Heidelberg, 2002: 90-101.

[25] RANNEN A, ALJUNDI R, BLASCHKO M B, et al. Encoder based lifelong learning[C]// Proceedings of 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 1329-1337.

[26] KIRKPATRICK J, PASCANU R, RABINOWITZ N, et al. Overcoming catastrophic forgetting in neural networks[J]. Proceedings of the National Academy of Sciences of the United States of America, 2017, 114(13): 3521-3526.

[27] NGUYEN C V, LI Y Z, BUI T D, et al. Variational continual learning[J]. arXiv preprint, 2017, arXiv:1710.10628.

[28] ALJUNDI R, CHAKRAVARTY P, TUYTELAARS T. Expert Gate: lifelong learning with a network of experts[C]// Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 7120-7129.

[29] BIESIALSKA M, BIESIALSKA K, COSTA-JUSSÀ M R. Continual lifelong learning in natural language processing: a survey[C]// Proceedings of the 28th International Conference on Computational Linguistics. [S.l.]: International Committee on Computational Linguistics, 2020: 6523-6541.

[30] SUN Y, WANG S H, LI Y K, et al. ERNIE 2.0: a continual pre-training framework for language understanding[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(5): 8968-8975.

[31] WANG R Z, TANG D Y, DUAN N, et al. K-Adapter: infusing knowledge into pre-trained models with adapters[C]// Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. Stroudsburg: Association for Computational Linguistics, 2021.

[32] MA J Q, ZHAO Z, YI X Y, et al. Modeling task relationships in multi-task learning with multi-gate mixture-of-experts[C]// Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. New York: ACM Press, 2018: 1930-1939.

[33] BENGIO Y, LOURADOUR J, COLLOBERT R, et al. Curriculum learning[C]// Proceedings of the 26th Annual International Conference on Machine Learning. New York: ACM Press, 2009: 41-48.

[34] DU H F, WANG H F, SHI Y H, et al. Progress, challenges and research trends of reasoning in multi-hop knowledge graph based question answering[J]. Big Data Research, 2021, 7(3): 60-79.

[35] CUI Y, LIU T, CHEN Z, et al. Dataset for the first evaluation on Chinese machine reading comprehension[C]// Proceedings of the 11th International Conference on Language Resources and Evaluation. [S.l.:s.n.], 2018: 2721-2725.

[36] XU L, HU H, ZHANG X W, et al. CLUE: a Chinese language understanding evaluation benchmark[C]// Proceedings of the 28th International Conference on Computational Linguistics. [S.l.]: International Committee on Computational Linguistics, 2020: 4762-4772.

[37] XU L, DONG Q Q, YU C, et al. CLUENER2020: fine-grained named entity recognition for Chinese[J]. arXiv preprint, 2020, arXiv:2001.04351.

[38] KANG X M, ZONG C Q. Fusion of discourse structural position encoding for neural machine translation[J]. Chinese Journal of Intelligent Science and Technology, 2020, 2(2): 144-152.

[39] CHEN T Q, GOODFELLOW I, SHLENS J. Net2Net: accelerating learning via knowledge transfer[J]. arXiv preprint, 2015, arXiv:1511.05641.

[40] KAPLAN J, MCCANDLISH S, HENIGHAN T, et al. Scaling laws for neural language models[J]. arXiv preprint, 2020, arXiv:2001.08361.

[41] SHAZEER N, MIRHOSEINI A, MAZIARZ K, et al. Outrageously large neural networks: the sparsely-gated mixture-of-experts layer[J]. arXiv preprint, 2017, arXiv:1701.06538.

[42] ESCOLANO C, COSTA-JUSSÀ M R, FONOLLOSA J A R, et al. Multilingual machine translation: closing the gap between shared and language-specific encoder-decoders[C]// Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics. Stroudsburg: Association for Computational Linguistics, 2021: 944-948.