[1]
DEVLIN J, CHANG M W, LEE K, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding[C]//Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, June 2-7, 2019. ACL Press, 2019: 4171-4186.
[2]
YANG Z L, DAI Z L, CARBONELL J G, et al. XLNet: Generalized Autoregressive
Pretraining for Language Understanding[C]//Advances in Neural Information
Processing Systems 32: Annual Conference on Neural Information Processing
Systems, Canada, December 8-14, 2019. New York: NeurIPS, 2019: 5754-5764.
[3]
LIU Z, LIN W, SHI Y, et al. A robustly optimized BERT pre-training approach
with post-training[C]//Chinese Computational Linguistics: 20th China National
Conference, CCL 2021, Hohhot, China, August 13–15, 2021, Proceedings. Cham:
Springer International Publishing, 2021: 471-484.
[4]
XIE K, LU S, WANG M, et al. ELBERT: Fast ALBERT with confidence-window based early exit[C]//ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2021: 7713-7717.
[5]
LAN Z Z, CHEN M D, GOODMAN S, et al. ALBERT: A Lite BERT for Self-supervised
Learning of Language Representations[C]//8th International Conference on
Learning Representations, Ethiopia, April 26-30, 2020. New York:
OpenReview.net, 2020: 564-571.
[6]
JIAO X Q, YIN Y C, SHANG L F, et al. TinyBERT: Distilling BERT for Natural Language Understanding[C]//Findings of the Association for Computational Linguistics: EMNLP 2020, Online Event, November 16-20, 2020. ACL Press, 2020: 4163-4174.
[7]
SUN S Q, CHENG Y, GAN Z, et al. Patient Knowledge Distillation for BERT Model Compression[C]//Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, Hong Kong, November 3-7, 2019. New York: EMNLP-IJCNLP, 2019: 4322-4331.
[8]
ILICHEV A, SOROKIN N, PIONTKOVSKAYA I, et al. Multiple Teacher Distillation for Robust and Greener Models[C]//Proceedings of the International Conference on Recent Advances in Natural Language Processing, Held Online, September 1-3, 2021. New York: RANLP, 2021: 601-610.
[9]
WANG A, SINGH A, MICHAEL J, et al. GLUE: A multi-task benchmark and analysis
platform for natural language understanding[J]. arXiv preprint
arXiv:1804.07461, 2018.
[10]
VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[J]. Advances in Neural Information Processing Systems, 2017, 30.
[11]
任欢, 王旭光. 注意力机制综述[J]. 计算机应用, 2021, 41(z1): 1-6.
REN H, WANG X G. Overview of attention mechanism[J]. Computer Applications, 2021, 41(z1): 1-6.
[12]
李爱黎, 张子帅, 林荫, 等. 基于社交网络大数据的民众情感监测研究[J]. 大数据, 2022, 8(6): 105-126.
LI A L, ZHANG Z S, LIN Y, et al. Research on public emotion monitoring based on social network big data[J]. Big Data, 2022, 8(6): 105-126.
[13]
韩立帆, 季紫荆, 陈子睿, 等. 数字人文视域下面向历史古籍的信息抽取方法研究[J]. 大数据, 2022, 8(6): 26-39.
HAN L F, JI Z J, CHEN Z R, et al. Research on information extraction from historical ancient books from the perspective of digital humanities[J]. Big Data, 2022, 8(6): 26-39.
[14]
MICHEL P, LEVY O, NEUBIG G. Are sixteen heads really better than one?[J]. Advances in Neural Information Processing Systems, 2019, 32.
[15]
XU Y, WANG Y, ZHOU A, et al. Deep neural network compression with single and multiple level quantization[C]//Proceedings of the 32nd AAAI Conference on Artificial Intelligence, 2018.
[16]
ZAFRIR O, BOUDOUKH G, IZSAK P, et al. Q8BERT: Quantized 8bit BERT[C]//2019 Fifth Workshop on Energy Efficient Machine Learning and Cognitive Computing-NeurIPS Edition (EMC2-NIPS). IEEE, 2019: 36-39.
[17]
HINTON G, VINYALS O, DEAN J. Distilling the knowledge in a neural network[J]. arXiv preprint arXiv:1503.02531, 2015.
[18]
AL-OMARI H, ABDULLAH M A, SHAIKH S. EmoDet2: Emotion detection in English textual dialogue using BERT and BiLSTM models[C]//2020 11th International Conference on Information and Communication Systems (ICICS). IEEE, 2020: 226-232.
[19]
杨秋勇, 彭泽武, 苏华权, 等. 基于Bi-LSTM-CRF的中文电力实体识别[J]. 信息技术, 2021(9): 45-50.
YANG Q Y, PENG Z W, SU H Q, et al. Chinese power entity recognition based on Bi-LSTM-CRF[J]. Information Technology, 2021(9): 45-50.
[20]
叶榕, 邵剑飞, 张小为, 等. 基于BERT-CNN的新闻文本分类的知识蒸馏方法研究[J]. 电子技术应用, 2023, 49(1): 8-13.
YE R, SHAO J F, ZHANG X W, et al. Research on knowledge distillation method of news text classification based on BERT-CNN[J]. Application of Electronic Technology, 2023, 49(1): 8-13.
[21]
XU C, ZHOU W, GE T, et al. BERT-of-Theseus: Compressing BERT by progressive module replacing[C]//Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020: 7859-7869.
[22]
张睿东. 基于BERT和知识蒸馏的自然语言理解研究[D]. 南京大学, 2020.
ZHANG R D. Research on natural language understanding based on BERT and knowledge distillation[D]. Nanjing University, 2020.
[23]
FUKUDA T, KURATA G. Generalized Knowledge Distillation from an Ensemble of
Specialized Teachers Leveraging Unsupervised Neural Clustering[C]//ICASSP
2021-2021 IEEE International Conference on Acoustics, Speech and Signal
Processing (ICASSP). IEEE, 2021: 6868-6872.
[24]
CHO J H, HARIHARAN B. On the efficacy of knowledge distillation[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019: 4794-4802.
[25]
JIANG L, WEN Z, LIANG Z, et al. Long short-term sample distillation[C]//Proceedings of the AAAI Conference on Artificial Intelligence, New York, USA, 2020: 4345-4352.
[26]
YANG Z, SHOU L, GONG M, et al. Model compression with two-stage multi-teacher
knowledge distillation for web question answering system[C]//Proceedings of the
13th International Conference on Web Search and Data Mining. 2020: 690-698.
[27]
WU C, WU F Z, HUANG Y F. One Teacher is Enough? Pre-trained Language Model Distillation from Multiple Teachers[C]//Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. New York: ACL Press, 2021: 4408-4413.
[28]
YUAN F, SHOU L, PEI J, et al. Reinforced multi-teacher selection for knowledge
distillation[C]//Proceedings of the AAAI Conference on Artificial Intelligence.
2021, 35(16): 14284-14291.
[29]
CLARK K, LUONG M T, LE Q V, et al. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators[C]//8th International Conference on Learning Representations, Addis Ababa, April 26-30, 2020. New York: ICLR, 2020.