DEVLIN J, CHANG M W, LEE K, et al. BERT: Pre-training of Deep Bidirectional
Transformers for Language Understanding[C]//Proceedings of the 2019 Conference
of the North American Chapter of the Association for Computational Linguistics:
Human Language Technologies, Minneapolis, June 2-7, 2019. ACL press, 2019:
YANG Z L, DAI Z L, CARBONELL J G, et al. XLNet: Generalized Autoregressive
Pretraining for Language Understanding[C]//Advances in Neural Information
Processing Systems 32: Annual Conference on Neural Information Processing
Systems, Canada, December 8-14, 2019. New York: NeurIPS, 2019: 5754-5764.
LIU Z, LIN W, SHI Y, et al. A robustly optimized BERT pre-training approach
with post-training[C]//Chinese Computational Linguistics: 20th China National
Conference, CCL 2021, Hohhot, China, August 13–15, 2021, Proceedings. Cham:
Springer International Publishing, 2021: 471-484.
XIE K, LU S, WANG M, et al. Elbert: Fast albert with confidence-window based
early exit[C]//ICASSP 2021-2021 IEEE International Conference on Acoustics,
Speech and Signal Processing (ICASSP). IEEE, 2021: 7713-7717.
LAN Z Z, CHEN M D, GOODMAN S, et al. ALBERT: A Lite BERT for Self-supervised
Learning of Language Representations[C]//8th International Conference on
Learning Representations, Ethiopia, April 26-30, 2020. New York:
OpenReview.net, 2020: 564-571.
JIAO X Q, YIN Y C, SHANG L F, et al. TinyBERT: Distilling BERT for Natural
Language Understanding[C]//Findings of the Association for Computational Linguistics,
Online Event, November 16-20, 2020. New York: EMNLP, 2020: 4163-4174.
SUN S Q, CHENG Y, GAN Z, et al. Patient Knowledge Distillation for BERT Model
Compression[C]//Proceedings of the 2019 Conference on Empirical
Methods in Natural Language Processing and the 9th International Joint
Conference on Natural Language Processing, Hong Kong, November 3-7, 2019. New
York: EMNLP-IJCNLP, 2019: 4322-4331.
Teacher Distillation for Robust and Greener Models[C]//Proceedings of the
International Conference on Recent Advances in Natural Language Processing,
Held Online, September 1-3, 2021. New York: RANLP, 2021: 601-610.
WANG A, SINGH A, MICHAEL J, et al. GLUE: A multi-task benchmark and analysis
platform for natural language understanding[J]. arXiv preprint
arXiv:1804.07461, 2018.
VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[J]. Advances
in neural information processing systems, 2017, 30.
REN H, WANG X G. Overview of attention mechanism[J]. Computer Applications,
2021, 41(z1): 1-6.
LI A L, ZHANG Z S, LIN Y, et al. Research on public emotion monitoring based on
social network big data[J]. Big Data, 2022, 8(6): 105-126.
HAN L F, JI Z J, CHEN Z R, et al. Research on information extraction from
historical ancient books from the perspective of digital humanities[J]. Big
Data, 2022, 8(6): 26-39.
MICHEL P, LEVY O, NEUBIG G. Are sixteen heads really better than one?[J].
Advances in neural information processing systems, 2019, 32.
XU Y, WANG Y, ZHOU A, et al. Deep neural network compression with single and
multiple level quantization[C]//Proc of the 32nd AAAI Conf on Artificial
ZAFRIR O, BOUDOUKH G, IZSAK P, et al. Q8bert: Quantized 8bit bert[C]//2019
Fifth Workshop on Energy Efficient Machine Learning and Cognitive
Computing-NeurIPS Edition (EMC2-NIPS). IEEE, 2019: 36-39.
HINTON G, VINYALS O, DEAN J. Distilling the knowledge in a neural network[J].
arXiv preprint arXiv:1503.02531, 2015, 2(7).
AL-OMARI H, ABDULLAH M A, SHAIKH S. Emodet2: Emotion detection in english
textual dialogue using bert and bilstm models[C]//2020 11th International
Conference on Information and Communication Systems (ICICS). IEEE, 2020:
YANG Q Y, PENG Z W, SU H Q, et al. Chinese power entity recognition based on
Bi-LSTM-CRF[J]. Information Technology, 2021(9): 45-50.
YE R, SHAO J F, ZHANG X W, et al. Research on knowledge distillation method of
news text classification based on BERT-CNN[J]. Application of Electronic
Technology, 2023, 49(1): 8-13.
XU C, ZHOU W, GE T, et al. BERT-of-theseus: Compressing BERT by progressive
module replacing[C]//Proceedings of Empirical Methods in Natural Language
Processing (EMNLP). 2021: 7859-7869.
ZHANG E D. Research on natural language understanding based on BERT and
knowledge distillation[D]. Nanjing University,
FUKUDA T, KURATA G. Generalized Knowledge Distillation from an Ensemble of
Specialized Teachers Leveraging Unsupervised Neural Clustering[C]//ICASSP
2021-2021 IEEE International Conference on Acoustics, Speech and Signal
Processing (ICASSP). IEEE, 2021: 6868-6872.
CHO J H, HARIHARAN B. On the efficacy of knowledge distillation[C]//Proceedings
of the IEEE/CVF international conference on computer vision. 2019: 4794-4802.
JIANG L, WEN Z, LIANG Z, et al. Long short-term sample
distillation[C]//Proceedings of the AAAI Conference on Artificial Intelligence.
New York, USA, 2020: 4345-4352.
YANG Z, SHOU L, GONG M, et al. Model compression with two-stage multi-teacher
knowledge distillation for web question answering system[C]//Proceedings of the
13th International Conference on Web Search and Data Mining. 2020: 690-698.
WU C, WU F Z, HUANG Y F. One Teacher is Enough? Pre-trained Language Model
Distillation from Multiple Teachers[C]//Findings of the Association for
Computational Linguistics. New York: ACL Press, 2021: 4408-4413.
YUAN F, SHOU L, PEI J, et al. Reinforced multi-teacher selection for knowledge
distillation[C]//Proceedings of the AAAI Conference on Artificial Intelligence.
2021, 35(16): 14284-14291.
CLARK K, LUONG M T, LE Q V, et al. ELECTRA: Pre-training Text Encoders as
Discriminators Rather Than Generators[C]//8th
International Conference on Learning Representations, Addis Ababa, April 26-30,
2020. New York: ICLR, 2020.