基于词序扰动的神经机器翻译模型鲁棒性研究

doi:10.11959/j.issn.2096-109x.2023078

Abstract

Abstract:

Pre-trained language model is one of the most important models in the natural language processing field, as pre-train-finetune has become the paradigm in various NLP downstream tasks.Previous studies have proved integrating pre-trained language models (e.g., BERT) into neural machine translation (NMT) models can improve translation performance.However, it is still unclear whether these improvements stem from enhanced semantic or syntactic modeling capabilities, as well as how pre-trained knowledge impacts the robustness of the models.To address these questions, a systematic study was conducted to examine the syntactic ability of BERT-enhanced NMT models using probing tasks.The study revealed that the enhanced models showed proficiency in modeling word order, highlighting their syntactic modeling capabilities.In addition, an attacking method was proposed to evaluate the robustness of NMT models in handling word order.BERT-enhanced NMT models yielded better translation performance in most of the tasks, indicating that BERT can improve the robustness of NMT models.It was observed that BERT-enhanced NMT model generated poorer translations than vanilla NMT model after attacking in the English-German translation task, which meant that English BERT worsened model robustness in such a scenario.Further analyses revealed that English BERT failed to bridge the semantic gap between the original and perturbed sources, leading to more copying errors and errors in translating low-frequency words.These findings suggest that the benefits of pre-training may not always be consistent in downstream tasks, and careful consideration should be given to its usage.

Key words: neural machine translation, pre-training model, robustness, word order

CLC Number:

TP393

Yuran ZHAO, Tang XUE, Gongshen LIU. Research on the robustness of neural machine translation systems in word order perturbation[J]. Chinese Journal of Network and Information Security, 2023, 9(5): 138-149.

Figures/Tables 12

References 54

[20]	ZHANG T , KISHORE V , WU F ,et al. BERTScore:evaluating text generation with BERT[C]// 8th International Conference on Learning Representations. 2020: 1-43.
[21]	CLINCHANT S , JUNG K W , NIKOULINA V . On the use of BERT for neural machine translation[C]// Proceedings of the 3rd Workshop on Neural Generation and Translation. 2019: 108-117.
[22]	ROTHE S , NARAYAN S , SEVERYN A . Leveraging pre-trained checkpoints for sequence generation tasks[J]. Transactions of the Association for Computational Linguistics, 2020,8: 264-280.
[23]	GUO J , ZHANG Z , XU L ,et al. Incorporating bert into parallel sequence decoding with adapters[J]. Advances in Neural Information Processing Systems. 2020,33: 10843-10854.
[24]	GUO J , ZHANG Z , XU L ,et al. Adaptive adapters:an efficient way to incorporate BERT into neural machine translation[J]. IEEE/ACM Transactions on Audio,Speech,and Language Processing, 2021,29: 1740-1751.
[25]	XU H , VAN DURME B , MURRAY K . BERT,mBERT,or BiBERT? a study on Contextualized Embeddings for Neural Machine Translation[C]// Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 2021: 6663-6675.
[26]	CONNEAU A , LAMPLE G . Cross-lingual language model pretraining[C]// Proceedings of the 33rd International Conference on Neural Information Processing Systems. 2019: 7059-7069.
[27]	SONG K , TAN X , QIN T ,et al. MASS:masked sequence to sequence pre-training for language generation[C]// International Conference on Machine Learning. 2019: 5926-5936.
[28]	LIU Y , GU J , GOYAL N ,et al. Multilingual denoising pre-training for neural machine translation[J]. Transactions of the Association for Computational Linguistics, 2020,8: 726-742.
[29]	LIN Z , PAN X , WANG M ,et al. Pre-training multilingual neural machine translation by leveraging alignment information[C]// Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020: 2649-2663.
[30]	PAN X , WANG M , WU L ,et al. Contrastive learning for many-to-many multilingual neural machine translation[C]// Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing. 2021: 244-258.
[31]	LI P , LI L , ZHANG M ,et al. Universal conditional masked language pre-training for neural machine translation[C]// Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. 2022: 6379-6391.
[32]	BELINKOV Y , BISK Y . Synthetic and natural noise both break neural machine translation[C]// 6th International Conference on Learning Representations. 2018: 1-13.
[33]	CHENG Y , TU Z , MENG F ,et al. Towards robust neural machine translation[C]// Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. 2018: 1756-1766.
[34]	VAIBHAV V , SINGH S , STEWART C ,et al. Improving robustness of machine translation with synthetic noise[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies. 2019: 1916-1920.
[35]	MICHEL P , LI X , NEUBIG G ,et al. On evaluation of adversarial perturbations for sequence-to-sequence models[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies. 2019: 3103-3114.
[36]	SATO M , SUZUKI J , KIYONO S . Effective adversarial regularization for neural machine translation[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019: 204-210.
[37]	CHENG Y , JIANG L , MACHEREY W ,et al. AdvAug:robust adversarial augmentation for neural machine translation[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020: 5961-5970.
[38]	SENNRICH R , HADDOW B , BIRCH A . Neural machine translation of rare words with subword units[C]// Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. 2016: 1715-1725.
[39]	MICHEL P , NEUBIG G . MTNT:a testbed for machine translation of noisy text[C]// Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018: 543-553.
[40]	WU Z , WU L , MENG Q ,et al. UniDrop:a simple yet effective technique to improve transformer without extra cost[C]// Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies. 2021: 3865-3878.
[41]	CHENG Y , WANG W , JIANG L ,et al. Self-supervised and supervised joint training for resource-rich machine translation[C]// International Conference on Machine Learning. 2021: 1825-1835.
[42]	AGIRRE E , CER D , DIAB M ,et al. SemEval-2012 task 6:A pilot on semantic textual similarity[C]// Proceedings of the Main Conference and the Shared Task,and Volume 2:Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012). 2012: 385-393.
[1]	DEVLIN J , CHANG M W , LEE K ,et al. BERT:pre-training of deep bidirectional transformers for language understanding[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies. 2019: 4171-4186.
[2]	ZHU J , XIA Y , WU L ,et al. Incorporating BERT into neural machine translation[C]// 8th International Conference on Learning Representations. 2020: 1-16.
[3]	BAZIOTIS C , HADDOW B , BIRCH A . Language model prior for low-resource neural machine translation[C]// Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020: 7622-7634.
[4]	AN T , SONG J , LIU W . Incorporating pre-trained model into neural machine translation[C]// 2021 4th International Conference on Artificial Intelligence and Big Data (ICAIBD). 2021: 212-216.
[5]	LIU X , WANG L , WONG D F ,et al. On the complementarity between pre-training and back-translation for neural machine translation[C]// Findings of the Association for Computational Linguistics:EMNLP 2021. 2021: 2900-2907.
[6]	PETERS M E , NEUMANN M , ZETTLEMOYER L ,et al. Dissecting contextual word embeddings:architecture and representation[C]// Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018: 1499-1509.
[7]	JAWAHAR G , SAGOT B , SEDDAH D . What does BERT learn about the structure of language[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019: 3651-3657.
[8]	TENNEY I , DAS D , PAVLICK E . BERT rediscovers the classical NLP pipeline[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019: 4593-4601.
[9]	TENNEY I , XIA P , CHEN B ,et al. What do you learn from context? Probing for sentence structure in contextualized word representations[C]// 7th International Conference on Learning Representations. 2019: 1-17.
[10]	HEWITT J , MANNING C D . A structural probe for finding syntax in word representations[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies. 2019: 4129-4138.
[11]	GOLDBERG Y . Assessing BERT's syntactic abilities[J]. arXiv preprint arXiv:1901.05287, 2019.
[43]	AGIRRE E , CER D , DIAB M ,et al. * SEM 2013 shared task:semantic textual similarity[C]// Proceedings of the Main Conference and the shared task:Semantic Textual Similarity. 2013: 32-43.
[44]	AGIRRE E , BANEA C , CARDIE C ,et al. Multilingual semantic Textual Similarity[C]// Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014). 2014: 81-91.
[45]	AGIRRE E , BANEA C , CARDIE C ,et al. Semantic textual similarity,english,spanish and pilot on interpretability[C]// Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015). 2015: 252-263.
[46]	AGIRRE E , BANEA C , CER D ,et al. Semantic Textual Similarity,Monolingual and Cross-Lingual Evaluation[C]// Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016). 2016: 497-511.
[47]	QI P , ZHANG Y , ZHANG Y ,et al. Stanza:a python natural language processing toolkit for many human languages[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics:System Demonstrations. 2020: 101-108.
[48]	CONNEAU A , KIELA D . SentEval:an evaluation toolkit for universal sentence representations[C]// Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). 2018: 1699-1704.
[49]	WOLF T , DEBUT L , SANH V ,et al. Transformers:state-of-the-art natural language processing[C]// Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing:System Demonstrations. 2020: 38-45.
[50]	OTT M , EDUNOV S , BAEVSKI A ,et al. Fairseq:a fast,extensible toolkit for sequence modeling[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations). 2019: 48-53.
[51]	KINGMA D P , BA J . Adam:a method for stochastic optimization[C]// 3rd International Conference on Learning Representations. 2015: 1-15.
[52]	POST M . A call for clarity in reporting BLEU scores[C]// Proceedings of the Third Conference on Machine Translation:Research Papers. 2018: 186-191.
[53]	OTT M , AULI M , GRANGIER D ,et al. Analyzing uncertainty in neural machine translation[C]// International Conference on Machine Learning. 2018: 3956-3965.
[12]	SUNDARARAMAN D , SUBRAMANIAN V , WANG G ,et al. Syntax-infused transformer and bert models for machine translation and natural language understanding[J]. arXiv preprint arXiv:1911.06156, 2019.
[13]	WENG R , YU H , HUANG S ,et al. Acquiring knowledge from pre-trained model to neural machine translation[C]// Proceedings of the AAAI Conference on Artificial Intelligence. 2020: 9266-9273.
[14]	YANG J , WANG M , ZHOU H ,et al. Towards making the most of bert in neural machine translation[C]// Proceedings of the AAAI Conference on Artificial Intelligence. 2020: 9378-9385.
[15]	SHAVARANI H S , SARKAR A . Better neural machine translation by extracting linguistic information from BERT[C]// Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics. 2021: 2772-2783.
[16]	HAUSER J , MENG Z , PASCUAL D ,et al. BERT is robust! a case against synonym-based adversarial examples in text classification[J]. arXiv preprint arXiv:2109.07403, 2021.
[17]	VASWANI A , SHAZEER N , PARMAR N ,et al. Attention is all you need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017: 6000-6010.
[18]	CONNEAU A , KRUSZEWSKI G , LAMPLE G ,et al. What you can cram into a single $ ＆!#* vector:probing sentence embeddings for linguistic properties[C]// Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1:Long Papers). 2018: 2126-2136.
[19]	PAPINENI K , ROUKOS S , WARD T ,et al. Bleu:a method for automatic evaluation of machine translation[C]// Proceedings of the 40th annual Meeting of the Association for Computational Linguistics. 2002: 311-318.
[54]	NEUBIG G , DOU Z Y , HU J ,et al. compare-mt:a tool for holistic comparison of language generation systems[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations). 2019: 35-41.

Metrics

Recommended 0

No Suggested Reading articles found!

任务	示例	标签
Distance	why not just bring up the idea of staying in your current lab with your advisor ?	3
	word: idea, current
Depth	Budget negotiations between the mayor and the city council are entering high gear .	3
	word: mayo
BShift	She wondered how time much had passed .	Inverted
TreeDepth	Who knew who would be there ?	10
TopConst	I wanted to start asking questions now , but forced myself to wait .	NP_VP_

模型			准确率
模型	Distance	Depth	BShift	TreeDepth	TopConst
BERT	74.16%	78.79%	88.77%	36.21%	72.62%
NMT编码器	80.86%	83.53%	70.66%	39.94%	75.26%
BERT-NMT编码器	81.09%	83.97%	85.14%	40.66%	70.27%
掩码自注意力模块	77.91%	79.96%	85.97%	38.75%	71.74%
掩码BERT-Enc注意力模块	80.22%	82.07%	66.63%	39.80%	74.04%

p	句子
0.0	Two sets of lights so close to one another: inten-tional or just a silly error?
0.1	Two sets of lights so close to one another: inten-tional or just silly a error?
0.2	Two sets of so lights close to one another: or inten-tional a just silly error?
0.3	Two of sets lights so to close another one intentional or: just a silly error?
0.4	Two sets of so lights to one close: another or inten-tional just a silly error?
0.5	Two sets lights of close to so one another intention-al: or just silly a error?

模型	De-En BLEUScore/BERTScore	En-De BLEUScore/BERTScore	Fi-En BLEUScore/BERTScore	Tr-En BLEUScore/BERTScore	Zh-En BLEUScore/BERTScore
NMT	31.1/85.8	27.1/85.5	25.9/83.8	16.0/75.5	22.8/82.4
BERT-NMT	32.4/86.8	29.0/86.1	26.9/84.9	18.8/78.6	23.2/82.7

条件	句子
源语言句子	In Cameroon, there is only one doctor for every 5 000 people, according to the World Health Organization.
参考翻译	In Kamerun gibt es nur einen Arzt für je 5 000 Menschen, so die Weltgesundheitsorganisation.
NMT	Nach Angaben der Weltgesundheitsorganisation gibt es in Kamerun nur einen Arzt für jeden 5 000 Menschen.
BERT-NMT	In Kamerun gibt es laut Weltgesundheitsorganisation nur einen Arzt pro 5 000 Menschen.
源语言句子（p=0.5）	In, there is only one Cameroon for every 5 000 people, according to doctor the Health Organization World.
NMT（p=0.5）	In, gibt es nur ein Kamerun für jeden 5 000 Menschen, so der Arzt der Weltgesundheitsorganisation World.
BERT-NMT （p=0.5）	In, there is only one Cameroon for each 5 000 people, according to doctor the Health Organization World.

Research on the robustness of neural machine translation systems in word order perturbation

RichHTML

PDF下载

Knowledge

Abstract

Cite this article

share this article

Figures/Tables 12

References 54

Related Articles 15

Metrics

Recommended 0

模型（攻击概率）		F1值
模型（攻击概率）	低频词	中频词	高频词
NMT（p=0）	24.1	28.4	46.8
BERT-NMT（p=0）	27.1	32.3	50.7
NMT（p=0.5）	19.0	22.9	41.7
BERT-NMT（p=0.5）	19.4	24.9	44.5

[1]	Xinxin XING, Qingya ZUO, Jianwei LIU. 5G-based smart airport network security scheme design and security analysis [J]. Chinese Journal of Network and Information Security, 2023, 9(5): 116-126.
[2]	Wenting GE, Weihai LI, Nenghai YU. Block level cloud data deduplication scheme based on attribute encryption [J]. Chinese Journal of Network and Information Security, 2023, 9(5): 106-115.
[3]	Qingfeng WANG, Hao LIANG, Yawen WANG, Genlin XIE, Benwei HE. Constructing method of opaque predicate based on type conversion and operation of floating point numbers [J]. Chinese Journal of Network and Information Security, 2023, 9(5): 48-58.
[4]	Shiyu HUANG, Feng YE, Tianqiang HUANG, Wei LI, Liqing HUANG, Haifeng LUO. Survey on adversarial attacks and defense of face forgery and detection [J]. Chinese Journal of Network and Information Security, 2023, 9(4): 1-15.
[5]	Weicheng QIU, Xiuzhen CHEN, Yinghua MA, Jin MA, Zhihong ZHOU. Predicting correlation relationships of entities between attack patterns and techniques based on word embedding and graph convolutional network [J]. Chinese Journal of Network and Information Security, 2023, 9(4): 40-52.
[6]	Yue YU, Xianzheng LIN, Weihai LI, Nenghai YU. Privacy-preserving data compression scheme for k-anonymity model based on Huffman coding [J]. Chinese Journal of Network and Information Security, 2023, 9(4): 64-73.
[7]	Wenqian XIAO, Gaobo YANG, Dewang WANG, Ming XIA. Reversible data hiding in encrypted images based on additive homomorphic encryption and multi-MSB embedding [J]. Chinese Journal of Network and Information Security, 2023, 9(4): 121-133.
[8]	Guopeng GAO, Yaodong FANG, Yanfang HAN, Zhenxing QIAN, Chuan QIN. Construction of multi-modal social media dataset for fake news detection [J]. Chinese Journal of Network and Information Security, 2023, 9(4): 144-154.
[9]	Bolin ZHANG, Chuntao ZHU, Qilin YIN, Jingqiao FU, Lingyi LIU, Jiarui LIU, Hongmei LIU, Wei LU. Noise-attention-based forgery face detection method [J]. Chinese Journal of Network and Information Security, 2023, 9(4): 155-165.
[10]	Ziyang ZHOU, Yun TAN, Jiaohua QIN, Xuyu XIANG. JPEG reversible data hiding method based on optimal DCT frequency embedding [J]. Chinese Journal of Network and Information Security, 2023, 9(4): 166-177.
[11]	Qian ZHOU, Haiping HUANG, Le WANG, Yanchun ZHANG, Fu XIAO. Connotation and practice of the integration of academic field based on Bourdieu’s theory——taking the cultivation of cyberspace security talents as an example [J]. Chinese Journal of Network and Information Security, 2023, 9(4): 178-187.
[12]	Xianyi CHEN, Jun GU, Kai YAN, Dong JIANG, Linfeng XU, Zhangjie FU. Double adversarial attack against license plate recognition system [J]. Chinese Journal of Network and Information Security, 2023, 9(3): 16-27.
[13]	Tianpeng YE, Xiang LIN, Jianhua LI, Xuankai ZHANG, Liwen XU. Personalized lightweight distributed network intrusion detection system in fog computing [J]. Chinese Journal of Network and Information Security, 2023, 9(3): 28-37.
[14]	Lijun ZU, Yalin CAO, Xiaohua MEN, Zhihui LYU, Jiawei YE, Hongyi LI, Liang ZHANG. Adaptive selection method of desensitization algorithm based on privacy risk assessment [J]. Chinese Journal of Network and Information Security, 2023, 9(3): 49-59.
[15]	Ruiqi XIA, Manman LI, Shaozhen CHEN. Identification on the structures of block ciphers using machine learning [J]. Chinese Journal of Network and Information Security, 2023, 9(3): 79-89.

模型（攻击概率）		F1值
模型（攻击概率）	低频词	中频词	高频词
NMT（p=0）	31.6	40.5	53.0
BERT-NMT（p=0）	33.5	43.2	54.9
NMT（p=0.5）	20.1	26.1	40.7
BERT-NMT（p=0.5）	19.8	26.4	41.1