NLP neural network copyright protection based on black box watermark

doi:10.11959/j.issn.2096-109x.2023009

Abstract:

With the rapid development of natural language processing (NLP) techniques, language models are increasingly used in text classification and sentiment analysis. However, language models are susceptible to piracy and redistribution by adversaries, which poses a serious threat to the intellectual property of model owners. Researchers have therefore been designing protection mechanisms that identify the copyright information of language models. Existing watermarks for text classification language models, however, cannot be associated with the owner's identity, are not robust enough, and cannot regenerate their trigger sets. To solve these problems, a new black-box watermarking scheme for text classification models is proposed, which allows model ownership to be verified remotely and quickly. The model owner's copyright message and secret key are processed with the hash-based message authentication code (HMAC) to obtain a copyright message digest; because the digest is produced by HMAC, it resists forgery and offers strong security. A certain amount of text data is randomly selected from each class of the original training set and combined with the digest to construct the trigger set, and the watermark is embedded into the language model during training. To evaluate the scheme, watermarks were embedded into three common language models (GRU, LSTM, and TextCNN) on the IMDB movie review and CNEWS Chinese news text classification datasets. The experimental results show that the watermark verification accuracy of the proposed scheme reaches 100% without affecting the test accuracy of the original model. The scheme remains robust under common attacks such as model fine-tuning and pruning, and it resists forgery attacks. Moreover, embedding the watermark does not affect the convergence time of the model, so embedding is efficient.

Key words: natural language processing, text classification, copyright protection, language model, black-box watermarking
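The abstract states the scheme only at a high level. As a minimal sketch, assuming details the abstract does not give, the following standard-library Python code shows how the steps could fit together: an HMAC digest of the copyright message, a trigger set built from randomly sampled texts of every class, black-box verification by querying a model, and rejection of a forged digest. The choice of SHA-256, stamping the first 16 hex characters of the digest onto each text, relabelling triggers from class y to (y + 1) mod C, and all function names are hypothetical; the training step that actually embeds the watermark is omitted because it depends on the model and framework.

```python
import hashlib
import hmac
import random
from typing import Callable, Dict, List, Sequence, Tuple

Example = Tuple[str, int]  # (text, class label)
TOKEN_LEN = 16  # assumption: number of hex digest characters stamped on each trigger


def copyright_digest(message: str, key: bytes) -> str:
    """HMAC of the owner's copyright message under a secret key (SHA-256 is an assumption)."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).hexdigest()


def build_trigger_set(train: Sequence[Example], digest: str, num_classes: int,
                      per_class: int, seed: int = 0) -> List[Example]:
    """Randomly pick `per_class` texts from every class and stamp them with the digest.

    Assumptions (not from the paper): the truncated digest is appended as an extra
    token, and each trigger is relabelled from class y to (y + 1) % num_classes.
    """
    rng = random.Random(seed)
    by_class: Dict[int, List[str]] = {c: [] for c in range(num_classes)}
    for text, label in train:
        by_class[label].append(text)
    token = digest[:TOKEN_LEN]
    triggers: List[Example] = []
    for c in range(num_classes):
        for text in rng.sample(by_class[c], per_class):
            triggers.append((f"{text} {token}", (c + 1) % num_classes))
    return triggers


def watermark_accuracy(predict: Callable[[str], int],
                       triggers: Sequence[Example]) -> float:
    """Black-box check: fraction of triggers the suspect model maps to the watermark label."""
    hits = sum(1 for text, target in triggers if predict(text) == target)
    return hits / len(triggers)


def verify_claim(claimed_digest: str, message: str, key: bytes) -> bool:
    """A claimed digest is accepted only if it equals HMAC(key, message)."""
    return hmac.compare_digest(claimed_digest, copyright_digest(message, key))


if __name__ == "__main__":
    key = b"owner-secret-key"            # hypothetical secret key
    msg = "Copyright (c) Example Owner"  # hypothetical copyright message
    digest = copyright_digest(msg, key)

    train = [("great movie", 1), ("loved it", 1),
             ("boring plot", 0), ("awful acting", 0)]
    triggers = build_trigger_set(train, digest, num_classes=2, per_class=1)

    # Toy stand-in for a remote model API (not a trained network): it behaves like
    # a watermarked model by flipping its label whenever the digest token appears.
    token = digest[:TOKEN_LEN]

    def suspect_predict(text: str) -> int:
        base = 1 if ("great" in text or "loved" in text) else 0
        return (base + 1) % 2 if token in text else base

    print(watermark_accuracy(suspect_predict, triggers))  # 1.0
    print(verify_claim(digest, msg, key))                 # True
    print(verify_claim(digest, msg, b"forger-key"))       # False
```

In practice the trigger set would be mixed into the training data, and `predict` would wrap queries to the suspect model's remote interface. Because the trigger set here is derived from the key, message, training data, and a fixed sampling seed, the owner can rebuild it on demand, which is one plausible reading of the abstract's point about regenerating trigger sets.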

CLC number: TP391

Long DAI, Jing ZHANG, Xuefeng FAN, Xiaoyi ZHOU. NLP neural network copyright protection based on black box watermark[J]. Chinese Journal of Network and Information Security, 2023, 9(1): 140-149.

Figures/Tables: 12 (Figures 1-9, Tables 1-3)

Dataset	Model	Train loss	Test loss	Train accuracy	Test accuracy	Precision	Recall	F1 score
IMDB	GRU	0.0073	1.2239	0.9982	0.9148	0.9052	0.9111	0.9081
IMDB	LSTM	0.0745	0.4692	0.9897	0.9048	0.8975	0.9288	0.9129
IMDB	TextCNN	0.0250	0.3770	0.9919	0.9122	0.9235	0.8999	0.9115
CNEWS	GRU	0.0172	0.3176	0.9969	0.9436	0.9509	0.9409	0.9459
CNEWS	LSTM	0.0094	0.2959	0.9985	0.9430	0.9331	0.9511	0.9420
CNEWS	TextCNN	0.0723	0.1631	0.9780	0.9567	0.9492	0.9557	0.9524

Dataset	Model	Train loss	Test loss	Train accuracy	Test accuracy	Precision	Recall	F1 score
IMDB	GRU	0.0384	1.1470	0.9912	0.9230	0.9031	0.9160	0.9095
IMDB	LSTM	0.0460	0.6613	0.9917	0.9066	0.9125	0.9094	0.9109
IMDB	TextCNN	0.0250	0.3267	0.9918	0.9144	0.9108	0.9097	0.9102
CNEWS	GRU	0.0299	0.3734	0.9939	0.9337	0.9430	0.9334	0.9382
CNEWS	LSTM	0.0121	0.2913	0.9978	0.9455	0.9646	0.9449	0.9546
CNEWS	TextCNN	0.0677	0.1625	0.9803	0.9541	0.9479	0.9587	0.9533
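The two tables above report precision, recall, and F1 side by side but do not say how F1 was computed. Assuming it is the usual harmonic mean F1 = 2PR / (P + R), the following sketch recomputes F1 from the reported precision and recall for all twelve rows as a consistency check; the row labels ("first"/"second" table) are only illustrative names.

```python
from typing import Dict, Tuple

# (precision, recall, reported F1) transcribed from the two tables above.
ROWS: Dict[str, Tuple[float, float, float]] = {
    "first / IMDB / GRU": (0.9052, 0.9111, 0.9081),
    "first / IMDB / LSTM": (0.8975, 0.9288, 0.9129),
    "first / IMDB / TextCNN": (0.9235, 0.8999, 0.9115),
    "first / CNEWS / GRU": (0.9509, 0.9409, 0.9459),
    "first / CNEWS / LSTM": (0.9331, 0.9511, 0.9420),
    "first / CNEWS / TextCNN": (0.9492, 0.9557, 0.9524),
    "second / IMDB / GRU": (0.9031, 0.9160, 0.9095),
    "second / IMDB / LSTM": (0.9125, 0.9094, 0.9109),
    "second / IMDB / TextCNN": (0.9108, 0.9097, 0.9102),
    "second / CNEWS / GRU": (0.9430, 0.9334, 0.9382),
    "second / CNEWS / LSTM": (0.9646, 0.9449, 0.9546),
    "second / CNEWS / TextCNN": (0.9479, 0.9587, 0.9533),
}


def f1(precision: float, recall: float) -> float:
    """F1 as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def max_discrepancy(rows: Dict[str, Tuple[float, float, float]]) -> float:
    """Largest absolute gap between recomputed and reported F1."""
    return max(abs(f1(p, r) - reported) for p, r, reported in rows.values())


if __name__ == "__main__":
    for name, (p, r, reported) in ROWS.items():
        print(f"{name:26s} computed={f1(p, r):.4f} reported={reported:.4f}")
    print(f"max discrepancy: {max_discrepancy(ROWS):.6f}")
```

Every recomputed value rounds to the reported F1 at four decimal places, so the F1 column is consistent with the harmonic mean of the listed precision and recall.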