NLP neural network copyright protection based on black box watermark

doi:10.11959/j.issn.2096-109x.2023009

Abstract

With the rapid development of natural language processing techniques, language models are increasingly used in text classification and sentiment analysis. However, language models are susceptible to piracy and redistribution by adversaries, which poses a serious threat to the intellectual property of model owners. Researchers have therefore been designing protection mechanisms that identify the copyright information of language models. Existing watermarking schemes for language models in text classification tasks, however, cannot be associated with the owner's identity, are not sufficiently robust, and cannot regenerate trigger sets. To solve these problems, a black-box watermarking scheme for text classification tasks is proposed that can verify model ownership remotely and quickly. The copyright message and the key of the model owner are processed with a Hash-based Message Authentication Code (HMAC) to obtain a message digest, which resists forgery and offers high security. A certain amount of text data is randomly selected from each category of the original training set and combined with the digest to construct the trigger set, and the watermark is then embedded in the language model during training. To evaluate the proposed scheme, watermarks were embedded in three common language models on the IMDB movie review and CNews text classification datasets. The experimental results show that the watermark verification accuracy of the proposed scheme can reach 100% without affecting the performance of the original model. Even under common attacks such as model fine-tuning and pruning, the scheme shows strong robustness, and it also resists forgery attacks. Meanwhile, embedding the watermark does not affect the convergence time of the model, so the embedding efficiency is high.
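The trigger-set construction described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the hash function (SHA-256), the way the digest is combined with each text (appended as a suffix), the choice to keep each sample's original label, and the names `make_digest` and `build_trigger_set` are all assumptions, since the abstract does not specify them.

```python
import hashlib
import hmac
import random


def make_digest(key: bytes, copyright_message: str) -> str:
    """Return the hex HMAC digest of the owner's copyright message.

    SHA-256 is an assumed choice; the abstract only says HMAC is used.
    """
    return hmac.new(key, copyright_message.encode("utf-8"), hashlib.sha256).hexdigest()


def build_trigger_set(dataset, digest, per_class, seed=0):
    """Pick `per_class` texts from every category and attach the digest.

    `dataset` is a list of (text, label) pairs. Appending the digest as a
    suffix and keeping the original label are assumptions for illustration.
    """
    rng = random.Random(seed)  # fixed seed so the trigger set can be regenerated
    by_label = {}
    for text, label in dataset:
        by_label.setdefault(label, []).append(text)

    triggers = []
    for label in sorted(by_label):
        for text in rng.sample(by_label[label], per_class):
            triggers.append((f"{text} {digest}", label))
    return triggers


if __name__ == "__main__":
    # Hypothetical key, message, and toy data.
    digest = make_digest(b"owner-secret-key", "Model owned by Example Lab, 2023")
    toy_data = [("great film", 1), ("loved it", 1), ("boring", 0), ("awful plot", 0)]
    for sample in build_trigger_set(toy_data, digest, per_class=1):
        print(sample)
```

Because the digest depends only on the key and the copyright message, the owner can recompute it later and rebuild the same trigger texts, which is one way to read the abstract's point about regenerating trigger sets.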

Key words: natural language processing, text classification, copyright protection, language model, black box watermarking

CLC Number: TP391

Long DAI, Jing ZHANG, Xuefeng FAN, Xiaoyi ZHOU. NLP neural network copyright protection based on black box watermark[J]. Chinese Journal of Network and Information Security, 2023, 9(1): 140-149.

Dataset	Model	Training loss	Test loss	Training accuracy	Test accuracy	Precision	Recall	F1 score
IMDB	GRU	0.0073	1.2239	0.9982	0.9148	0.9052	0.9111	0.9081
IMDB	LSTM	0.0745	0.4692	0.9897	0.9048	0.8975	0.9288	0.9129
IMDB	TextCNN	0.0250	0.3770	0.9919	0.9122	0.9235	0.8999	0.9115
CNews	GRU	0.0172	0.3176	0.9969	0.9436	0.9509	0.9409	0.9459
CNews	LSTM	0.0094	0.2959	0.9985	0.9430	0.9331	0.9511	0.9420
CNews	TextCNN	0.0723	0.1631	0.9780	0.9567	0.9492	0.9557	0.9524
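
The F1 score column in the table above can be read as the harmonic mean of the precision and recall columns. As a quick check under that standard definition (the helper name `f1_score` is only illustrative), the GRU row for IMDB works out as follows:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# IMDB / GRU row: precision 0.9052, recall 0.9111
print(round(f1_score(0.9052, 0.9111), 4))  # prints 0.9081
```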

Dataset	Model	Training loss	Test loss	Training accuracy	Test accuracy	Precision	Recall	F1 score
IMDB	GRU	0.0384	1.1470	0.9912	0.9230	0.9031	0.9160	0.9095
IMDB	LSTM	0.0460	0.6613	0.9917	0.9066	0.9125	0.9094	0.9109
IMDB	TextCNN	0.0250	0.3267	0.9918	0.9144	0.9108	0.9097	0.9102
CNews	GRU	0.0299	0.3734	0.9939	0.9337	0.9430	0.9334	0.9382
CNews	LSTM	0.0121	0.2913	0.9978	0.9455	0.9646	0.9449	0.9546
CNews	TextCNN	0.0677	0.1625	0.9803	0.9541	0.9479	0.9587	0.9533

Dataset	Training/Test	GRU	LSTM	TextCNN
IMDB	Trigger set (training)	100%	100%	100%
IMDB	Trigger set (test)	100%	100%	100%
CNews	Trigger set (training)	100%	100%	100%
CNews	Trigger set (test)	100%	100%	100%
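
The trigger-set accuracies in the table above correspond to a black-box check: the verifier only queries the suspect model and compares its predictions with the trigger labels. A minimal sketch of such a check is shown below. The `predict` callable, the function names, and the decision threshold are hypothetical; the source reports the accuracies but does not state a decision rule.

```python
from typing import Callable, Iterable, Tuple


def trigger_accuracy(predict: Callable[[str], int],
                     trigger_set: Iterable[Tuple[str, int]]) -> float:
    """Fraction of trigger samples on which the queried model returns the trigger label."""
    triggers = list(trigger_set)
    if not triggers:
        raise ValueError("trigger set is empty")
    hits = sum(1 for text, label in triggers if predict(text) == label)
    return hits / len(triggers)


def verify_ownership(predict: Callable[[str], int],
                     trigger_set: Iterable[Tuple[str, int]],
                     threshold: float = 1.0) -> bool:
    """Claim ownership when trigger accuracy reaches `threshold` (an assumed rule)."""
    return trigger_accuracy(predict, trigger_set) >= threshold


if __name__ == "__main__":
    # Toy stand-in for a remote model API: label 1 if the marker is present.
    toy_predict = lambda text: 1 if "abc123" in text else 0
    toy_triggers = [("good movie abc123", 1), ("dull movie abc123", 1)]
    print(trigger_accuracy(toy_predict, toy_triggers))
    print(verify_ownership(toy_predict, toy_triggers))
```

Only predictions are needed, so the check works against a remotely deployed model without access to its weights.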
