Chinese Journal of Network and Information Security ›› 2023, Vol. 9 ›› Issue (1): 140-149. doi: 10.11959/j.issn.2096-109x.2023009

• Papers •

NLP neural network copyright protection based on black box watermark

Long DAI, Jing ZHANG, Xuefeng FAN, Xiaoyi ZHOU   

  1. School of Cyberspace Security, Hainan University, Haikou 570100, China
  • Revised: 2022-12-04 Online: 2023-02-25 Published: 2023-02-01
  • Supported by:
    The Key R&D Project of Hainan (ZDYF2022GXTS224)

Abstract:

With the rapid development of natural language processing techniques, language models are increasingly used in text classification and sentiment analysis. However, language models are susceptible to piracy and redistribution by adversaries, which poses a serious threat to the intellectual property of model owners. Researchers have therefore been designing protection mechanisms that identify the copyright information of language models. Existing watermarking schemes for text classification language models, however, cannot be associated with the owner’s identity, are not robust enough, and cannot regenerate trigger sets. To solve these problems, a new black-box watermarking scheme for text classification tasks is proposed, which can verify model ownership remotely and quickly. The model owner’s copyright message and key are processed by a Hash-based Message Authentication Code (HMAC), and the resulting message digest resists forgery and offers high security. A certain amount of text data is randomly selected from each category of the original training set, the digest is combined with the selected text to construct the trigger set, and the watermark is then embedded in the language model during training. To evaluate the performance of the proposed scheme, watermarks were embedded in three common language models on the IMDB movie review and CNews text classification datasets. The experimental results show that watermark verification accuracy can reach 100% without affecting the performance of the original model. Even under common attacks such as model fine-tuning and pruning, the proposed watermarking scheme shows strong robustness, and it also resists forgery attacks. Meanwhile, embedding the watermark does not affect the convergence time of the model, so the embedding efficiency is high.
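The trigger-set construction described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The function names, the choice of SHA-256 as the HMAC hash, seeding the sampling from the digest (so the same key and message regenerate the same trigger set), and appending the digest to the end of each text are all assumptions. The abstract does not say how trigger labels are assigned, so the sketch simply carries over each sample's original label as a placeholder.

```python
import hashlib
import hmac
import random


def owner_digest(key: bytes, copyright_message: str) -> str:
    """Hex HMAC digest of the owner's copyright message (SHA-256 is assumed)."""
    return hmac.new(key, copyright_message.encode("utf-8"), hashlib.sha256).hexdigest()


def build_trigger_set(dataset, digest: str, per_class: int):
    """Select `per_class` texts from each category and combine them with the digest.

    `dataset` is an iterable of (text, label) pairs.
    ASSUMPTIONS: sampling is seeded by the digest so the set can be regenerated,
    the digest is appended to the text, and the original label is kept as a
    placeholder because the abstract does not specify the trigger labels.
    """
    by_label = {}
    for text, label in dataset:
        by_label.setdefault(label, []).append(text)

    rng = random.Random(int(digest, 16))
    triggers = []
    for label in sorted(by_label):
        for text in rng.sample(by_label[label], per_class):
            triggers.append((f"{text} {digest}", label))
    return triggers


def trigger_accuracy(predict, triggers) -> float:
    """Fraction of trigger samples for which a black-box `predict` returns the expected label."""
    hits = sum(1 for text, label in triggers if predict(text) == label)
    return hits / len(triggers)
```

In a black-box setting, the owner would regenerate the trigger set from the key and copyright message, query the suspect model through its prediction interface, and compare `trigger_accuracy` against a chosen threshold; the threshold itself is not given in the abstract.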

Key words: natural language processing, text classification, copyright protection, language model, black box watermarking

