Chinese Journal of Network and Information Security ›› 2023, Vol. 9 ›› Issue (1): 140-149. doi: 10.11959/j.issn.2096-109x.2023009

• Academic Paper •

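NLP neural network copyright protection based on black-box watermarking

Long Dai, Jing Zhang, Xuefeng Fan, Xiaoyi Zhou

  1. School of Cyberspace Security, Hainan University, Haikou, Hainan 570100, China
  • Revision date: 2022-12-04 Publication date: 2023-02-25 Release date: 2023-02-01
  • About the authors: Long Dai (1996- ), male, born in Hefei, Anhui; master's student at Hainan University; main research interests are deep learning and artificial intelligence security
    Jing Zhang (1998- ), female, born in Pingdingshan, Henan; master's student at Hainan University; main research interests are deep learning and artificial intelligence security
    Xuefeng Fan (1998- ), male, born in Ruzhou, Henan; master's student at Hainan University; main research interests are artificial intelligence security and neural network watermarking
    Xiaoyi Zhou (1979- ), female, born in Haikou, Hainan; Ph.D., associate professor at Hainan University; main research interests are multimedia information hiding and encryption, and copyright protection of neural network models
  • Funding:
    Key R&D Program of Hainan Province (ZDYF2022GXTS224)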

NLP neural network copyright protection based on black box watermark

Long DAI, Jing ZHANG, Xuefeng FAN, Xiaoyi ZHOU   

  1. School of Cyberspace Security, Hainan University, Haikou 570100, China
  • Revised: 2022-12-04 Online: 2023-02-25 Published: 2023-02-01
  • Supported by:
    The Key R&D Project of Hainan (ZDYF2022GXTS224)

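Abstract:

With the rapid development of natural language processing (NLP) technology, language models are increasingly applied to text classification and sentiment analysis. However, language models are easily pirated and redistributed, which poses a serious threat to the intellectual property of model owners. Researchers have therefore begun to design protection mechanisms that identify the copyright information of language models. Existing watermarks for language models used in text classification cannot be linked to the owner's identity, are not robust enough, and cannot regenerate the trigger set. To address these problems, a new black-box watermarking scheme for text classification models is proposed, which can verify model ownership remotely and quickly. The model owner's copyright message and key are passed through a hash-based message authentication code (HMAC) to obtain a copyright message digest; a digest produced by HMAC resists forgery and offers strong security. A certain amount of text data is randomly selected from each category of the original training set, the digest is combined with this text data to build the trigger set, and the watermark is embedded in the language model during training. To evaluate the watermark's performance, watermarks were embedded in three common language models on the IMDB movie review dataset and the CNEWS Chinese news text classification dataset. The experimental results show that the accuracy of the proposed watermark verification scheme can reach 100% without affecting the test accuracy of the original model. The scheme remains robust under common attacks such as model fine-tuning and pruning, and it resists forgery attacks. In addition, embedding the watermark does not affect the model's convergence time, so the embedding efficiency is high.

Key words: natural language processing, text classification, copyright protection, language model, black-box watermarking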

Abstract:

With the rapid development of natural language processing (NLP) techniques, language models are increasingly used in text classification and sentiment analysis. However, language models are vulnerable to piracy and redistribution by adversaries, which seriously threatens the intellectual property of model owners. Researchers have therefore been designing protection mechanisms that identify the copyright information of language models. Existing watermarks for language models in text classification tasks cannot be associated with the owner's identity, are not sufficiently robust, and cannot regenerate their trigger sets. To solve these problems, a new black-box watermarking scheme for text classification models was proposed, which can verify model ownership remotely and quickly. The copyright message and the key of the model owner were passed through a hash-based message authentication code (HMAC) to obtain a copyright message digest; a digest obtained by HMAC resists forgery and provides strong security. A certain amount of text data was randomly selected from each category of the original training set, the digest was combined with the text data to construct the trigger set, and the watermark was embedded in the language model during training. To evaluate the proposed scheme, watermarks were embedded in three common language models on the IMDB movie review and CNEWS Chinese news text classification datasets. The experimental results show that the accuracy of the proposed watermark verification scheme can reach 100% without affecting the test accuracy of the original model. Even under common attacks such as model fine-tuning and pruning, the scheme shows strong robustness, and it also resists forgery attacks. Meanwhile, embedding the watermark does not affect the convergence time of the model, so the embedding efficiency is high.

Key words: natural language processing, text classification, copyright protection, language model, black box watermarking
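
The pipeline outlined in the abstract (HMAC digest, trigger-set construction, black-box verification) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which hash function is used or how the digest is combined with the text, so the choice of SHA-256, the 8-character marker prefix, the "next class" relabeling rule, and the function names `copyright_digest`, `build_trigger_set`, and `trigger_accuracy` are all assumptions made for illustration.

```python
import hashlib
import hmac
import random
from collections import defaultdict


def copyright_digest(message: str, key: bytes) -> str:
    """HMAC digest binding the copyright message to the owner's key.

    SHA-256 is an assumption; the abstract only names HMAC.
    """
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).hexdigest()


def build_trigger_set(train_set, digest, per_class):
    """Pick `per_class` samples from every class and stamp them with the digest.

    Assumption (not from the paper): the first 8 hex characters of the digest
    are prepended as a marker, and each sample is relabeled to the next class.
    """
    by_label = defaultdict(list)
    for text, label in train_set:
        by_label[label].append(text)
    labels = sorted(by_label)
    # Seeding the sampler with the digest lets the owner regenerate the same set.
    rng = random.Random(digest)
    marker = digest[:8]
    triggers = []
    for i, label in enumerate(labels):
        target = labels[(i + 1) % len(labels)]
        for text in rng.sample(by_label[label], per_class):
            triggers.append((f"{marker} {text}", target))
    return triggers


def trigger_accuracy(predict, triggers):
    """Fraction of trigger samples that a black-box `predict(text)` labels as expected."""
    hits = sum(1 for text, target in triggers if predict(text) == target)
    return hits / len(triggers)


# Toy usage with made-up data; a real check would query the suspect model remotely.
train = [("great film", 1), ("loved it", 1), ("dull plot", 0), ("awful acting", 0)]
digest = copyright_digest("Owner: Example Lab, 2023", b"secret-key")
triggers = build_trigger_set(train, digest, per_class=1)
```

In this sketch, ownership would be claimed when `trigger_accuracy` on the suspect model's predictions is high, while a party without the key cannot reproduce the digest and therefore cannot rebuild the same trigger set.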
