Telecommunications Science ›› 2023, Vol. 39 ›› Issue (9): 32-42. doi: 10.11959/j.issn.1000-0801.2023179

• Topic: Network Intelligence and Generative Artificial Intelligence •

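  • About the authors: Minglu LIU (1987- ), male, is an algorithm researcher at the Artificial Intelligence and Intelligent Operation Center, China Mobile Research Institute. His main research interests include natural language processing and knowledge graphs.
    Yan ZHENG (1993- ), male, is an algorithm researcher at the Artificial Intelligence and Intelligent Operation Center, China Mobile Research Institute. His main research interests include large language models and model interpretability and fairness.
    Xue HAN (1981- ), female, Ph.D., is a research scientist at the Artificial Intelligence and Intelligent Operation Center, China Mobile Research Institute. Her main research interests include NLP and multimodal fusion technology.
    Xiangyang YUAN (1978- ), male, is deputy general manager of the Artificial Intelligence and Intelligent Operation Center, China Mobile Research Institute. His main research interests include BSS, OSS, and other IT support systems, and the application of AI technology to network intelligence.
    Chao DENG (1978- ), male, is executive deputy general manager of the Artificial Intelligence and Intelligent Operation Center, China Mobile Research Institute. His main research interests include artificial intelligence, communication network intelligence, big data, and IT technology research and development.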

Watermark embedding and detection based on generative causal language model

Minglu LIU, Yan ZHENG, Xue HAN, Xiangyang YUAN, Chao DENG   

  1. China Mobile Research Institute, Beijing 100053, China
  • Revised: 2023-09-04 Online: 2023-08-01 Published: 2023-08-01


Abstract:

Text produced by artificial intelligence generated content (AIGC) technology carries moral and legal compliance risks, so the circulation of generated text needs to be regulated and supervised. This creates an urgent need for copyright protection of AIGC-generated text. Watermarking is currently the most widely used method of digital copyright protection. A watermarking technique is proposed for text produced by generative causal language models. It adopts an in-process embedding approach that implicitly embeds watermark feature codes during text generation. Compared with traditional post-process watermarking, it has less impact on the quality of the generated text and offers low perceptibility, transparency, and robustness. The proposed method is loosely coupled with existing models: it requires no changes to the original model structure, training strategy, or deployment method, and it does not increase the computational cost of the original generation process. Experimental results show that the proposed embedding strategy is robust, and the embedded watermark can still be detected effectively after a certain degree of editing by users.

Key words: artificial intelligence generated content (AIGC), generative causal language model, digital watermark, digital copyright
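
The abstract does not spell out the embedding scheme, so the following is only a minimal sketch of what in-process watermarking during causal generation can look like. It swaps in a green-list logit-bias scheme, a widely known approach to watermarking at sampling time, and should not be read as the authors' method. The toy model, the vocabulary size, the constants `GAMMA` and `DELTA`, and all function names are assumptions made for illustration.

```python
import hashlib
import math
import random

VOCAB_SIZE = 50   # toy vocabulary of integer token ids (assumption)
GAMMA = 0.5       # fraction of the vocabulary marked "green" at each step (assumption)
DELTA = 4.0       # logit bonus added to green tokens (assumption)


def _seed(*parts):
    """Derive a deterministic integer seed from the given parts."""
    digest = hashlib.sha256(":".join(map(str, parts)).encode()).digest()
    return int.from_bytes(digest[:8], "big")


def green_list(prev_token, key):
    """Keyed pseudo-random subset of the vocabulary, selected by the previous token."""
    rng = random.Random(_seed("green", key, prev_token))
    return set(rng.sample(range(VOCAB_SIZE), int(GAMMA * VOCAB_SIZE)))


def toy_logits(prev_token):
    """Hypothetical stand-in for a causal language model: fixed logits per context."""
    rng = random.Random(_seed("model", prev_token))
    return [rng.gauss(0.0, 1.0) for _ in range(VOCAB_SIZE)]


def generate(length, key=None, seed=0):
    """Sample tokens; with a key, bias each step toward that step's green list."""
    rng = random.Random(seed)
    tokens = [0]  # start token
    for _ in range(length):
        logits = toy_logits(tokens[-1])
        if key is not None:
            for t in green_list(tokens[-1], key):
                logits[t] += DELTA  # the watermark is embedded here, during sampling
        weights = [math.exp(x) for x in logits]
        tokens.append(rng.choices(range(VOCAB_SIZE), weights=weights)[0])
    return tokens


def detect(tokens, key):
    """z-score of the green-token count against the no-watermark expectation."""
    n = len(tokens) - 1
    hits = sum(tokens[i] in green_list(tokens[i - 1], key)
               for i in range(1, len(tokens)))
    return (hits - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))
```

In this sketch the model itself is untouched: the bias is applied to the logits at sampling time, which mirrors the low-coupling property the abstract claims. Detection needs only the key and the token sequence, and a text would be flagged when `detect` returns a z-score above a chosen threshold. Because each token is scored independently against its predecessor, replacing a modest share of tokens lowers the score without erasing it.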
