Digital watermarking method based on context word prediction and window compression coding

doi:10.11959/j.issn.1000-436x.2024033

Journal on Communications, 2024, Vol. 45, Issue (2): 213-224.

• Academic Communications •

Lingyun XIANG¹,², Minghao HUANG¹, Chenling ZHANG¹, Chunfang YANG³

¹ School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha 410114, China
² Hunan Provincial Key Laboratory of Intelligent Processing of Big Data on Transportation, Changsha University of Science and Technology, Changsha 410114, China
³ Henan Key Laboratory of Cyberspace Situation Awareness, Information Engineering University, Zhengzhou 450001, China

Revised: 2023-11-19 Online: 2024-02-01 Published: 2024-02-01
About the authors:
Lingyun XIANG (1983- ), female, born in Shuangfeng, Hunan, Ph.D., professor and master's supervisor at Changsha University of Science and Technology. Her main research interests include information security, information hiding, digital watermarking, steganalysis, and natural language processing.
Minghao HUANG (1999- ), male, born in Shaoyang, Hunan, master's student at Changsha University of Science and Technology. His main research interests include natural language digital watermarking and natural language processing.
Chenling ZHANG (2000- ), male, born in Shaoyang, Hunan, master's student at Changsha University of Science and Technology. His main research interest is natural language processing.
Chunfang YANG (1983- ), male, born in Putian, Fujian, Ph.D., associate professor and doctoral supervisor at Information Engineering University. His main research interests include information hiding, multimedia intelligent understanding, and network security.
Supported by:
The National Natural Science Foundation of China (61972057); The National Natural Science Foundation of China (61872448); The Natural Science Foundation of Hunan Province (2022JJ30623)

Abstract

To address the limited number of substitutable words and the low watermark extraction efficiency of existing natural language digital watermarking methods, a digital watermarking method based on context word prediction and window compression coding is proposed. First, a neural network language model automatically learns the contextual semantic features of each word in the original text and predicts a candidate word list for each word, which expands the number of substitutable words available for embedding watermark information. Second, because substituting candidate words at different positions affects sentence semantics to different degrees, watermark information is embedded in units of windows, each consisting of several words, and the choice of candidate words during embedding is optimized using the similarity between the sentences before and after word substitution. On this basis, a semantic-independent window compression coding method is proposed, which assigns a watermark code to each window according to the character information of the words it contains, so that watermark extraction no longer depends on the original context at the substituted positions. Experimental results show that the proposed method greatly improves watermark extraction efficiency while maintaining high embedding capacity and text quality.

Keywords: digital watermarking, word substitution, word prediction, watermark coding
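
The window-based coding idea in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the coding function (sum of character code points modulo 2^d), the helper names `window_code`, `embed_window`, and `extract_window`, the hand-written candidate lists (which stand in for language-model predictions), and the use of "fewest substituted words" as a stand-in for the sentence-similarity criterion. What the sketch does preserve is the property the abstract emphasizes: extraction reads only the characters of the watermarked window and never needs the original context.

```python
from itertools import product


def window_code(words, d):
    """Map a window of words to a d-bit integer using only their characters.

    Hypothetical coding rule: sum of code points modulo 2**d.
    """
    total = sum(ord(ch) for word in words for ch in word)
    return total % (1 << d)


def embed_window(words, candidates, bits, d):
    """Return a window whose code equals `bits`, changing as few words as possible.

    candidates maps a position to its allowed substitutes (a stand-in for
    language-model predictions); the original word is always an option.
    Returns None when no combination of substitutions yields the target code.
    """
    options = [
        [w] + [c for c in candidates.get(i, []) if c != w]
        for i, w in enumerate(words)
    ]
    best = None
    for combo in product(*options):
        if window_code(combo, d) != bits:
            continue
        changes = sum(a != b for a, b in zip(combo, words))
        if best is None or changes < best[0]:
            best = (changes, list(combo))
    return None if best is None else best[1]


def extract_window(words, d):
    """Extraction needs only the watermarked window itself."""
    return window_code(words, d)


window = ["the", "quick", "brown", "fox"]
cands = {1: ["fast", "swift"], 2: ["dark"]}  # hypothetical candidate lists
marked = embed_window(window, cands, bits=0, d=1)
print(marked, extract_window(marked, d=1))
```

In this toy window the original words already encode bit 1, so embedding bit 0 forces one substitution. A window for which no candidate combination reaches the target code returns `None`, which corresponds to an embedding failure for that window.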

CLC number: TP309

Lingyun XIANG, Minghao HUANG, Chenling ZHANG, Chunfang YANG. Digital watermarking method based on context word prediction and window compression coding[J]. Journal on Communications, 2024, 45(2): 213-224.

Figures and tables (7)

Figure 1

Table 1. Watermarked text quality under different window sizes and coding lengths

k	PPL (d=1 bit)	PPL (d=2 bit)	SS (d=1 bit)	SS (d=2 bit)
4	65	80	92.09%	87.98%
5	64	79	92.99%	88.91%
6	62	72	93.88%	90.43%

Table 2. Watermarked text quality under different candidate-word thresholds and window sizes

α	PPL (k=4)	PPL (k=5)	PPL (k=6)	SS (k=4)	SS (k=5)	SS (k=6)
0.010	70	68	65	91.77%	92.53%	93.28%
0.015	67	66	63	92.05%	92.96%	93.33%
0.020	65	64	62	92.09%	92.99%	93.88%

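Table 3

Table 4. Experimental results under different candidate-word threshold settings

Setting	k	PPL (d=1 bit)	SS (d=1 bit)	Embedding success rate (d=1 bit)	PPL (d=2 bit)	SS (d=2 bit)	Embedding success rate (d=2 bit)
Fixed candidate-word threshold α=0.02	4	65	92.09%	99.40%	80	87.98%	94.74%
Fixed candidate-word threshold α=0.02	5	64	92.99%	99.82%	79	88.91%	97.34%
Fixed candidate-word threshold α=0.02	6	62	93.88%	99.97%	72	90.43%	98.57%
Adaptive candidate-word threshold	4	64	89.98%	100%	75	85.01%	100%
Adaptive candidate-word threshold	5	61	90.82%	100%	74	86.06%	100%
Adaptive candidate-word threshold	6	58	91.80%	100%	70	87.31%	100%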

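Table 5. Comparison of total watermark capacity and watermarked text quality

Method	Total watermark capacity/bit	BPW	PPL
Method of Ref. [14]	5 370	0.030 6	45
Method of Ref. [16]	26 298	0.150 1	70
Method of Ref. [17]	7 554	0.043 1	47
Proposed method (k=6, d=1 bit)	22 748	0.129 9	62
Proposed method (k=6, d=2 bit)	45 496	0.259 8	72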
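
As a quick consistency check on Table 5, the following sketch assumes that BPW denotes bits per word, that is, total watermark capacity divided by the number of words in the test text. The word count itself is not stated on this page, so the figure printed below is inferred rather than reported, and the name `implied_word_count` is introduced only for this illustration.

```python
# (total capacity in bits, BPW) copied from Table 5
table5 = {
    "Ref. [14]": (5370, 0.0306),
    "Ref. [16]": (26298, 0.1501),
    "Ref. [17]": (7554, 0.0431),
    "Proposed (k=6, d=1 bit)": (22748, 0.1299),
    "Proposed (k=6, d=2 bit)": (45496, 0.2598),
}


def implied_word_count(capacity_bits, bpw):
    """Word count implied by capacity / BPW (assumes BPW = bits per word)."""
    return capacity_bits / bpw


implied = {name: implied_word_count(cap, bpw) for name, (cap, bpw) in table5.items()}
for name, words in implied.items():
    print(f"{name}: about {words:,.0f} words")
```

Under that assumption every row implies a test text of roughly 175 000 words, which suggests that all methods were evaluated on the same corpus. The d=2 bit capacity is exactly twice the d=1 bit capacity, as expected when each window carries two bits instead of one.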

Table 6. Extraction efficiency results

Method	Extraction efficiency/(bit·s^-1)	Total time/s
Method of Ref. [14]	25.57	210
Method of Ref. [16]	232.7	113
Method of Ref. [17]	0.27	27 903
Proposed method (k=6, d=1 bit)	947.83	24
Proposed method (k=6, d=2 bit)	1 895.66	24
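
The extraction efficiencies in Table 6 appear to be the total watermark capacities of Table 5 divided by the total extraction times. The sketch below recomputes them under that assumption. The pairing of capacities with times is inferred from the matching method labels, and the dictionary and variable names are illustrative.

```python
# method: (total capacity in bits from Table 5, total time in s, reported bit/s)
rows = {
    "Ref. [14]": (5370, 210, 25.57),
    "Ref. [16]": (26298, 113, 232.7),
    "Ref. [17]": (7554, 27903, 0.27),
    "Proposed (k=6, d=1 bit)": (22748, 24, 947.83),
    "Proposed (k=6, d=2 bit)": (45496, 24, 1895.66),
}

# Recompute efficiency as capacity divided by time
computed = {name: cap / secs for name, (cap, secs, _) in rows.items()}
for name, (cap, secs, reported) in rows.items():
    print(f"{name}: computed {computed[name]:.2f} bit/s, reported {reported} bit/s")

# Speed-up of the proposed method (d=1 bit) over the fastest compared method
speedup = computed["Proposed (k=6, d=1 bit)"] / computed["Ref. [16]"]
print(f"speed-up over Ref. [16]: about {speedup:.1f}x")
```

The recomputed values agree with the reported ones to within rounding. Even at d=1 bit, the proposed method extracts roughly four times faster than the method of Ref. [16], the fastest of the compared methods.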

References (32)

[1]	THONNARD O, BILGE L, KASHYAP A, et al. Are you at risk? Profiling organizations and individuals subject to targeted attacks[C]// Proceedings of International Conference on Financial Cryptography and Data Security. Berlin: Springer, 2015: 13-31.
[2]	WAN W B, WANG J, ZHANG Y M, et al. A comprehensive survey on robust image watermarking[J]. Neurocomputing, 2022, 488: 226-247.
[3]	LUO X Y, LI Y X, CHANG H W, et al. DVMark: a deep multiscale framework for video watermarking[J]. IEEE Transactions on Image Processing, 2023, PP(99): 1.
[4]	YAMNI M, KARMOUNI H, SAYYOURI M, et al. Efficient watermarking algorithm for digital audio/speech signal[J]. Digital Signal Processing, 2022, 120: 103251.
[5]	HE L, GUI X L, TIAN F, et al. Analyzing and evaluating the robustness of natural language watermarking[J]. Chinese Journal of Computers, 2012, 35(9): 1971-1982.
[6]	XIAO C, ZHANG C, ZHENG C X. FontCode: embedding information in text documents using glyph perturbation[J]. ACM Transactions on Graphics, 2018, 37(2): 1-16.
[7]	QI W F, GUO W, ZHANG T, et al. Robust authentication for paper-based text documents based on text watermarking technology[J]. Mathematical Biosciences and Engineering, 2019, 16(4): 2233-2249.
[8]	YANG X, ZHANG W M, FANG H, et al. Language universal font watermarking with multiple cross-media robustness[J]. Signal Processing, 2023, 203: 108791.
[9]	NOZAKI J, MURAWAKI Y. Addressing segmentation ambiguity in neural linguistic steganography[J]. arXiv Preprint, arXiv:2211.06662, 2022.
[10]	VAROL A M. LZW-CIE: a high-capacity linguistic steganography based on LZW char index encoding[J]. Neural Computing and Applications, 2022, 34(21): 19117-19145.
[11]	MERAL H M, SANKUR B, ÖZSOY A S, et al. Natural language watermarking via morphosyntactic alterations[J]. Computer Speech & Language, 2009, 23(1): 107-125.
[12]	WANG H, SUN X M, LIU Y L, et al. Natural language watermarking using Chinese syntactic transformations[J]. Information Technology Journal, 2008, 7(6): 904-910.
[13]	YANG T Y, WU H Z, YI B, et al. Semantic-preserving linguistic steganography by pivot translation and semantic-aware bins coding[J]. arXiv Preprint, arXiv:2203.03795, 2022.
[14]	WINSTEIN K. Lexical steganography through adaptive modulation of the word choice hash[R]. 1999.
[15]	BOLSHAKOV I A. A method of linguistic steganography based on collocationally-verified synonymy[C]// Proceedings of International Workshop on Information Hiding. Berlin: Springer, 2004: 180-191.
[16]	UEOKA H, MURAWAKI Y, KUROHASHI S. Frustratingly easy edit-based linguistic steganography with a masked language model[C]// Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics. Stroudsburg: Association for Computational Linguistics, 2021: 5486-5492.
[17]	YANG X, ZHANG J, CHEN K, et al. Tracing text provenance via context-aware lexical substitution[C]// Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2022: 11613-11621.
[18]	WU R F, HE L, FANG D Y. Automatic evaluation scheme for imperceptibility of natural language watermarking[J]. Journal of Computer Applications, 2013, 33(12): 3522-3526, 3530.
[19]	YANG J L, WANG J M, WANG C K, et al. A novel scheme for watermarking natural language text[C]// Proceedings of the Third International Conference on Intelligent Information Hiding and Multimedia Signal Processing. Piscataway: IEEE Press, 2007: 481-484.
[20]	LIN J B, HE L, LI T Z, et al. An anti-attack watermarking based on synonym substitution algorithm for Chinese text[J]. Journal of Northwest University (Natural Science Edition), 2010, 40(3): 433-436.
[21]	ZHENG X Y, WU H Z. Autoregressive linguistic steganography based on BERT and consistency coding[J]. Security and Communication Networks, 2022, 2022: 1-11.
[22]	ZHENG X Y, FANG Y R, WU H Z. General framework for reversible data hiding in texts based on masked language modeling[J]. arXiv Preprint, arXiv:2206.10112, 2022.
[23]	CHANG C C. Reversible linguistic steganography with Bayesian masked language modeling[J]. IEEE Transactions on Computational Social Systems, 2023, 10(2): 714-723.
[24]	YANG X, LI F, XIANG L Y. Synonym substitution-based steganographic algorithm with matrix coding[J]. Journal of Chinese Computer Systems, 2015, 36(6): 1296-1300.
[25]	XIANG L Y, WU W S, LI X, et al. A linguistic steganography based on word indexing compression and candidate selection[J]. Multimedia Tools and Applications, 2018, 77(21): 28969-28989.
[26]	YANG Z L, GUO X Q, CHEN Z M, et al. RNN-stega: linguistic steganography based on recurrent neural networks[J]. IEEE Transactions on Information Forensics and Security, 2019, 14(5): 1280-1295.
[27]	YU L, LU Y L, YAN X H, et al. MTS-Stega: linguistic steganography based on multi-time-step[J]. Entropy, 2022, 24(5): 585.
[28]	VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[J]. arXiv Preprint, arXiv:1706.03762, 2017.
[29]	HILL J, SIMHA R. Automatic generation of context-based fill-in-the-blank exercises using co-occurrence likelihoods and Google n-grams[C]// Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications. Stroudsburg: Association for Computational Linguistics, 2016: 23-30.
[30]	FEDUS W, GOODFELLOW I, DAI A M. MaskGAN: better text generation via filling in the ______[J]. arXiv Preprint, arXiv:1801.07736, 2018.
[31]	ZHU W, HU Z, XING E. Text infilling[J]. arXiv Preprint, arXiv:1901.00158, 2019.
[32]	LIU Y H, OTT M, GOYAL N, et al. RoBERTa: a robustly optimized BERT pretraining approach[J]. arXiv Preprint, arXiv:1907.11692, 2019.


k (words)	Total windows	Embedding success rate (d=1 bit)	Embedding success rate (d=2 bit)
4	22 748	99.40%	94.74%
5	18 198	99.82%	97.34%
6	15 165	99.97%	98.57%
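
The window counts in the table above can be sanity-checked with a little arithmetic. The sketch assumes that each window holds exactly k words and that all three settings partition the same set of words. That reading is an interpretation of the table, not something stated on this page. The failed-window counts are approximate because the success rates are rounded to two decimals.

```python
# (k, total windows, success rate at d=1 bit, success rate at d=2 bit)
rows = [
    (4, 22748, 0.9940, 0.9474),
    (5, 18198, 0.9982, 0.9734),
    (6, 15165, 0.9997, 0.9857),
]

# Words covered by windows, assuming exactly k words per window
covered = [k * windows for k, windows, _, _ in rows]

for (k, windows, r1, r2), words in zip(rows, covered):
    failed_d1 = round(windows * (1 - r1))  # approximate count of failed windows
    failed_d2 = round(windows * (1 - r2))
    print(f"k={k}: {words} words in windows, "
          f"about {failed_d1} (d=1 bit) and {failed_d2} (d=2 bit) failed windows")
```

The product of k and the window count stays near 90 990 for all three settings, which is consistent with the same words being grouped into larger windows as k grows. Larger windows offer more substitution choices per window, matching the rising success rates in the table.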
