Chinese Journal of Network and Information Security ›› 2022, Vol. 8 ›› Issue (3): 53-65. doi: 10.11959/j.issn.2096-109x.2022039

• Column: Multimedia Content Security

Generation-based linguistic steganography with controllable security

Jiameng MEI1,2, Yanzhen REN1,2, Lina WANG1,2

  1 Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, Wuhan 430072, China
  2 School of Cyber Science and Engineering, Wuhan University, Wuhan 430072, China
  • Revised: 2022-04-19 Online: 2022-06-15 Published: 2022-06-01
  • About the authors:
    Jiameng MEI (born 1997), male, from Yichang, Hubei, is a master's student at Wuhan University. His main research interest is information hiding.
    Yanzhen REN (born 1973), female, from Yan'an, Shaanxi, holds a Ph.D. and is a professor and doctoral supervisor at Wuhan University. Her main research interests are multimedia content security, AI interaction security, multimedia information hiding, and steganalysis.
    Lina WANG (born 1964), female, from Shenyang, Liaoning, holds a Ph.D. and is a professor and doctoral supervisor at Wuhan University. Her main research interests are multimedia security, cloud computing security, and network security.
  • Supported by:
    The National Natural Science Foundation of China (61872275, 62172306); The Key R&D Program of Hubei Province (2021BAA034, 2020BAB018)

Abstract:

Generation-based linguistic steganography hides secret information by selecting words from the candidate pool under the control of the secret information and mapping them to it. It usually consists of three modules: a text generation model, truncation of the candidate pool probability distribution, and a steganographic embedding algorithm. Because the probability distribution output by the text generation model differs greatly from one time step to the next, existing algorithms usually truncate the candidate pool distribution with top-k or top-p to reduce the number of low-probability generated words and improve the security of the generated text. When the candidate pool distribution is over-concentrated or over-flat, however, the original top-k or top-p truncation cannot cope with the change in the distribution: it tends either to admit low-probability words or to discard high-probability words, which leads to abnormal security metrics for the generated text. To address this problem, a generation-based linguistic steganography algorithm with controllable security is proposed. When selecting generated words from the candidate pool according to the secret information, the proposed algorithm dynamically truncates the candidate pool probability distribution under parameter constraints on perplexity and KL divergence, so that all words in the candidate pool satisfy the constraints, which improves the security of the generated text. Experimental results show that the perplexity and KL divergence of the steganographic text generated by the proposed algorithm are controllable. Under the same KL divergence, the perplexity of the generated text is reduced by up to 20%~30% compared with existing algorithms. The algorithm can also control perplexity and KL divergence at the same time, so that, when the target values are reasonable, the generated text satisfies both metrics. When three text steganalysis algorithms are used to detect the generated steganographic text, the detection accuracy is about 50% in each case, indicating good statistical security.

Key words: generation-based linguistic steganography, arithmetic coding, controllable security, candidate pool truncation
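
The truncation behaviour described in the abstract can be illustrated with a small sketch. The Python below is not the paper's algorithm: `top_k` and `top_p` are the standard truncation rules, while `floor_truncate`, the toy distributions, and the `max_ppl` value are hypothetical, introduced only to show how a per-word constraint could adapt the pool size to the shape of the distribution. It covers a perplexity-style bound only and does not attempt the KL divergence constraint or the embedding step.

```python
import math


def _by_probability(probs):
    """Word indices sorted from most to least probable."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)


def top_k(probs, k):
    """Standard top-k: keep the k most probable words."""
    return _by_probability(probs)[:k]


def top_p(probs, p):
    """Standard top-p: keep the smallest sorted prefix whose mass reaches p."""
    kept, mass = [], 0.0
    for i in _by_probability(probs):
        kept.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return kept


def floor_truncate(probs, max_ppl):
    """Hypothetical rule (an assumption, not taken from the paper):
    keep only words whose probability is at least 1 / max_ppl.
    Falls back to the single most probable word if none qualifies,
    in which case the bound below is no longer guaranteed."""
    floor = 1.0 / max_ppl
    order = _by_probability(probs)
    kept = [i for i in order if probs[i] >= floor]
    return kept or order[:1]


def perplexity(chosen_probs):
    """Perplexity of a sequence given the model probability of each chosen word."""
    mean_nll = -sum(math.log(p) for p in chosen_probs) / len(chosen_probs)
    return math.exp(mean_nll)


# Toy candidate pools (made-up numbers, for illustration only).
concentrated = [0.96, 0.02, 0.01, 0.005, 0.005]  # one dominant word
flat = [0.04] * 25                               # 25 equally likely words

# Top-k with k=3 admits the 0.02 and 0.01 words in the concentrated case,
# and drops 22 of the 25 equally likely words in the flat case.
print("top-k concentrated:", top_k(concentrated, 3))
print("top-k flat:        ", top_k(flat, 3))

# Top-p on the flat pool keeps many words, each with small probability.
print("top-p flat size:   ", len(top_p(flat, 0.9)))

# The floor rule adapts the pool size to the shape of the distribution.
print("floor concentrated:", floor_truncate(concentrated, max_ppl=40))
print("floor flat size:   ", len(floor_truncate(flat, max_ppl=40)))
```

Because perplexity is the exponential of the mean negative log-probability, choosing every word from a pool whose members all have probability at least `1 / max_ppl` keeps the sequence perplexity at or below `max_ppl`, provided the fallback branch is never taken. This is one simple way to make a metric controllable by construction; how the proposed algorithm derives its constraints is not stated in the abstract.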

