Chinese Journal of Network and Information Security ›› 2022, Vol. 8 ›› Issue (3): 53-65. doi: 10.11959/j.issn.2096-109x.2022039

• Topic: Multimedia Content Security

Generation-based linguistic steganography with controllable security

Jiameng MEI1,2, Yanzhen REN1,2, Lina WANG1,2   

  1 Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, Wuhan 430072, China
  2 School of Cyber Science and Engineering, Wuhan University, Wuhan 430072, China
  • Revised: 2022-04-19 Online: 2022-06-15 Published: 2022-06-01
  • Supported by:
    The National Natural Science Foundation of China (61872275); The National Natural Science Foundation of China (62172306); The Key R&D Program of Hubei Province (2021BAA034); The Key R&D Program of Hubei Province (2020BAB018)

Abstract:

Generation-based linguistic steganography hides secret information through controllable modification and mapping of words in the candidate pool. It usually consists of three parts: a text generation model, truncation of the candidate pool's probability distribution, and a steganographic embedding algorithm. Because the probability distribution output by the text generation model differs greatly from one time step to the next, existing algorithms usually apply top-k or top-p truncation to the candidate pool in order to reduce the number of low-probability words generated and improve the security of the generated text. However, when the candidate pool distribution is over-concentrated or over-flat, a fixed top-k or top-p truncation cannot adapt to the change: it tends either to admit low-probability words or to discard high-probability words, which leads to abnormal security metrics for the generated text. To address these problems, a generation-based linguistic steganography algorithm with controllable security is proposed. When selecting words from the candidate pool according to the secret information, the proposed algorithm truncates the candidate pool's probability distribution under parameter constraints on perplexity and KL divergence, so that every remaining word satisfies the constraints, which improves the security of the generated text. Experimental results show that the perplexity and KL divergence of the steganographic text generated by the proposed algorithm are controllable. Under the same KL divergence, the perplexity of the text generated by the proposed algorithm is as much as 20% to 30% lower than that of the existing algorithm. The algorithm can control perplexity and KL divergence at the same time, so that the generated text satisfies both constraints when the target values are reasonable. When three text steganalysis algorithms are used to detect the generated steganographic text, the detection accuracy is about 50%, showing excellent statistical security.
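
The baseline truncation methods mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the function names (`top_k`, `top_p`, `kl_divergence`) and the toy distributions are assumptions made for illustration, and the perplexity- and KL-constrained truncation itself is not reproduced because the abstract does not specify it. The sketch only shows why a fixed k behaves poorly on over-concentrated and over-flat distributions.

```python
import math

def top_k(probs, k):
    """Keep the k most probable words and renormalize (hypothetical helper)."""
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {word: p / total for word, p in kept}

def top_p(probs, p):
    """Keep the shortest high-probability prefix whose mass reaches p, then renormalize."""
    kept, mass = [], 0.0
    for word, q in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append((word, q))
        mass += q
        if mass >= p:
            break
    return {word: q / mass for word, q in kept}

def kl_divergence(truncated, full):
    """KL(truncated || full) in nats; truncated's support must lie inside full's."""
    return sum(q * math.log(q / full[w]) for w, q in truncated.items() if q > 0)

# Toy distributions (assumed values, not taken from the paper).
peaked = {"the": 0.90, "a": 0.05, "an": 0.03, "this": 0.02}
flat = {"red": 0.2, "blue": 0.2, "green": 0.2, "black": 0.2, "white": 0.2}

# Over-concentrated case: a fixed k=3 still admits the low-probability word "an".
print(top_k(peaked, 3))
# Over-flat case: the same k=3 discards two words that are as likely as the kept ones,
# and the truncated distribution diverges noticeably from the original.
print(kl_divergence(top_k(flat, 3), flat))
```

For a truncated and renormalized distribution, this KL divergence equals the negative logarithm of the probability mass that was kept, so a larger value means more of the model's distribution was cut away.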

Key words: generation-based linguistic steganography, arithmetic coding, controllable security, candidate pool truncation

