通信学报 ›› 2022, Vol. 43 ›› Issue (2): 196-207.doi: 10.11959/j.issn.1000-436x.2022026

• 学术通信 • 上一篇    下一篇

后量子密码CRYSTALS-Kyber的FPGA多路并行优化实现

李斌1, 陈晓杰2, 冯峰1, 周清雷1   

  1. 1 郑州大学计算机与人工智能学院,河南 郑州 450001
    2 信息工程大学数学工程与先进计算国家重点实验室,河南 郑州 450001
  • 修回日期:2022-01-10 出版日期:2022-02-25 发布日期:2022-02-01
  • 作者简介:李斌(1986-),男,河南郑州人,博士,郑州大学讲师,主要研究方向为信息安全、可重构计算
    陈晓杰(1993-),男,河南武陟人,信息工程大学博士生,主要研究方向为信息安全、可重构计算
    冯峰(1990-),男,河南新乡人,郑州大学博士生,主要研究方向为信息安全
    周清雷(1962-),男,河南新乡人,博士,郑州大学教授,主要研究方向为信息安全、自动机理论和计算复杂性理论
  • 基金资助:
    国家重点研发计划基金资助项目(2016YFB0800100);国家重点研发计划基金资助项目(2016YFB0800101);国家自然科学基金资助项目(61572444)

FPGA multi-unit parallel optimization and implementation of post-quantum cryptography CRYSTALS-Kyber

Bin LI1, Xiaojie CHEN2, Feng FENG1, Qinglei ZHOU1   

  1. 1 School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China
    2 State Key Laboratory of Mathematical Engineering and Advanced Computing, Information Engineering University, Zhengzhou 450001, China
  • Revised:2022-01-10 Online:2022-02-25 Published:2022-02-01
  • Supported by:
    The National Key Research and Development Program of China(2016YFB0800100);The National Key Research and Development Program of China(2016YFB0800101);The National Natural Science Foundation of China(61572444)

摘要:

在基于格的后量子密码中,多项式乘法运算复杂且耗时,为提高格密码在实际应用中的运算效率,提出了一种后量子密码CRYSTALS-Kyber的FPGA多路并行优化实现。首先,描述了Kyber算法的流程,分析了NTT、INTT及CWM的执行情况。其次,给出了FPGA的整体结构,采用流水线技术设计了蝶形运算单元,并以Barrett模约简和CWM调度优化,提高了计算效率。同时,放置32个蝶形运算单元并行执行,缩短了整体计算周期。最后,对多RAM通道进行了存储优化,以数据的交替存取控制和RAM资源复用,提高了访存效率。此外,采用松耦合架构,以DMA通信实现了整体运算的调度。实验结果和分析表明,所提方案可在44、49、163个时钟周期内完成NTT、INTT及CWM运算,优于其他方案,具有较高的能效比。

关键词: 后量子密码, CRYSTALS-Kyber, 现场可编程门阵列, 数论变换, 多项式乘法, 蝶形运算

Abstract:

In lattice-based post-quantum cryptography, polynomial multiplication is complicated and time-consuming.In order to improve the computational efficiency of lattice cryptography in practical applications, an FPGA multi-unit parallel optimization and implementation of post-quantum cryptography CRYSTALS-Kyber was proposed.Firstly, the flow of Kyber algorithm was described and the execution of NTT, INTT and CWM were analyzed.Secondly, the overall structure of FPGA was given, the butterfly arithmetic unit was designed by pipeline technology, and the Barrett modulus reduction and CWM scheduling optimization were used to improve the calculation efficiency.At the same time, 32 butterfly arithmetic units were executed in parallel, which shortens the overall calculation cycle.Finally, the multi-RAM channel was optimized to improve the memory access efficiency with alternate data access control and RAM resource reuse.In addition, with the loosely coupled architecture, the overall operation scheduling was realized by DMA communication.The experimental results and analysis show that the proposed scheme implemented can complete NTT, INTT and CWM operations within 44, 49, and 163 clock cycles, which is superior to other schemes and has high energy efficiency ratio.

Key words: post-quantum cryptography, CRYSTALS-Kyber, FPGA, NTT, polynomial multiplication, butterfly arithmetic

中图分类号: 

No Suggested Reading articles found!