Auto-vectorization: recent development and prospect

Journal on Communications, 2022, 43(3): 180-195. doi: 10.11959/j.issn.1000-436x.2022051


Jingge FENG¹,², Yeping HE¹,²,³, Qiuming TAO¹,²

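¹ National Engineering Research Center for Fundamental Software, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
² Graduate School, University of Chinese Academy of Sciences, Beijing 100049, China
³ State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing 100090, China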

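Revised: 2022-02-09 Online: 2022-03-25 Published: 2022-03-01
About the authors: Jingge FENG (1988- ), male, Manchu, born in Lincheng, Hebei, is a Ph.D. candidate at the University of Chinese Academy of Sciences. His research interests include compilation techniques and performance optimization.
Yeping HE (1962- ), male, born in Lanzhou, Gansu, Ph.D., is a research professor and doctoral supervisor at the Institute of Software, Chinese Academy of Sciences. His research interests include fundamental software and system security.
Qiuming TAO (1979- ), male, born in Nantong, Jiangsu, Ph.D., is an associate research professor and master's supervisor at the Institute of Software, Chinese Academy of Sciences. His research interests include operating systems, compilation techniques and software engineering.
Supported by:
The Strategic Priority Research Program of the Chinese Academy of Sciences (XDA-Y01-01); The Strategic Priority Research Program of the Chinese Academy of Sciences (XDC02010600)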



Abstract:

SIMD technology is developing rapidly, and many auto-vectorization methods targeting SIMD extension units have been proposed in recent years. Auto-vectorization automatically translates scalar programs into vector programs that use SIMD extensions, which relieves programmers of hand-writing vector code and effectively improves program performance. On this basis, the research achievements in automatic vectorization over the past 10 years were analyzed and summarized. The key problems and major breakthroughs were classified into four aspects: semantics-preserving analysis and transformation, vectorization grouping analysis and transformation, processor-oriented analysis and transformation, and performance evaluation analysis. Furthermore, the development trends and research directions in each of these four aspects were discussed.

Key words: auto-vectorization, SIMD extension, compilation technology, data-level parallelism, performance optimization
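The abstract's core idea, translating a scalar loop into a vector loop while preserving its semantics, can be illustrated with a small model. The sketch below is written in Python and only models what a vectorizer's output computes; it does not emit real SIMD instructions. It assumes a vectorization factor VF = 4 (for example, four 32-bit lanes in one 128-bit register), and all function names are hypothetical. It strip-mines the loop a[i] = a[i - d] + 1 into VF-wide steps followed by a scalar epilogue, and shows why semantics-preserving analysis must check the dependence distance d before vectorizing.

```python
VF = 4  # assumed vectorization factor: 4 lanes, e.g. one 128-bit register of 32-bit values


def recur_scalar(a, d):
    """Reference scalar loop: a[i] = a[i - d] + 1 for i in [d, n)."""
    a = list(a)
    for i in range(d, len(a)):
        a[i] = a[i - d] + 1
    return a


def recur_vectorized(a, d, vf=VF):
    """Strip-mined model of the vectorized loop.

    Each main-loop step reads all vf lanes before writing any of them,
    mimicking one vector load, one vector add and one vector store.
    """
    a = list(a)
    n = len(a)
    i = d
    while i + vf <= n:
        lanes = [a[i + l - d] + 1 for l in range(vf)]  # "vector load + add"
        a[i:i + vf] = lanes                             # "vector store"
        i += vf
    for j in range(i, n):  # scalar epilogue for the leftover iterations
        a[j] = a[j - d] + 1
    return a


def vectorization_is_legal(d, vf=VF):
    """For a write to a[i] fed by a read of a[i - d] with constant d > 0,
    a vf-wide step reads a[i-d .. i-d+vf-1] and writes a[i .. i+vf-1];
    these ranges do not overlap only when d >= vf."""
    return d >= vf
```

With d = 1 the model reads lanes before the preceding lanes in the same step have been updated, so its result differs from the scalar loop; with d = 4 = VF the two agree. A real compiler makes this decision statically with dependence tests on its intermediate representation rather than by comparing results.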

CLC number: TP312

Cite this article: Jingge FENG, Yeping HE, Qiuming TAO. Auto-vectorization: recent development and prospect[J]. Journal on Communications, 2022, 43(3): 180-195.

Figures/Tables: 13 (Figures 1-11, Tables 1-2)


Vendor	Processor	Instruction set	Vector length/bit
Intel	Pentium	MMX	64
Intel	Pentium	SSE	128
Intel	Core	AVX128	128
Intel	Core	AVX256	256
Intel	Core	IMCI	512
Intel	Core	AVX512	512
IBM	P6	VMX	128
IBM	BG/L	BG/L	256
DEC	Alpha	MVI	64
SGI	MIPS V	MDMX	64
Sun	SPARC v9	VIS	64
HP	PA-RISC	MAX-2	64
Motorola	G4	AltiVec	128
AMD	Athlon	3DNow!	128
AMD	Jaguar	F16C	128
AMD	Bulldozer	FMA	256
Ingenic	XBurst	MXU	128
Sony	Cell	AltiVec	128
CAS	Godson	Godson	256
NRCPC	SW26010	SW26010	256
NUDT	Matrix	Matrix	1024
ARM	ARMv6	NEON	128
ARM	PPC970	VMX	128
ARM	ARMv8	SVE	2048

No.	Year	Instruction set	Processor	Vector length/bit	Features
1	2015	AVX512	Xeon Phi	512	Masked blend, permute, bitwise, conflict detection, gather, scatter and prefetch instructions
2	2013	AVX256	Haswell	256	Multiply-add, gather, broadcast, and masked load and store instructions
3	2008	AVX128	Sandy Bridge	128	Data reorganization and unaligned memory access instructions
4	2006	SSE4	Penryn/Nehalem	128	Insert, extract, search and string processing instructions
5	2004	SSE3	Pentium 4	128	Unaligned memory access instructions, horizontal add instructions
6	2001	SSE2	Pentium 4	128	Type conversion instructions
7	1999	SSE	Pentium III	128	128-bit basic logical, shift and compare operations
8	1996	MMX	Pentium	64	Basic logical, shift and compare operations
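The AVX512 and AVX256 rows above list gather, masked blend and masked load/store instructions. These are the features that let a vectorizer handle a conditional, indirectly indexed loop by if-conversion: the branch becomes a lane mask, the indirect load becomes a gather, and the conditional assignment becomes a blend. The Python sketch below models only the lane-wise semantics under an assumed width of 4 lanes; the function names are hypothetical and no real intrinsic is called.

```python
VF = 4  # assumed vector width in lanes


def gather(base, idx):
    """Model of a SIMD gather: lane l receives base[idx[l]]."""
    return [base[j] for j in idx]


def masked_blend(mask, on_true, on_false):
    """Model of a masked blend: lane l takes on_true[l] if mask[l] else on_false[l]."""
    return [t if m else f for m, t, f in zip(mask, on_true, on_false)]


def cond_gather_scalar(x, idx, c, y):
    """Reference loop: for each i, if c[i] > 0 then y[i] = x[idx[i]]."""
    y = list(y)
    for i in range(len(y)):
        if c[i] > 0:
            y[i] = x[idx[i]]
    return y


def cond_gather_vectorized(x, idx, c, y, vf=VF):
    """If-converted model: compare -> gather -> blend, then a scalar epilogue.

    This sketch gathers every lane, so it assumes all idx entries are valid
    indices into x; a masked gather would skip the inactive lanes instead.
    """
    y = list(y)
    n = len(y)
    i = 0
    while i + vf <= n:
        mask = [c[i + l] > 0 for l in range(vf)]                 # vector compare
        gathered = gather(x, idx[i:i + vf])                       # vector gather
        y[i:i + vf] = masked_blend(mask, gathered, y[i:i + vf])   # blend, then store
        i += vf
    for j in range(i, n):  # scalar epilogue for the leftover iterations
        if c[j] > 0:
            y[j] = x[idx[j]]
    return y
```

For example, with x = [10, 20, 30, 40, 50], idx = [4, 3, 2, 1, 0, 4], c = [1, -1, 1, -1, 1, 1] and y initialized to zeros, both versions produce [50, 0, 30, 0, 10, 50]. Blending into the old contents of y is what keeps inactive lanes unchanged, matching the scalar loop in which the assignment is simply skipped.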
