通信学报 ›› 2023, Vol. 44 ›› Issue (11): 249-259.doi: 10.11959/j.issn.1000-436x.2023203
• 学术论文 • 上一篇
窦全胜1,2, 张顺1, 潘浩1, 王荟贤1, 唐焕玲1
修回日期:
2023-10-13
出版日期:
2023-11-01
发布日期:
2023-11-01
作者简介:
窦全胜(1971− ),男,黑龙江大庆人,博士,山东工商学院教授,主要研究方向为机器学习、数据挖掘、知识工程与知识处理、智能理论与方法等基金资助:
Quansheng DOU1,2, Shun ZHANG1, Hao PAN1, Huixian WANG1, Huanling TANG1
Revised:
2023-10-13
Online:
2023-11-01
Published:
2023-11-01
Supported by:
摘要:
针对SQUARES程序空间增长过快,导致程序合成效率偏低的问题,在SQUARES的基础上,增加了以深度神经网络为核心的程序空间约简器,将给定的<被查询表,查询结果>示例表示成二维张量,作为深度神经网络的输入,网络的输出是关于目标 SQL 语句合成规则的相关性标记向量。约简器根据神经网络的输出结果,采用末N位淘汰策略,删除与目标SQL语句相关性弱的合成规则,以减少候选SQL语句的生成和验证,提升系统合成效率。对约简器中深度神经网络的结构设计、训练样本集的生成方法和网络训练过程进行了详细描述。同时将PSR-SQUARES与当前有代表性SQL逆向合成系统进行实验对比,实验结果表明,PSR-SQUARES的综合性能不同程度地优于其他合成系统,平均合成时间由SQUARES的251 s降低至130 s,目标程序合成成功率由80%提升至89%。
中图分类号:
窦全胜, 张顺, 潘浩, 王荟贤, 唐焕玲. PSR-SQUARES:基于程序空间约简器的SQL逆向合成系统[J]. 通信学报, 2023, 44(11): 249-259.
Quansheng DOU, Shun ZHANG, Hao PAN, Huixian WANG, Huanling TANG. PSR-SQUARES: SQL reverse synthesis system based on program space reducer[J]. Journal on Communications, 2023, 44(11): 249-259.
表1
DSL的上下文无关文法"
产生式左部 | 产生式右部 |
table→ | input(0)|inner_join(table,table)(1)| |
inner_join3(table,table,table)(2)| | |
inner_join4(table,table,table,table)(3)| | |
filter(table,filterCondition)(4)| | |
filters(table,filterCondition,filterCondition,op)(5)| | |
summariseGrouped(table,summmariseCondition,Cols)(6)| | |
anti_join(table,table)(7)|left_join(table,table)(8)| | |
bind_rows(table,table)(9)|intersect(table,selectCols,distinct)(10) | |
tableSelect→ | select(table,selectCols,distinct)(11) |
op→ | Or(12)|And(13) |
distinct→ | true(14)|false(15) |
empty→ | ε(16) |
表2
PSR-SQUARES-4合成的DSL程序示例"
序号 | 样本编号 | 枚举轮数/轮 | DSL程序 |
L1=summariseGrouped(input0, meanage = | |||
1 | 8 | 2 | mean(age), level) |
P=select(L1, level,meanage, true) | |||
L1=summariseGrouped(input0, maxcost | |||
=max(cost),P_id) | |||
2 | 18 | 4 | L2=inner_join3(input0, L1, input1) |
L3=filter(L2,cost==maxcost) | |||
P=select(L3,P_id,S_name,distinct,true) | |||
L1=summariseGrouped(input2, | |||
S_key,S_name) | |||
L2=filters(L1, S_key==S4, S_key!=S4) | |||
3 | 32 | 6 | L3=filters(input0,S_key==S4, S_key!=S4) |
L4=inner_join4(L3, input1,L2, input0) | |||
L5=filter(L4,color ==green) | |||
P=select(L5, P_id,n, true) |
表3
SQUARES与PSR-SQUARES-4在全部测试样本的合成时间对比"
序号 | 样本编号 | 枚举轮数 | SQUARES的合成时间/s | PSR-SQUARES-4的合成时间/s |
1 | 7 | 1 | 3.2 | 3.1 |
2 | 24 | 4 | 3.3 | 3.3 |
3 | 1 | 2 | 3.7 | 2.3 |
4 | 12 | 2 | 3.7 | 3.6 |
5 | 20 | 5 | 3.4 | 3.3 |
6 | 21 | 2 | 3.4 | 3.9 |
7 | 13 | 1 | 3.4 | 2.5 |
8 | 10 | 3 | 3.9 | 3.4 |
9 | 37 | 4 | 3.9 | 3.6 |
10 | 19 | 5 | 4.1 | 3.3 |
11 | 44 | 5 | 4.1 | 5 |
12 | 8 | 2 | 4.7 | 3.3 |
13 | 36 | 5 | 5.2 | 4.2 |
14 | 14 | 3 | 5.5 | 4 |
15 | 39 | 4 | 7.2 | 4 |
16 | 50 | 5 | 7.6 | 4.8 |
17 | 23 | 5 | 8.2 | 3.6 |
18 | 28 | 5 | 8.4 | 4.8 |
19 | 31 | 5 | 8.5 | 6 |
20 | 11 | 1 | 8.7 | 3.4 |
21 | 32 | 6 | 11 | 4.4 |
22 | 5 | 3 | 14 | 3.8 |
23 | 30 | 6 | 14 | 10 |
24 | 2 | 3 | 23 | 9 |
25 | 29 | 6 | 33 | 5 |
26 | 48 | 4 | 40 | 3.3 |
27 | 4 | 3 | 40 | 4.3 |
28 | 41 | 5 | 43 | 8 |
29 | 55 | 4 | 69 | 4.4 |
30 | 26 | 5 | 74 | 39 |
31 | 45 | 4 | 80 | 3.4 |
32 | 3 | 3 | 82 | 9.7 |
33 | 54 | 4 | 89 | 12 |
34 | 40 | 5 | 93 | 5.2 |
35 | 42 | 5 | 103 | 3.9 |
36 | 52 | 4 | 128 | 14 |
37 | 6 | 3 | 128 | 17 |
38 | 15 | 3 | 168 | 6.5 |
39 | 53 | 4 | 438 | 29 |
40 | 49 | 4 | 920 | 14 |
41 | 51 | 4 | 1452 | 21 |
42 | 17 | 4 | 1898 | 11.8 |
43 | 18 | 4 | 1952 | 38 |
44 | 27 | 4 | 3065 | 309 |
45 | 35 | 5 | — | 3.9 |
46 | 47 | 4 | — | 256 |
47 | 33 | 6 | — | 549 |
48 | 9 | 6 | — | 2271 |
49 | 43 | 4 | — | 2658 |
50 | 46 | 4 | — | — |
51 | 38 | 5 | — | — |
52 | 22 | 5 | — | — |
53 | 16 | 6 | — | — |
54 | 25 | 6 | — | — |
55 | 34 | 6 | — | — |
[1] | PARISOTTO E , MOHAMED A R , SINGH R ,et al. Neuro-symbolic program synthesis[J]. arXiv Preprint,arXiv:1611.01855, 2016. |
[2] | SOLAR-LEZAMA A . Program synthesis by sketching[D]. Berkeley:University of California, 2008. |
[3] | CHEN X , LIU C , SHIN R ,et al. Latent attention for if-then program synthesis[J]. arXiv Preprint,arXiv:1611.01867, 2016. |
[4] | GU X D , ZHANG H Y , KIM S . Deep code search[C]// Proceedings of the 40th International Conference on Software Engineering. New York:ACM Press, 2018: 933-944. |
[5] | LI C T , TARLOW D , GAUNT A L ,et al. Neural program lattices[C]// Proceedings of the International Conference on Learning Representations (ICLR).[S.l.]: OpenReview, 2017: 1-17. |
[6] | DEVLIN J , UESATO J , BHUPATIRAJU S ,et al. RobustFill:neural program learning under noisy I/O[C]// Proceedings of the 34th International Conference on Machine Learning. New York:ACM Press, 2017: 990-998. |
[7] | GULWANI S , POLOZOV O , SINGH R . Program synthesis[J]. Foundations and Trends in Programming Languages, 2017,4(1/2): 1-119. |
[8] | DAS-SARMA A , PARAMESWARAN A , GARCIA-MOLINA H ,et al. Synthesizing view definitions from data[C]// Proceedings of the 13th International Conference on Database Theory. New York:ACM Press, 2010: 89-103. |
[9] | TRAN Q T , CHAN C Y , PARTHASARATHY S . Query by output[C]// Proceedings of the 2009 ACM SIGMOD International Conference on Management of data. New York:ACM Press, 2009: 535-548. |
[10] | TRAN Q T , CHAN C Y , PARTHASARATHY S . Query reverse engineering[J]. The VLDB Journal, 2014,23(5): 721-746. |
[11] | LI H , CHAN C Y , MAIER D . Query from examples[J]. Proceedings of the VLDB Endowment, 2015,8(13): 2158-2169. |
[12] | ZHANG S , SUN Y Y . Automatically synthesizing SQL queries from input-output examples[C]// Proceedings of 2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE). Piscataway:IEEE Press, 2014: 224-234. |
[13] | WANG C L , CHEUNG A , BODIK R . Synthesizing highly expressive SQL queries from input-output examples[C]// Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation. New York:ACM Press, 2017: 452-466. |
[14] | WANG C L , CHEUNG A , BODIK R . Interactive query synthesis from input-output examples[C]// Proceedings of the 2017 ACM International Conference on Management of Data. New York:ACM Press, 2017: 1631-1634. |
[15] | TAN W C , ZHANG M H , ELMELEEGY H ,et al. Reverse engineering aggregation queries[J]. Proceedings of the VLDB Endowment, 2017,10(11): 1394-1405. |
[16] | TAN W C , ZHANG M H , ELMELEEGY H ,et al. REGAL+[J]. Proceedings of the VLDB Endowment, 2018,11(12): 1982-1985. |
[17] | KHURANA K , HARITSA J R . UNMASQUE:a hidden SQL query extractor[J]. Proceedings of the VLDB Endowment, 2020,13(12): 2809-2812. |
[18] | REHMAN M S , HUANG S L , ELMORE A J . A demonstration of RELIC[J]. Proceedings of the VLDB Endowment, 2021,14(12): 2795-2798. |
[19] | QIN X , CHAI C , LUO Y ,et al. Interactively discovering and ranking desired tuples by data exploration[J]. The VLDB Journal, 2022,31(4): 753-777. |
[20] | MARTINS R , CHEN J , CHEN Y J ,et al. Trinity:an extensible synthesis framework for data science[J]. Proceedings of the VLDB Endowment, 2019,12(12): 1914-1917. |
[21] | ORVALHO P , TERRA-NEVES M , VENTURA M ,et al. SQUARES:a SQL synthesizer using query reverse engineering[J]. Proceedings of the VLDB Endowment, 2020,13(12): 2853-2856. |
[22] | BAIK C , JIN Z J , CAFARELLA M ,et al. Duoquest:a dual-specification system for expressive SQL queries[C]// Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. New York:ACM Press, 2020: 2319-2329. |
[23] | TAKENOUCHI K , ISHIO T , OKADA J ,et al. PATSQL:efficient synthesis of SQL queries from example tables with quick inference of projected columns[J]. arXiv Preprint,arXiv:2010.05807, 2020. |
[24] | ZHOU X Y , BODIK R , CHEUNG A ,et al. Synthesizing analytical SQL queries from computation demonstration[C]// Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation. New York:ACM Press, 2022: 168-182. |
[25] | BALOG M , GAUNT A L , BROCKSCHMIDT M ,et al. DEEPCODER:learning to write programs[J]. arXiv Preprint,arXiv:1611.01989, 2016. |
[26] | 顾斌, 于波, 董晓刚 ,等. 程序智能合成技术研究进展[J]. 软件学报, 2021,32(5): 1373-1384. |
GU B , YU B , DONG X G ,et al. Intelligent program synthesis techniques:literature review[J]. Journal of Software, 2021,32(5): 1373-1384. | |
[27] | 袁胜浩, 杨志斌, 张博林 ,等. 同步语言多线程代码生成的语义保持证明方法[J]. 计算机学报, 2020,43(11): 2216-2226. |
YUAN S H , YANG Z B , ZHANG B L ,et al. A semantic preservation proving method of multi-threaded code generation for synchronous language[J]. Chinese Journal of Computers, 2020,43(11): 2216-2226. | |
[28] | REED S , DE-FREITAS N . Neural programmer-interpreters[C]// Proceedings of the 4th International Conference on Learning Representations (ICLR).[S.l.]: OpenReview, 2016: 1-13. |
[29] | GUO Z , JAMES M , JUSTO D ,et al. Program synthesis by type-guided abstraction refinement[J]. Proceedings of the ACM on Programming Languages, 2019,4: 1-28. |
[30] | LE H , WANG Y , GOTMARE A D ,et al. CodeRL:mastering code generation through pretrained models and deep reinforcement learning[J]. arXiv Preprint,arXiv:2207.01780, 2022. |
[31] | GULWANI S , MARRON M . NLyze:interactive programming by natural language for spreadsheet data analysis and manipulation[C]// Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. New York:ACM Press, 2014: 803-814. |
[32] | SHRIVASTAVA D , LAROCHELLE H , TARLOW D . Learning to combine per-example solutions for neural program synthesis[J]. arXiv Preprint,arXiv:2106.07175, 2021. |
[33] | HUANG G , LIU Z , MAATEN L V D ,et al. Densely connected convolutional networks[C]// Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway:IEEE Press, 2017: 2261-2269. |
[1] | 艾浩军, 曾维珂, 陶荆杰, 徐锦盈, 常含笑. 基于扩散模型的室内定位射频指纹数据增强方法[J]. 通信学报, 2023, 44(11): 201-212. |
[2] | 马云霄, 吴忠辉, 徐祖云, 衷璐洁, 许长桥. 算力网络中面向计算重用的任务调度优化[J]. 通信学报, 2023, 44(11): 129-142. |
[3] | 顾晓丹, 吴文甲, 凌振. 用户密集环境下基于边缘智能的直播视频传输优化机制[J]. 通信学报, 2023, 44(11): 55-66. |
[4] | 鲁斌, 孙洋, 杨振宇. 基于原始点云网格自注意力机制的三维目标检测方法[J]. 通信学报, 2023, 44(10): 72-84. |
[5] | 荆有波, 曹清越, 朱瑞. MgdFlow:微电网场景下的多粒度数据流管理算法[J]. 通信学报, 2023, 44(10): 94-102. |
[6] | 谢刚, 王荃毅, 谢新林, 王健安. 融合多尺度深度卷积的轻量级Transformer交通场景语义分割算法[J]. 通信学报, 2023, 44(10): 213-225. |
[7] | 郭璠, 李小虎, 刘文韬, 唐琎. 基于参数回归的快速全景图像拼接算法[J]. 通信学报, 2023, 44(9): 36-47. |
[8] | 李梓杨, 陈鹏程, 于炯, 蒲勇霖, 何贞贞, 李雪, 郑世杰. 面向大规模图数据的关键词覆盖最优路径规划方法[J]. 通信学报, 2023, 44(9): 205-217. |
[9] | 刘亚群, 谢钧, 邢长友, 倪保安. 飞行自组网拓扑控制研究综述[J]. 通信学报, 0, (): 195-214. |
[10] | 杨静, 吴成茂, 周流平. 基于全局-局部自注意力网络的视频异常检测方法[J]. 通信学报, 2023, 44(8): 241-250. |
[11] | 张海波, 王大斌, 王汝言, 王冬宇. 车联网中基于模糊评价密度聚类的共谋节点检测方法[J]. 通信学报, 2023, 44(7): 114-123. |
[12] | 薛一鸣, 张金雨, 陈波涛, 杨正洪. 基于参数寻优的立体声鲁棒水印算法[J]. 通信学报, 2023, 44(7): 149-158. |
[13] | 李莉, 王小龙, 张之欣, 时榕良, 郭旭. 新型电力系统分布式家庭光伏采集场景下的信任评估模型[J]. 通信学报, 2023, 44(7): 197-206. |
[14] | 张倩倩, 张祎, 李浩, 马媛媛, 罗向阳. 基于特征选择和图卷积表示的JPEG图像隐写者识别[J]. 通信学报, 2023, 44(7): 218-229. |
[15] | 李竟博, 马礼, 李阳, 傅颖勋, 马东超. 感传算协同工业互联网优化设计[J]. 通信学报, 2023, 44(6): 12-22. |
阅读次数 | ||||||
全文 |
|
|||||
摘要 |
|
|||||
|