Efficiently computing RKN for keyword queries on XML data

Journal on Communications, 2014, Vol. 35, Issue (7): 46-55. doi: 10.3969/j.issn.1000-436x.2014.07.006

CHEN Zi-yang¹,², WANG Xuan¹,², TANG Xian³

¹ School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China
² Key Laboratory for Computer Virtual Technology and System Integration of Hebei Province, Yanshan University, Qinhuangdao 066004, China
³ School of Economics and Management, Yanshan University, Qinhuangdao 066004, China

Online: 2014-07-25 Published: 2017-06-24
Supported by:
The National Natural Science Foundation of China; The National Natural Science Foundation of China; The National Natural Science Foundation of China; The Research Funds from Education Department of Hebei Province; The Science and Technology Research and Development Program of Hebei Province

Abstract

Constructing subtree results is a core problem in keyword query processing over XML data, and computing the set of relevant keyword nodes (RKN) for each subtree's root node is a step that greatly affects overall system performance. Because existing methods cannot correctly identify RKNs under ELCA (exclusive lowest common ancestor) semantics, a formal definition of RKN and the corresponding RKN-Base algorithm were proposed. RKN-Base correctly judges whether a given keyword node is an RKN of some ELCA node by sequentially scanning each keyword node in the inverted lists once. Because RKN-Base cannot avoid processing useless nodes, an optimized algorithm, RKN-Optimized, was then proposed; it computes RKN sets starting from the set of ELCA nodes rather than from the inverted lists, as RKN-Base does. As a result, RKN-Optimized avoids processing useless nodes and reduces the time complexity. Experimental results verified the efficiency of the proposed algorithms.

Key words: XML, subtree results construction, ELCA, RKN
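To make the terms in the abstract concrete, the following is a minimal brute-force sketch in Python. It is not RKN-Base or RKN-Optimized, and the paper's formal RKN definition is not reproduced on this page. The sketch assumes nodes are identified by Dewey labels (tuples of integers), uses the commonly cited ELCA definition, and adopts a simplified reading of RKN as the ELCA "witness" nodes: a keyword node counts for the lowest ancestor-or-self whose subtree contains every keyword, and that ancestor is an ELCA only if it collects at least one such node per keyword. The function name `elca_with_rkn` and the sample data are hypothetical.

```python
from collections import defaultdict


def elca_with_rkn(inverted_lists):
    """Map each ELCA node to its witness keyword nodes (assumed RKN reading).

    inverted_lists: dict mapping keyword -> list of Dewey labels (tuples).
    """
    keywords = list(inverted_lists)

    # contains[u] = set of keywords that occur somewhere in the subtree of u.
    contains = defaultdict(set)
    for kw, labels in inverted_lists.items():
        for label in labels:
            for depth in range(1, len(label) + 1):
                contains[label[:depth]].add(kw)

    # A node is "full" when its subtree contains every query keyword.
    full = {u for u, kws in contains.items() if len(kws) == len(keywords)}

    # Each keyword node can only be a witness for its lowest full
    # ancestor-or-self; any higher ancestor reaches it through a full child.
    witnesses = defaultdict(lambda: defaultdict(list))
    for kw, labels in inverted_lists.items():
        for node in labels:
            for depth in range(len(node), 0, -1):
                if node[:depth] in full:
                    witnesses[node[:depth]][kw].append(node)
                    break

    # A full node is an ELCA only if it has a witness for every keyword.
    result = {}
    for root, per_kw in witnesses.items():
        if len(per_kw) == len(keywords):
            result[root] = sorted({n for nodes in per_kw.values() for n in nodes})
    return result


# Hypothetical toy document: (1,) is the root.
example = {
    "xml": [(1, 1, 1), (1, 2), (1, 4, 1, 1), (1, 4, 2)],
    "query": [(1, 1, 2), (1, 3), (1, 4, 1, 2)],
}
result = elca_with_rkn(example)
for root, nodes in sorted(result.items()):
    print(root, nodes)
```

In this toy input, node `(1, 4)` contains both keywords but is not an ELCA, because its only "query" occurrence lies under the full child `(1, 4, 1)`. The keyword node `(1, 4, 2)` therefore belongs to no ELCA, which illustrates the kind of useless node the abstract refers to.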

CHEN Zi-yang, WANG Xuan, TANG Xian. Efficiently computing RKN for keyword queries on XML data[J]. Journal on Communications, 2014, 35(7): 46-55.

Figures/Tables

Table 1 Queries on the XMark dataset

ID	Keywords	$\sum_{i=1}^{m} |L_i|$	$|L_{max}|$	$|L_{min}|$	N_E	R_E/%	Group
Q1	bidder, incategory	710 593	411 575	299 018	1	0.0003	Group1
Q2	bidder, text	827 825	528 807	299 018	54 200	18.13	Group1
Q3	listitem, bold	675 087	370 118	304 969	117 933	38.67	Group1
Q4	bidder, listitem, incategory	1 015 562	411 575	299 018	1	0.0003	Group2
Q5	bidder, date, emph	1 106 809	457 231	299 018	29 089	9.73	Group2
Q6	check, listitem, keyword	693 390	352 121	36 300	11 552	31.82	Group2
Q7	bidder, listitem, date, incategory	1 472 793	457 231	299 018	1	0.0003	Group3
Q8	check, listitem, keyword, date	1 150 621	457 231	36 300	8 087	22.28	Group3
Q9	takano, keyword, bold, emph	1 089 928	370 118	17 129	6 448	37.64	Group3
Q10	bidder, text, time, date, incategory	2 009 949	528 807	299 018	1	0.0003	Group4
Q11	order, keyword, text, emph, increase	1 552 894	528 807	16 700	1 918	11.49	Group4
Q12	check, keyword, bold, text, date	1 744 577	528 807	36 300	16 204	44.64	Group4

Query ID	T_B/ms	T_O/ms	R_e/%
Q1	1 248	0.1	0.008
Q2	18 018	17 845	99.039
Q3	63 274	68 672	108.531
Q4	2 106	0.1	0.0047
Q5	3 978	3 588	90.196
Q6	1 279	1 124	87.881
Q7	4 227	0.1	0.002
Q8	1 841	499	27.104
Q9	1 592	437	27.449
Q10	7 895	0.047	0.0006
Q11	2 402	125	5.203
Q12	3 728	1 779	47.719
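
The page does not define the columns of the table above. The reported percentages are consistent with R_e = T_O / T_B × 100, reading T_B and T_O as the running times of RKN-Base and RKN-Optimized; that reading is an assumption, not something stated here. A small Python check on a few rows copied from the table:

```python
# (query ID, T_B in ms, T_O in ms, reported R_e in %), copied from the table above.
rows = [
    ("Q2", 18018, 17845, 99.039),
    ("Q3", 63274, 68672, 108.531),
    ("Q5", 3978, 3588, 90.196),
    ("Q8", 1841, 499, 27.104),
    ("Q12", 3728, 1779, 47.719),
]


def ratio_percent(t_o, t_b):
    """Assumed definition: T_O as a percentage of T_B."""
    return 100.0 * t_o / t_b


# Largest absolute difference between the recomputed and the reported ratio.
max_gap = max(abs(ratio_percent(t_o, t_b) - reported)
              for _, t_b, t_o, reported in rows)
print(f"largest gap: {max_gap:.4f} percentage points")
```

Under this reading, a value above 100 (Q3) would mean the optimized algorithm ran slower than the baseline for that query.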

Query ID	N_B×10³	N_O×10³	N_e/%
Q1	2 543	0.04	0.001 573
Q2	1 495 939	1 471 036	98.335 29
Q3	3 501 520	3 492 110	99.731 26
Q4	4 551	0.06	0.001 318
Q5	247 065	213 349	86.353 39
Q6	54 119	34 976	64.627 95
Q7	7 910	0.08	0.001 011
Q8	51 528	17 030	33.049 99
Q9	38 391	7 768	20.233 91
Q10	12 751	0.101	0.000 792
Q11	42 979	1 225	2.850 229
Q12	124 417	67 445	54.208 83

ID	Keywords	N_E
Q1	article, book	1 456
Q2	algorithm, article	18 349
Q3	data, article	26 611
Q4	article, database	5 753
Q5	XML, article	1 033
Q6	year, 2001	59 355
Q7	book, article, mining	6
Q8	algorithm, article, 2001	521
Q9	article, data, mining	1 563
Q10	data, XML, article	209
