面向XML关键字查询的高效RKN求解策略

通信学报

面向XML关键字查询的高效RKN求解策略

陈子阳，王璿，汤显

1. 燕山大学信息科学与工程学院，河北秦皇岛 066004； 2. 河北省计算机虚拟技术与系统集成重点实验室，河北秦皇岛 066004；3. 燕山大学经济与管理学院，河北秦皇岛 066004

出版日期:2014-07-25 发布日期:2014-07-15
基金资助:
国家自然科学基金资助项目（61040023,61272124, 61303040）；河北省教育厅研究计划基金资助项目（Y2012014）；河北省科学技术研究与发展计划科技支撑计划基金资助项目（11213578）

Efficiently computing RKN for keyword queries on XML data

Online:2014-07-25 Published:2014-07-15

摘要/Abstract

摘要： 构建结果子树是XML关键字查询处理的核心问题，其中求解与每个子树根节点相关的关键字节点是影响结果子树构建效率的重要步骤。针对已有方法不能正确求解基于ELCA(exclusive lowest common ancestor)语义的相关关键字节点(RKN，relevant keyword node)的问题，提出RKN的形式化定义及相应的RKN-Base算法。该算法通过顺序扫描每个关键字节点一次即可正确判断其是否为某个ELCA节点的RKN。针对RKN-Base不能避免处理无用节点的问题，提出一种优化算法RKN-Optimized，该算法基于每个ELCA节点求其RKN集合，从而避免了对无用节点的处理，降低了时间复杂度。最后，通过实验验证了所提算法的高效性。

Abstract: Subtree results construction is a core problem in keyword query processing over XML data，for which computing the set of relevant keyword nodes (RKN) for each subtree’s root node will greatly affect the overall system performance. Considering that existing methods cannot correctly identify RKN for ELCA semantics，the definitions of RKN and the RKN-Base algorithm were proposed，which can correctly judge whether a given node is an RKN of some ELCA node by sequentially scanning the set of inverted lists once. As RKN-Base cannot avoid processing all useless nodes，an optimized algorithm，namely RKN-Optimized，was then proposed，which computes RKN sets based on the set of ELCA nodes, rather than the set of inverted lists as RKN-Base does. As a result，RKN-Optimized avoids processing useless nodes, and reduces the time complexity. The experimental results verified the efficiency of the proposed algorithms.

陈子阳，王璿，汤显. 面向XML关键字查询的高效RKN求解策略[J]. 通信学报.