通信学报 ›› 2014, Vol. 35 ›› Issue (7): 46-55.doi: 10.3969/j.issn.1000-436x.2014.07.006

• 论文Ⅱ • 上一篇    下一篇

面向XML关键字查询的高效RKN求解策略

陈子阳1,2,王璿1,2(),汤显3   

  1. 1 燕山大学 信息科学与工程学院,河北 秦皇岛 066004
    2 河北省计算机虚拟技术与系统集成重点实验室,河北 秦皇岛 066004
    3 燕山大学 经济与管理学院,河北 秦皇岛 066004
  • 出版日期:2014-07-25 发布日期:2017-06-24
  • 基金资助:
    国家自然科学基金资助项目;国家自然科学基金资助项目;国家自然科学基金资助项目;河北省教育厅研究计划基金资助项目;河北省科学技术研究与发展计划科技支撑计划基金资助项目

Efficiently computing RKN for keyword queries on XML data

HENZi-yang C1,2,ANGXuan W1,2(),ANGXian T3   

  1. 1 School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China
    2 Key Laboratory for Computer Virtual Technology and System Integration of Hebei Province, Yanshan University, Qinhuangdao 066004, China
    3 School of Economics and Management, Yanshan University, Qinhuangdao 066004,China
  • Online:2014-07-25 Published:2017-06-24
  • Supported by:
    The National Natural Science Foundation of China;The National Natural Science Foundation of China;The National Natural Science Foundation of China;The Research Funds From Education Department of Hebei Province;The Science and Technology Research and Development Program of Hebei Province

摘要:

构建结果子树是XML关键字查询处理的核心问题,其中求解与每个子树根节点相关的关键字节点是影响结果子树构建效率的重要步骤。针对已有方法不能正确求解基于ELCA(exclusive lowest common ancestor)语义的相关关键字节点(RKN,relevant keyword node)的问题,提出RKN的形式化定义及相应的RKN-Base算法。该算法通过顺序扫描每个关键字节点一次即可正确判断其是否为某个ELCA节点的RKN。针对RKN-Base不能避免处理无用节点的问题,提出一种优化算法RKN-Optimized,该算法基于每个ELCA节点求其RKN集合,从而避免了对无用节点的处理,降低了时间复杂度。最后,通过实验验证了所提算法的高效性。

关键词: 可扩展标记语言, 子树构建, ELCA, 相关关键字节点

Abstract:

Subtree results construction is a core problem in keyword query processing over XML data,for which com-puting the set of relevant keyword nodes (RKN) for each subtree's root node will greatly affect the overall system per-formance. Considering that existing methods cannot correctly identify RKN for ELCA semantics,the definitions of RKN and the RKN-Base algorithm were proposed,which can correctly judge whether a given node is an RKN of some ELCA node by sequentially scanning the set of inverted lists once. As RKN-Base cannot avoid processing all useless nodes,an optimized algorithm,namely RKN-Optimized,was then proposed,which computes RKN sets based on the set of ELCA nodes, rather than the set of inverted lists as RKN-Base does. As a result,RKN-Optimized avoids processing useless nodes, and reduces the time complexity. The experimental results verified the efficiency of the proposed algorithms.

Key words: XML, subtree results construction, ELCA, RKN

No Suggested Reading articles found!