Journal on Communications ›› 2016, Vol. 37 ›› Issue (3): 20-32. doi: 10.11959/j.issn.1000-436x.2016049

• Academic paper •

Deep Web new data discovery strategy based on the graph model of data attribute value lists

Zhi-ming CUI1,2, Peng-peng ZHAO2, Xue-feng XIAN1,2,3, Li-gang FANG1,3, Yuan-feng YANG1,3, Cai-dong GU1,3

  1 Jiangsu Province Support Software Engineering R&D Center for Modern Information Technology Application in Enterprise, Suzhou 215104, China
  2 Institute of Intelligent Information Processing and Application, Soochow University, Suzhou 215006, China
  3 School of Computer Engineering, Suzhou Vocational University, Suzhou 215104, China
  • Online: 2016-03-25 Published: 2017-08-04
  • Supported by:
    The National Natural Science Foundation of China; The National Natural Science Foundation of China; The National Natural Science Foundation of China; The Natural Science Foundation of Jiangsu Province; Suzhou Foundation for Development of Science and Technology; Suzhou Foundation for Development of Science and Technology; Suzhou Foundation for Development of Science and Technology

Abstract:

To address the incremental crawling of newly generated data records in data sources, a deep Web new data discovery strategy was proposed. The strategy represents a deep Web data source with a new graph model of data attribute value lists, which turns the new data discovery problem into a traversal of that graph. Because the model depends only on the data, it is more adaptable and more deterministic than existing query-related graph models, and it applies to deep Web data sources that offer only a simple query interface. Based on this model, the strategy identifies incremental (growing) nodes and predicts their ability to discover new data. Mutual information is used to compute the dependencies between nodes, so that query selection reduces the negative impact of query dependency as far as possible. The strategy improves the efficiency of crawling new data. Experimental results show that, under the same resource constraints, it keeps local data and remote data maximally synchronized.
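The dependency measure named in the abstract can be illustrated with a small sketch. The abstract does not give the formula, so the code below assumes the textbook mutual information between two binary events, "a record contains value a" and "a record contains value b", estimated from locally stored records. The function name `value_mutual_information` and the sample records are hypothetical and are not taken from the paper.

```python
import math
from typing import Iterable, Set


def value_mutual_information(records: Iterable[Set[str]], a: str, b: str) -> float:
    """Mutual information (in bits) between the events
    'record contains a' and 'record contains b'.

    Assumption: textbook definition over presence/absence events,
    not necessarily the estimator used in the paper.
    """
    records = list(records)
    n = len(records)
    if n == 0:
        return 0.0
    # Joint counts over the four presence/absence combinations.
    counts = {(x, y): 0 for x in (True, False) for y in (True, False)}
    for rec in records:
        counts[(a in rec, b in rec)] += 1
    mi = 0.0
    for (x, y), c in counts.items():
        if c == 0:
            continue  # a zero-probability cell contributes nothing
        p_xy = c / n
        p_x = (counts[(x, True)] + counts[(x, False)]) / n
        p_y = (counts[(True, y)] + counts[(False, y)]) / n
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi


# Hypothetical local records, each a set of attribute values.
records = [
    {"Canon", "camera"},
    {"Canon", "camera"},
    {"Nikon", "lens"},
    {"Nikon", "lens"},
]
# "Canon" and "camera" always co-occur: fully dependent.
dependent = value_mutual_information(records, "Canon", "camera")
# "a" and "b" occur independently of each other.
independent = value_mutual_information([{"a", "b"}, {"a"}, {"b"}, set()], "a", "b")
```

Under this assumption, a high value suggests that a query on `b` would largely return records already retrieved by a query on `a`, so a query selector might prefer candidate values with low mutual information to the queries already issued.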

Key words: deep Web, new data discovery, data acquisition
