Telecommunications Science, 2017, Vol. 33, Issue (8): 180-186. doi: 10.11959/j.issn.1000-0801.2017234

• Operation technology wide angle •

Research and design of a distributed high-performance web crawler based on a cloud platform

Enming SHI, Xiaojun XIAO, Yu LU

  1. Guangzhou Useease Information Technology Co., Ltd., Guangzhou 510630, China
  • Revised: 2017-07-27 Online: 2017-08-01 Published: 2017-08-25

Abstract:

With the arrival of the big data era, data has become the most valuable resource. Web crawler technology, an important means of collecting external data, has become a standard tool for data analysis. A high-performance, convenient, cloud-based crawler architecture is introduced. The overall structure of the crawler, its distributed design, and the design of each sub-module are described in detail. Each module of the crawler is encapsulated in a Docker container, and Kubernetes is used for resource scheduling and cluster management. For performance optimization, an MD5 deduplication tree algorithm, DNS optimization, and asynchronous I/O are adopted. Experimental results show that the crawler has a clear performance advantage over the unoptimized scheme.
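
The abstract names an MD5-based URL deduplication tree but does not give its details. The following is only a minimal sketch of one way such a structure could work, assuming a trie keyed on the leading hex digits of each URL's MD5 digest; the class name `Md5DedupTree`, the `depth` parameter, and the `"leaf"` bucket key are hypothetical and are not taken from the paper.

```python
import hashlib


class Md5DedupTree:
    """Hypothetical trie keyed on the leading hex digits of a URL's MD5 digest."""

    def __init__(self, depth=4):
        # depth: number of leading hex digits used as tree levels (assumed value)
        self.depth = depth
        self.root = {}

    def add(self, url):
        """Return True if the URL is new, False if it was already recorded."""
        digest = hashlib.md5(url.encode("utf-8")).hexdigest()
        node = self.root
        # Walk (and create on demand) one tree level per leading hex digit.
        for ch in digest[: self.depth]:
            node = node.setdefault(ch, {})
        # Full digests are kept in a set at the leaf; "leaf" cannot clash
        # with the single-character hex keys used for child nodes.
        bucket = node.setdefault("leaf", set())
        if digest in bucket:
            return False
        bucket.add(digest)
        return True


seen = Md5DedupTree()
for url in ["http://example.com/a", "http://example.com/b", "http://example.com/a"]:
    if seen.add(url):
        print("fetch", url)
    else:
        print("skip duplicate", url)
```

Storing fixed-length digests rather than raw URLs keeps memory per entry constant, and the tree levels keep each leaf set small; whether the paper makes the same trade-off is not stated in the abstract.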

Key words: distributed system architecture, web crawler, Docker, high-performance

