Study and Implementation of Distributed Real-Time Search Engine Model Based on Solr

doi:10.3969/j.issn.1000-0801.2011.11.015

Telecommunications Science, 2011, Vol. 27, Issue 11: 51-56. doi: 10.3969/j.issn.1000-0801.2011.11.015

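Weiwei Fu¹, Renfa Li¹, Yufeng Liu¹, Songli Huang²

¹ Embedded Systems and Networking Laboratory, Hunan University, Changsha 410082, China
² Taobao (China) Limited Liability Company, Hangzhou 315100, China

Published: 2011-11-15  Online: 2011-11-15
Funding:
Supported by the National Natural Science Foundation of China and by the "Core Electronic Devices, High-end Generic Chips and Basic Software" (Hegaoji) program of the Ministry of Industry and Information Technology of China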

Abstract

Real-time search has become one of the active research topics in information retrieval. In a distributed environment, traditional search engines cannot guarantee real-time responses or data disaster tolerance under large data volumes and high concurrency. This paper proposes a distributed real-time search model based on Solr and analyzes its implementation principles. The model combines an in-memory index with an on-disk index so that newly indexed information becomes searchable in real time, introduces a CommitLog to provide disaster tolerance for the in-memory index data, and uses a Master/Slave model to keep the search service available. The model has been deployed in a production system, and the results in practice demonstrate its feasibility.

Keywords: information retrieval, distributed real-time search model, Solr, data disaster tolerance
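The abstract names three mechanisms without showing how they interact. The following is a minimal sketch of the first two (an in-memory index combined with a disk index, protected by a commit log), written under stated assumptions: it is not the paper's implementation and uses no Solr API. The class `RealTimeIndex`, its methods, and the use of plain Python containers to stand in for the log file and the on-disk index are all hypothetical. The Master/Slave replication layer is left out.

```python
import json


class RealTimeIndex:
    """Toy model (hypothetical): in-memory index + "disk" index + commit log."""

    def __init__(self, commit_log=None, disk=None):
        # A list stands in for an append-only log file on durable storage.
        self.commit_log = commit_log if commit_log is not None else []
        # A dict stands in for the persistent on-disk index (term -> doc ids).
        self.disk = disk if disk is not None else {}
        # The in-memory index holds updates that were not flushed yet.
        self.memory = {}
        # Recovery: replay log entries to rebuild the in-memory index.
        for line in self.commit_log:
            self._apply(json.loads(line))

    def _apply(self, record):
        for term in record["text"].lower().split():
            self.memory.setdefault(term, set()).add(record["id"])

    def add(self, doc_id, text):
        record = {"id": doc_id, "text": text}
        # Write to the log first so the update survives a loss of memory.
        self.commit_log.append(json.dumps(record))
        self._apply(record)  # the document is searchable immediately

    def flush(self):
        """Merge the in-memory index into the disk index, then truncate the log."""
        for term, ids in self.memory.items():
            self.disk.setdefault(term, set()).update(ids)
        self.memory.clear()
        self.commit_log.clear()

    def search(self, term):
        term = term.lower()
        # A query sees the union of flushed and not-yet-flushed documents.
        return sorted(self.memory.get(term, set()) | self.disk.get(term, set()))


index = RealTimeIndex()
index.add(1, "distributed search")
index.flush()                      # doc 1 now lives in the "disk" index
index.add(2, "real-time search")   # doc 2 exists only in memory and in the log
print(index.search("search"))      # [1, 2]

# Simulated crash: the in-memory index is lost; the log and disk index survive.
recovered = RealTimeIndex(commit_log=index.commit_log, disk=index.disk)
print(recovered.search("search"))  # [1, 2]
```

Writing to the log before updating memory is the design choice that matters here: anything a query can see has already been recorded somewhere that outlives the process.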

Weiwei Fu, Renfa Li, Yufeng Liu, Songli Huang. Study and Implementation of Distributed Real-Time Search Engine Model Based on Solr[J]. Telecommunications Science, 2011, 27(11): 51-56.

Figures/Tables

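Performance test load ratio	TPS (Solr)	TPS (Xsolr)	Response time, ms (Solr)	Response time, ms (Xsolr)	CPU (Solr)	CPU (Xsolr)	Load (Solr)	Load (Xsolr)
Update only, no search	440	2100	0	0	21%	30%	3.7	3.9
Query : update = 1:1	56	130	145	12	14%	24%	4.6	5.0
Query : update = 2:1	24	50	280	140	13%	17%	4.1	4
Query : update = 3:1	17	27	350	160	15%	16%	3.8	4.2

Index data for all rows: 40 million documents, 12 GB index size on disk.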
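
To make the throughput comparison easier to read, the short sketch below computes the Xsolr-to-Solr TPS ratio for each workload. The TPS figures are copied from the table; the helper name `tps_ratio` is hypothetical, and treating the Xsolr column as the proposed model is an assumption, since the page itself does not define Xsolr.

```python
# TPS values copied from the table: (workload, Solr TPS, Xsolr TPS).
rows = [
    ("update only", 440, 2100),
    ("query:update 1:1", 56, 130),
    ("query:update 2:1", 24, 50),
    ("query:update 3:1", 17, 27),
]


def tps_ratio(solr_tps, xsolr_tps):
    """Return how many times higher the Xsolr throughput is than Solr's."""
    return xsolr_tps / solr_tps


for workload, solr, xsolr in rows:
    print(f"{workload}: Xsolr/Solr TPS ratio = {tps_ratio(solr, xsolr):.2f}")
```

On these numbers the ratio falls from about 4.8 for the update-only workload to about 1.6 at a 3:1 query-to-update mix, so the measured advantage is largest when the workload is dominated by updates.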
