Journal on Communications, 2015, Vol. 36, Issue 12: 77-88. doi: 10.11959/j.issn.1000-436x.2015316
Xiao-yong DU1,2, Jun CHEN1,2, Yue-guo CHEN1,2
Online: 2015-12-25
Published: 2017-07-17
Abstract:
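Data exploration is a third technical means of realizing the value of big data, distinct from data services and data analysis. Data services emphasize retrieving, at the micro level, precise information that meets user needs. Data analysis emphasizes giving users insight into the data at the macro level and thereby supporting decision making. Data exploration, by contrast, is an interactive way of discovering the value of data that lets users switch freely between the micro and macro levels, moving between in-depth detail and an accessible overview. First, the traditional technical means of discovering value in big data and their characteristics are briefly reviewed, and exploratory search is introduced. Next, the definition and model of exploratory search are described in detail, and its characteristics are summarized. Then, following a component-based approach, a framework for exploratory search systems is designed, and the challenges and key techniques involved in each component are surveyed. Finally, the authors' own attempts at exploratory search over knowledge bases are briefly presented.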
Xiao-yong DU, Jun CHEN, Yue-guo CHEN. Exploratory search on big data[J]. Journal on Communications, 2015, 36(12): 77-88.