Journal on Communications, 2013, Vol. 34, Issue (10): 121-134. doi: 10.3969/j.issn.1000-436x.2013.10.015
Online: 2013-10-25
Published: 2017-08-10
Jing YANG, Wen-ping LI, Jian-pei ZHANG
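Abstract:
Traditional canonical correlation analysis (CCA) methods have high computational complexity and are no longer suitable for big data at the petabyte scale. To address this, a big data CCA method based on the cloud model is proposed. Built on a cloud computing architecture, the method uses cloud operations to merge the endpoint clouds into a central cloud and generates central cloud drops from it. The central cloud drops serve as a small sample that restores the big data under uncertainty, and CCA is performed on this sample; the smaller data volume of the central cloud drops improves computational efficiency. Experimental results on real data sets verify the effectiveness of the method.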
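The abstract does not spell out the cloud operations it uses, so the following is only a minimal sketch of the standard one-dimensional normal cloud model, with numerical characteristics (Ex, En, He), from cloud model theory. It shows a backward generator that summarizes samples as a cloud and a forward generator that produces cloud drops from a cloud. The function names and the NumPy-based implementation are assumptions made for illustration; the paper's step that merges endpoint clouds into a central cloud, and its multi-dimensional treatment, are not reproduced here.

```python
import numpy as np


def backward_cloud(samples):
    """Estimate (Ex, En, He) of a 1-D normal cloud from samples.

    Uses the common moment-based estimator (an assumption, not
    necessarily the estimator used in the paper).
    """
    x = np.asarray(samples, dtype=float)
    ex = x.mean()                                   # expectation Ex
    en = np.sqrt(np.pi / 2.0) * np.abs(x - ex).mean()  # entropy En
    var = x.var(ddof=1)                             # sample variance
    he = np.sqrt(max(var - en ** 2, 0.0))           # hyper-entropy He, floored at 0
    return ex, en, he


def forward_cloud(ex, en, he, n, rng):
    """Generate n cloud drops from the normal cloud (Ex, En, He)."""
    # Each drop gets its own standard deviation En' ~ N(En, He^2).
    en_prime = rng.normal(en, he, size=n)
    # The drop itself is x ~ N(Ex, En'^2); abs() keeps the scale non-negative.
    return rng.normal(ex, np.abs(en_prime))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    drops = forward_cloud(10.0, 2.0, 0.1, 20000, rng)
    print(backward_cloud(drops))
```

In this sketch a large data block is reduced to three numbers by `backward_cloud`, and `forward_cloud` regenerates a sample of any chosen size from them, which is the sense in which cloud drops can act as a small stand-in for the original data.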
Jing YANG, Wen-ping LI, Jian-pei ZHANG. Canonical correlation analysis of big data based on cloud model[J]. Journal on Communications, 2013, 34(10): 121-134.
Table 1
Average error of the canonical correlation coefficients under different data volumes
Columns are grouped by canonical coefficient (1st, 2nd), then data set (PAMAP2, IDS), then method (BDCCA, ApproxCCA, LS-CCA).
Group / total data volume (×10^7) | 1st PAMAP2 BDCCA | 1st PAMAP2 ApproxCCA | 1st PAMAP2 LS-CCA | 1st IDS BDCCA | 1st IDS ApproxCCA | 1st IDS LS-CCA | 2nd PAMAP2 BDCCA | 2nd PAMAP2 ApproxCCA | 2nd PAMAP2 LS-CCA | 2nd IDS BDCCA | 2nd IDS ApproxCCA | 2nd IDS LS-CCA
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
1 | 0.0986 | 0.0675 | 0.0319 | 0.1250 | 0.0770 | 0.0205 | 0.0648 | 0.0301 | 0.0096 | 0.1251 | 0.0866 | 0.0830
2 | 0.1192 | 0.1368 | 0.0925 | 0.1369 | 0.1083 | 0.0708 | 0.0597 | 0.0497 | 0.0350 | 0.1364 | 0.0908 | 0.0866
3 | 0.1193 | 0.1574 | 0.1053 | 0.1326 | 0.1551 | 0.1097 | 0.0872 | 0.0826 | 0.0514 | 0.1419 | 0.1362 | 0.1225
4 | 0.1210 | 0.1709 | 0.1537 | 0.1575 | 0.2115 | 0.1590 | 0.0966 | 0.1090 | 0.0820 | 0.1453 | 0.1826 | 0.1745
5 | 0.1371 | 0.2686 | 0.2130 | 0.1662 | 0.3159 | 0.2193 | 0.1021 | 0.1167 | 0.0952 | 0.1533 | 0.2318 | 0.1942
6 | 0.1272 | 0.2729 | 0.2259 | 0.1723 | 0.3533 | 0.2550 | 0.0997 | 0.2136 | 0.1137 | 0.1624 | 0.3363 | 0.2798
7 | 0.1296 | 0.2870 | 0.2289 | 0.1647 | 0.3662 | 0.2733 | 0.1216 | 0.2319 | 0.1531 | 0.1595 | 0.3519 | 0.3010
8 | 0.1335 | 0.3360 | 0.2701 | 0.1575 | 0.4239 | 0.2871 | 0.1139 | 0.2975 | 0.1634 | 0.1619 | 0.3960 | 0.3116
9 | 0.1249 | 0.3521 | 0.2842 | 0.1605 | 0.4327 | 0.3332 | 0.1118 | 0.3199 | 0.1787 | 0.1630 | 0.4339 | 0.3292
10 | 0.1459 | 0.4087 | 0.3142 | 0.1739 | 0.4386 | 0.4081 | 0.1169 | 0.3651 | 0.1960 | 0.1797 | 0.4445 | 0.3427
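For readers unfamiliar with the quantity whose error Table 1 reports, the canonical correlation coefficients of a small in-memory sample can be computed as in the sketch below. It uses the QR-plus-SVD formulation of classical CCA and assumes both centered data matrices have full column rank. The function name `canonical_correlations` is hypothetical, and this is plain CCA on one sample, not BDCCA, ApproxCCA, or LS-CCA.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Return the canonical correlations between X (n x p) and Y (n x q).

    Assumes n > max(p, q) and that the centered matrices have full
    column rank; the result is sorted in descending order.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Center each variable (column) so correlations are about the mean.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the two data sets.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the canonical correlations.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    # Clip tiny floating-point overshoot outside [0, 1].
    return np.clip(s, 0.0, 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 3))
    # Y is an exact linear map of X, so both correlations are close to 1.
    Y = X @ rng.normal(size=(3, 2))
    print(canonical_correlations(X, Y))
```

The first and second entries of the returned array correspond to the 1st and 2nd canonical coefficients named in the table header.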