大数据 ›› 2015, Vol. 1 ›› Issue (4): 9-17. doi: 10.11959/j.issn.2096-0271.2015036

• Special Topic: Financial and Security Big Data •

面向大数据的并行聚类算法在股票板块划分中的应用

海沫¹,牛怡晗²,张悦今¹

  1. 中央财经大学信息学院 北京 100081
  2. 上海浦东发展银行昆明分行 昆明 650000
  • 出版日期:2015-11-20 发布日期:2020-09-28
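  • About the authors: Mo Hai (海沫), female, Ph.D., associate professor at the School of Information, Central University of Finance and Economics, and CCF senior member; her main research areas are distributed systems and big data processing and analysis. | Yihan Niu (牛怡晗), female, works at the Kunming Branch of Shanghai Pudong Development Bank; her main research area is big data analysis. | Yuejin Zhang (张悦今), female, lecturer at the School of Information, Central University of Finance and Economics; her main research areas are data mining and its applications, knowledge management, and internet finance.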
  • 基金资助:
    北京高等学校青年英才计划资助项目(YETP0988);2014年度中财121人才工程青年博士发展基金资助项目(QBJ1427)

Application of Parallel Clustering Algorithms for Big Data in the Division of Stock

Mo Hai¹, Yihan Niu², Yuejin Zhang¹

  1. School of Information, Central University of Finance and Economics, Beijing 100081, China
  2. Kunming Branch, Shanghai Pudong Development Bank, Kunming 650000, China
  • Online: 2015-11-20 Published: 2020-09-28
  • Supported by:
    Beijing Higher Education Young Elite Teacher Project (YETP0988); 2014 CUFE 121 Talent Project Young Doctor Development Fund (QBJ1427)

摘要:

上市公司的经营业绩在一定程度上反映股票的投资价值,因此以反映上市公司盈利能力、偿债能力、成长能力、资产管理质量及股东获利能力5个方面共15项财务指标作为股票投资价值的衡量指标,首次尝试使用面向大数据的并行聚类算法Mahout中的K-means聚类算法和模糊K-means聚类算法对中国A股市场约2 600支股票依据其财务指标进行聚类,以便进行股票板块的划分,并比较两种算法在不同距离度量方式下的迭代次数、执行时间、聚类间密度和聚类内密度。实验结果表明,谷本距离度量方式下的K-means算法聚类效果最好,因此可将该实验结果作为最终股票板块划分结果进行分析,从而为投资决策提供参考。

关键词: 财务指标, 并行聚类算法, K-means, 模糊K-means, 股票板块划分

Abstract:

Because the operating performance of a listed company reflects, to a certain extent, the investment value of its stock, 15 financial indicators covering five aspects of a listed company (profitability, solvency, growth capability, asset management quality, and shareholder profitability) were taken as measures of stock investment value. For the first time, the parallel clustering algorithms for big data provided by Mahout, namely K-means and fuzzy K-means, were used to cluster about 2 600 stocks in China's A-share market according to these financial indicators, in order to divide the stocks into sectors. The two algorithms were then compared under different distance metrics in terms of number of iterations, execution time, inter-cluster density, and intra-cluster density. Experimental results show that the K-means algorithm with the Tanimoto distance metric gives the best clustering quality. This result can therefore be taken as the final stock sector division and analyzed to provide a reference for investment decisions.

Key words: financial index, parallel clustering algorithm, K-means, fuzzy K-means, division of stock
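
To make the technique named in the abstract concrete, the following is a minimal single-machine sketch of K-means with a Tanimoto distance. It is not the Mahout implementation used in the paper and does not reproduce its parallel (MapReduce) execution. The toy two-feature vectors, the function names, and the choice of the first k points as initial centroids are all assumptions made for illustration; the paper's data consist of 15 financial indicators per stock.

```python
def tanimoto_distance(a, b):
    """Tanimoto distance: 1 - a.b / (|a|^2 + |b|^2 - a.b)."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    if denom == 0:  # both vectors are all zeros
        return 0.0
    return 1.0 - dot / denom


def mean_vector(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(points, k, distance, max_iter=100):
    """Plain K-means with a pluggable distance function.

    Assumption for illustration: the first k points seed the centroids.
    Returns (assignment, centroids, iterations_run).
    """
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:  # converged: nothing moved
            break
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = mean_vector(members)
    return assignment, centroids, iteration


# Hypothetical toy data: four "stocks" described by two made-up indicators.
points = [[1.0, 0.1], [0.1, 1.0], [0.9, 0.2], [0.2, 0.8]]
assignment, centroids, iterations = kmeans(points, 2, tanimoto_distance)
print(assignment)  # [0, 1, 0, 1]
```

Passing the distance as a parameter mirrors the comparison described in the abstract, where the same clustering algorithm is run under different distance metrics; swapping in another metric only requires another function with the same two-argument signature.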
