Chinese Journal of Intelligent Science and Technology, 2023, Vol. 5, Issue (3): 330-342. doi: 10.11959/j.issn.2096-6652.202327

• Special Column: Intelligent Technology and Social Computing

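Analysis and prediction of GitHub company influence based on machine learning

Mingyu WANG1, Qingyuan GONG2, Jingjing QU3, Xin WANG1

    1 School of Computer Science, Fudan University, Shanghai 200438, China
    2 Research Institute of Intelligent Complex Systems, Fudan University, Shanghai 200438, China
    3 Shanghai Artificial Intelligence Laboratory, Shanghai 201210, China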
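  • Revised: 2023-08-02 Online: 2023-09-01 Published: 2023-09-26
  • About the authors: Mingyu WANG (1998- ), female, received her master's degree from the School of Computing Science, University of Glasgow, in 2022, and worked as a research assistant at the School of Computer Science, Fudan University, from 2022 to 2023. Her main research interests include social networks, machine learning, and big data analysis.
    Qingyuan GONG (1991- ), female, Ph.D., received her Ph.D. degree from the School of Computer Science, Fudan University, in 2020, and conducted postdoctoral research at the School of Computer Science, Fudan University, from 2020 to 2022. Since 2022 she has been a young associate research fellow at the Research Institute of Intelligent Complex Systems, Fudan University. Her main research interest is big data on user behavior in online social networks.
    Jingjing QU (1990- ), female, Ph.D., is an associate research fellow at the Governance Research Center, Shanghai Artificial Intelligence Laboratory. Her main research interests include AI innovation and governance, science, technology and society, and the science of science.
    Xin WANG (1973- ), male, Ph.D., is Secretary of the Party Committee, professor, and doctoral supervisor at the School of Computer Science, Fudan University. His main research interests include next-generation Internet architecture, wireless and mobile networks, data center networks, social networks, and applications of network coding.
  • Supported by:
    The National Natural Science Foundation of China (62102094)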

Abstract:

The influence of a company affects not only its competitiveness within its industry but also its public reputation and future development. However, there is no unified standard for evaluating a company's influence. GitHub is a representative open-source platform for hosting software development code repositories. Existing research typically measures a company's influence by the total number of stars its projects on GitHub receive, but this approach struggles to capture the potential of small, micro, and newly founded companies. This paper introduces the h-index, a measure of scientists' influence, models a company network using GitHub as the information source, and extracts features from this network to build a classifier that predicts a company's future influence level. On this basis, the SHAP model-explanation technique is applied to identify the important features that determine a company's influence. Experimental results show that the XGBoost-based model achieves an accuracy of 0.92 and an average AUC of 0.93 on a real-world GitHub dataset, indicating that the proposed method can predict company influence accurately and reliably.

Key words: online developer community, social network, machine learning, SHAP
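The abstract contrasts total star counts with an h-index borrowed from bibliometrics. As a minimal sketch, assuming (this definition is not stated above) that a company's h-index is the largest h such that at least h of its repositories each have at least h stars, the two measures can diverge sharply: one viral repository inflates the star total while yielding an h-index of only 1. The function name `h_index` and the company data are hypothetical.

```python
from typing import Iterable


def h_index(star_counts: Iterable[int]) -> int:
    """Return the largest h such that at least h repositories have >= h stars.

    Hypothetical definition: the company's h-index is computed over the star
    counts of the repositories it owns, mirroring the citation h-index.
    """
    # Rank repositories from most to least starred.
    ranked = sorted(star_counts, reverse=True)
    h = 0
    for rank, stars in enumerate(ranked, start=1):
        if stars >= rank:
            h = rank
        else:
            # Counts only decrease from here on, so no later rank can qualify.
            break
    return h


# Hypothetical companies: one viral repository versus several moderately starred ones.
viral_company = [10_000]
steady_company = [12, 9, 8, 7, 6, 2]

print(sum(viral_company), h_index(viral_company))    # 10000 1
print(sum(steady_company), h_index(steady_company))  # 44 5
```

Under this assumed definition, the steady company ranks higher by h-index even though its star total is far smaller, which illustrates why a total-star measure may understate the breadth of a smaller company's output.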
