Telecommunications Science, 2021, Vol. 37, Issue (5): 133-147. doi: 10.11959/j.issn.1000-0801.2021025
Guoqing LIU, Xingqi WANG, Dan WEI, Jinglong FANG, Yanli SHAO
Revised: 2021-01-18
Online: 2021-05-20
Published: 2021-05-01
About the author: Guoqing LIU (1995- ), male, is a master's student at the School of Computer Science, Hangzhou Dianzi University. His main research interests are software testing and machine learning.
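Abstract:
Traditional feature selection methods consider only linear relationships between variables and ignore nonlinear correlations, which lowers the performance of software defect number prediction models. To address this problem, a feature selection method based on the maximum information coefficient (MIC) is proposed. The method accounts for both linear and nonlinear relationships between features, and between features and the number of defects, and it separates redundancy analysis and relevance analysis into two stages. In the redundancy analysis stage, an agglomerative hierarchical clustering algorithm groups redundant features into the same cluster based on the correlation between features. In the relevance analysis stage, the features in each cluster are ranked by their correlation with the number of software defects, and the top-ranked features of each cluster are selected to form the feature subset. Experimental results show that the method selects effective feature subsets and improves the predictive performance of software defect number prediction models.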
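As a rough illustration of the two-stage procedure described in the abstract, the sketch below clusters features by pairwise dependence and then keeps, from each cluster, the feature most related to the defect count. It is not the authors' implementation. `approx_mic` is a simplified stand-in for MIC that searches only equal-frequency grids, and the function names, the average-linkage rule, the fixed cluster count `n_clusters`, the one-feature-per-cluster rule, and the toy data are all assumptions made for this example.

```python
import itertools
import math
from collections import Counter


def rank_bins(values, n_bins):
    """Equal-frequency binning by rank; tied values share one bin."""
    first_rank = {}
    for rank, v in enumerate(sorted(values)):
        first_rank.setdefault(v, rank)
    return [first_rank[v] * n_bins // len(values) for v in values]


def mutual_information(bx, by):
    """Mutual information (nats) between two equally long label lists."""
    n = len(bx)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum(c / n * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def approx_mic(x, y):
    """Simplified MIC stand-in (assumption): best normalized MI over small grids."""
    budget = max(4, int(len(x) ** 0.6))  # keep nx * ny <= n^0.6
    best = 0.0
    for nx in range(2, budget // 2 + 1):
        bx = rank_bins(x, nx)
        for ny in range(2, budget // nx + 1):
            mi = mutual_information(bx, rank_bins(y, ny))
            best = max(best, mi / math.log(min(nx, ny)))
    return min(best, 1.0)


def select_features(features, defects, n_clusters):
    """features: dict of name -> list of values; returns one name per cluster."""
    names = list(features)
    sim = {frozenset(pair): approx_mic(features[pair[0]], features[pair[1]])
           for pair in itertools.combinations(names, 2)}

    def distance(c1, c2):
        # Average linkage over pairwise distances 1 - dependence.
        total = sum(1 - sim[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    # Stage 1: agglomerative clustering groups mutually redundant features.
    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        i, j = min(itertools.combinations(range(len(clusters)), 2),
                   key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]))
        clusters[i].extend(clusters.pop(j))

    # Stage 2: within each cluster keep the feature most related to defects.
    relevance = {name: approx_mic(features[name], defects) for name in names}
    return [max(cluster, key=relevance.get) for cluster in clusters]


# Toy data (hypothetical): a metric, a rescaled copy, and a scrambled column.
t = [(i - 20) / 20 for i in range(41)]
features = {
    "loc": t,
    "loc_scaled": [2 * v + 1 for v in t],
    "scrambled": [(17 * i) % 41 for i in range(41)],
}
defects = [v * v for v in t]  # nonlinear in "loc"; a linear measure would miss it
selected = select_features(features, defects, n_clusters=2)
print(selected)
```

In the toy data, `loc` and `loc_scaled` fall into the same cluster because one is a linear rescaling of the other, so only one of them survives. A faithful reproduction would replace `approx_mic` with a full MIC estimator and choose the number of clusters and the number of features kept per cluster as the paper specifies.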
Cite this article: Guoqing LIU, Xingqi WANG, Dan WEI, Jinglong FANG, Yanli SHAO. Feature selection method for software defect number prediction based on maximum information coefficient[J]. Telecommunications Science, 2021, 37(5): 133-147.
Table 1 Dataset characteristics
Project | Release | #Instance | #Defects | %Defects | Max |
ant | ant-1.3 | 125 | 33 | 16% | 3 |
ant | ant-1.4 | 178 | 47 | 22.50% | 3 |
ant | ant-1.5 | 293 | 35 | 10.90% | 2 |
ant | ant-1.6 | 351 | 184 | 26.20% | 10 |
ant | ant-1.7 | 754 | 338 | 22.30% | 10 |
camel | camel-1.0 | 339 | 14 | 3.80% | 2 |
camel | camel-1.2 | 608 | 522 | 35.50% | 28 |
camel | camel-1.4 | 872 | 335 | 16.60% | 17 |
camel | camel-1.6 | 965 | 500 | 19.50% | 28 |
xerces | xerces-1.2 | 453 | 193 | 15.20% | 30 |
xerces | xerces-1.3 | 588 | 1 596 | 74.30% | 62 |
ivy | ivy-1.1 | 111 | 233 | 56.80% | 36 |
ivy | ivy-1.4 | 241 | 18 | 6.70% | 3 |
ivy | ivy-2.0 | 352 | 56 | 11.40% | 3 |
Table 2 AAE of the Bayesian ridge regression model
Release | full | ReliefF | info | chi2 | HFSNFP | FSCR | FSDNP | MIC-TSFS |
ant-1.3 | 0.301 3 | 0.287 2 | 0.278 8 | 0.301 9 | 0.278 8 | 0.277 6 | 0.302 6 | 0.270 5 |
ant-1.4 | 0.281 4 | 0.281 7 | 0.275 8 | 0.287 3 | 0.276 1 | 0.270 3 | 0.264 7 | 0.270 3 |
ant-1.5 | 0.115 7 | 0.123 1 | 0.119 8 | 0.119 3 | 0.112 9 | 0.119 7 | 0.133 6 | 0.116 2 |
ant-1.6 | 0.481 7 | 0.447 5 | 0.641 5 | 0.461 8 | 0.533 2 | 0.461 7 | 0.464 2 | 0.456 0 |
ant-1.7 | 0.370 5 | 0.598 5 | 0.551 8 | 0.398 7 | 0.463 1 | 0.459 1 | 0.422 8 | 0.398 7 |
camel-1.0 | 0.044 2 | 0.044 2 | 0.044 2 | 0.044 2 | 0.044 2 | 0.044 2 | 0.044 2 | 0.044 2 |
camel-1.2 | 1.038 9 | 1.060 6 | 1.163 9 | 1.075 0 | 1.091 4 | 1.040 4 | 1.048 8 | 1.029 0 |
camel-1.4 | 0.477 1 | 0.506 9 | 0.522 9 | 0.463 3 | 0.482 8 | 0.494 3 | 0.478 3 | 0.466 7 |
camel-1.6 | 0.704 4 | 0.703 2 | 0.696 0 | 0.713 7 | 0.695 9 | 0.716 5 | 0.700 3 | 0.688 8 |
xerces-1.2 | 0.281 8 | 0.261 4 | 0.263 6 | 0.281 8 | 0.270 5 | 0.268 2 | 0.275 0 | 0.277 3 |
xerces-1.3 | 0.545 1 | 0.544 5 | 0.575 6 | 0.566 8 | 0.539 9 | 0.535 8 | 0.597 3 | 0.538 3 |
ivy-1.1 | 1.312 1 | 1.376 5 | 1.896 2 | 1.312 9 | 1.372 0 | 1.840 2 | 1.312 9 | 1.248 5 |
ivy-1.4 | 0.082 7 | 0.070 2 | 0.082 7 | 0.082 7 | 0.078 5 | 0.082 7 | 0.082 7 | 0.091 0 |
ivy-2.0 | 0.173 2 | 0.176 2 | 0.158 8 | 0.170 5 | 0.178 8 | 0.164 7 | 0.167 5 | 0.161 8 |
mean | 0.443 6 | 0.463 0 | 0.519 4 | 0.448 6 | 0.458 4 | 0.483 2 | 0.449 6 | |
Win/Draw/Loss | 10/1/3 | 10/1/3 | 10/1/3 | 10/2/2 | 10/1/3 | 9/2/3 | 9/1/4 |
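The Win/Draw/Loss row can be read as a per-release comparison of MIC-TSFS against each other column, where a lower error counts as a win. That reading is an assumption, but it reproduces the first entry of the row when applied to the `full` and `MIC-TSFS` columns of Table 2. The helper name `win_draw_loss` is hypothetical.

```python
def win_draw_loss(proposed, baseline):
    """Count releases where `proposed` has lower, equal, or higher error."""
    wins = sum(p < b for p, b in zip(proposed, baseline))
    draws = sum(p == b for p, b in zip(proposed, baseline))
    losses = len(proposed) - wins - draws
    return wins, draws, losses


# AAE values copied from Table 2, in row order ant-1.3 ... ivy-2.0.
full = [0.3013, 0.2814, 0.1157, 0.4817, 0.3705, 0.0442, 1.0389,
        0.4771, 0.7044, 0.2818, 0.5451, 1.3121, 0.0827, 0.1732]
mic_tsfs = [0.2705, 0.2703, 0.1162, 0.4560, 0.3987, 0.0442, 1.0290,
            0.4667, 0.6888, 0.2773, 0.5383, 1.2485, 0.0910, 0.1618]

print(win_draw_loss(mic_tsfs, full))  # (10, 1, 3), the entry under "full"
```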
Table 3 ARE of the Bayesian ridge regression model
Release | full | ReliefF | info | chi2 | HFSNFP | FSCR | FSDNP | MIC-TSFS |
ant-1.3 | 0.172 8 | 0.152 1 | 0.139 6 | 0.172 3 | 0.149 5 | 0.137 7 | 0.163 3 | 0.135 4 |
ant-1.4 | 0.135 1 | 0.153 2 | 0.129 6 | 0.146 7 | 0.144 7 | 0.144 7 | 0.118 5 | 0.143 2 |
ant-1.5 | 0.072 5 | 0.069 5 | 0.056 4 | 0.073 7 | 0.066 0 | 0.067 8 | 0.072 5 | 0.066 0 |
ant-1.6 | 0.282 9 | 0.270 2 | 0.359 2 | 0.272 0 | 0.317 3 | 0.269 7 | 0.276 6 | 0.273 3 |
ant-1.7 | 0.204 3 | 0.383 4 | 0.275 1 | 0.225 9 | 0.286 9 | 0.284 4 | 0.253 5 | 0.226 8 |
camel-1.0 | 0.022 6 | 0.022 6 | 0.022 6 | 0.022 6 | 0.022 6 | 0.022 6 | 0.022 6 | 0.022 6 |
camel-1.2 | 0.619 4 | 0.650 3 | 0.767 0 | 0.655 4 | 0.677 7 | 0.635 2 | 0.624 1 | 0.597 7 |
camel-1.4 | 0.255 2 | 0.287 6 | 0.280 3 | 0.240 2 | 0.272 6 | 0.287 2 | 0.265 1 | 0.243 2 |
camel-1.6 | 0.396 2 | 0.375 6 | 0.369 1 | 0.388 4 | 0.370 7 | 0.387 6 | 0.387 8 | 0.368 3 |
xerces-1.2 | 0.122 5 | 0.096 4 | 0.098 7 | 0.122 5 | 0.105 5 | 0.107 7 | 0.112 7 | 0.118 0 |
xerces-1.3 | 0.313 3 | 0.296 4 | 0.301 8 | 0.317 1 | 0.298 5 | 0.269 7 | 0.365 4 | 0.314 9 |
ivy-1.1 | 0.657 1 | 0.693 5 | 0.883 9 | 0.665 1 | 0.610 6 | 0.844 5 | 0.665 1 | 0.631 6 |
ivy-1.4 | 0.042 4 | 0.033 0 | 0.045 5 | 0.042 4 | 0.038 2 | 0.042 4 | 0.042 4 | 0.050 7 |
ivy-2.0 | 0.099 8 | 0.099 1 | 0.063 4 | 0.104 3 | 0.103 8 | 0.092 0 | 0.095 9 | 0.092 6 |
mean | 0.242 6 | 0.255 9 | 0.270 9 | 0.246 3 | 0.247 5 | 0.255 8 | 0.247 5 | |
Win/Draw/Loss | 9/1/4 | 9/1/4 | 7/1/6 | 9/1/4 | 8/2/4 | 8/1/5 | 10/1/3 |
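The tables report AAE and ARE without restating their definitions on this page. The sketch below assumes the definitions commonly used in defect-number prediction studies: AAE is the mean absolute difference between predicted and actual defect counts, and ARE divides each absolute difference by the actual count plus 1 so that defect-free modules do not cause a division by zero. These formulas and the function names are assumptions, not taken from this page.

```python
def aae(actual, predicted):
    """Average absolute error between actual and predicted defect counts."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def are(actual, predicted):
    """Average relative error; the +1 keeps zero-defect modules well defined."""
    return sum(abs(p - a) / (a + 1)
               for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical example: three modules with 0, 1 and 3 actual defects.
actual = [0, 1, 3]
predicted = [0.5, 1.0, 1.0]
print(aae(actual, predicted), are(actual, predicted))
```

Lower values are better for both measures.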
Table 4 AAE of the linear regression model
Release | full | ReliefF | info | chi2 | HFSNFP | FSCR | FSDNP | MIC-TSFS |
ant-1.3 | 0.310 3 | 0.304 5 | 0.310 9 | 0.326 9 | 0.280 1 | 0.313 2 | 0.255 1 | 0.303 8 |
ant-1.4 | 0.314 7 | 0.286 9 | 0.275 8 | 0.264 7 | 0.275 8 | 0.264 1 | 0.264 7 | 0.297 4 |
ant-1.5 | 0.109 1 | 0.136 7 | 0.123 1 | 0.119 8 | 0.109 4 | 0.126 4 | 0.130 1 | 0.122 8 |
ant-1.6 | 0.484 4 | 0.459 0 | 0.737 9 | 0.635 9 | 0.547 4 | 0.464 4 | 0.469 9 | 0.458 7 |
ant-1.7 | 0.398 8 | 0.595 7 | 0.585 3 | 0.598 6 | 0.449 7 | 0.448 3 | 0.421 5 | 0.418 9 |
camel-1.0 | 0.044 2 | 0.044 2 | 0.044 2 | 0.044 2 | 0.044 2 | 0.044 2 | 0.044 2 | 0.044 2 |
camel-1.2 | 1.034 0 | 1.047 5 | 1.165 6 | 1.070 1 | 1.088 0 | 1.042 1 | 1.057 0 | 1.065 2 |
camel-1.4 | 0.496 7 | 0.506 9 | 0.544 7 | 0.596 3 | 0.495 0 | 0.508 6 | 0.515 1 | 0.494 4 |
camel-1.6 | 0.708 5 | 0.705 3 | 0.718 1 | 0.811 0 | 0.708 4 | 0.714 4 | 0.704 5 | 0.700 2 |
xerces-1.2 | 0.309 1 | 0.261 4 | 0.259 1 | 0.261 4 | 0.270 5 | 0.275 0 | 0.272 7 | 0.302 3 |
xerces-1.3 | 0.574 0 | 0.524 4 | 0.560 2 | 0.608 4 | 0.544 2 | 0.555 6 | 0.630 8 | 0.502 8 |
ivy-1.1 | 1.548 5 | 1.331 4 | 1.976 5 | 2.066 7 | 1.354 5 | 1.547 7 | 1.356 8 | 1.321 2 |
ivy-1.4 | 0.078 5 | 0.070 2 | 0.078 5 | 0.074 3 | 0.078 5 | 0.074 3 | 0.082 7 | 0.091 0 |
ivy-2.0 | 0.170 3 | 0.170 3 | 0.170 2 | 0.181 7 | 0.178 7 | 0.167 5 | 0.184 6 | 0.170 3 |
mean | 0.470 1 | 0.460 3 | 0.539 3 | 0.547 1 | 0.458 9 | 0.467 6 | 0.456 4 | |
Win/Draw/Loss | 8/2/4 | 8/2/4 | 9/1/4 | 9/1/4 | 8/1/5 | 8/1/5 | 8/1/5 |
Table 5 ARE of the linear regression model
Release | full | ReliefF | info | chi2 | HFSNFP | FSCR | FSDNP | MIC-TSFS |
ant-1.3 | 0.185 2 | 0.179 0 | 0.157 7 | 0.182 1 | 0.158 5 | 0.176 0 | 0.134 8 | 0.178 2 |
ant-1.4 | 0.201 8 | 0.162 8 | 0.132 4 | 0.118 5 | 0.147 3 | 0.151 3 | 0.118 5 | 0.187 4 |
ant-1.5 | 0.069 2 | 0.081 3 | 0.089 8 | 0.056 4 | 0.061 0 | 0.082 9 | 0.071 4 | 0.076 0 |
ant-1.6 | 0.286 3 | 0.273 4 | 0.490 1 | 0.412 9 | 0.329 2 | 0.274 4 | 0.279 9 | 0.269 1 |
ant-1.7 | 0.220 6 | 0.386 5 | 0.327 2 | 0.392 1 | 0.271 3 | 0.274 4 | 0.250 8 | 0.247 1 |
camel-1.0 | 0.022 6 | 0.022 6 | 0.022 6 | 0.022 6 | 0.022 6 | 0.022 6 | 0.022 6 | 0.022 6 |
camel-1.2 | 0.620 2 | 0.636 9 | 0.747 2 | 0.634 4 | 0.677 7 | 0.632 2 | 0.689 7 | 0.631 6 |
camel-1.4 | 0.277 0 | 0.288 1 | 0.306 6 | 0.381 8 | 0.296 2 | 0.282 5 | 0.284 0 | 0.276 7 |
camel-1.6 | 0.412 5 | 0.379 7 | 0.395 7 | 0.497 4 | 0.389 2 | 0.390 7 | 0.393 0 | 0.386 1 |
xerces-1.2 | 0.150 9 | 0.096 4 | 0.098 3 | 0.096 4 | 0.105 5 | 0.114 5 | 0.116 4 | 0.145 6 |
xerces-1.3 | 0.345 0 | 0.266 6 | 0.293 6 | 0.358 5 | 0.299 0 | 0.289 0 | 0.403 7 | 0.298 8 |
ivy-1.1 | 0.695 0 | 0.624 8 | 0.922 0 | 0.997 1 | 0.603 8 | 0.788 9 | 0.674 1 | 0.610 4 |
ivy-1.4 | 0.042 4 | 0.033 0 | 0.041 3 | 0.034 0 | 0.038 2 | 0.041 3 | 0.042 4 | 0.052 8 |
ivy-2.0 | 0.102 8 | 0.102 8 | 0.074 8 | 0.116 2 | 0.104 0 | 0.094 8 | 0.093 1 | 0.102 8 |
mean | 0.259 4 | 0.252 4 | 0.292 8 | 0.307 2 | 0.250 3 | 0.258 3 | 0.251 7 | |
Win/Draw/Loss | 8/2/4 | 7/2/5 | 7/1/6 | 9/1/4 | 7/1/6 | 7/1/6 | 7/1/6 |
Table 6 AAE of the decision tree regression model
Release | full | ReliefF | info | chi2 | HFSNFP | FSCR | FSDNP | MIC-TSFS |
ant-1.3 | 0.337 4 | 0.328 8 | 0.382 3 | 0.349 1 | 0.336 5 | 0.340 9 | 0.280 8 | 0.320 5 |
ant-1.4 | 0.367 1 | 0.314 4 | 0.363 2 | 0.308 5 | 0.336 9 | 0.319 9 | 0.376 1 | 0.352 0 |
ant-1.5 | 0.112 5 | 0.129 7 | 0.109 2 | 0.150 9 | 0.136 4 | 0.129 7 | 0.157 0 | 0.119 4 |
ant-1.6 | 0.488 8 | 0.495 6 | 0.470 2 | 0.639 4 | 0.524 2 | 0.495 6 | 0.470 5 | 0.452 9 |
ant-1.7 | 0.459 7 | 0.537 8 | 0.470 1 | 0.653 6 | 0.447 0 | 0.435 7 | 0.422 4 | 0.437 8 |
camel-1.0 | 0.041 3 | 0.041 3 | 0.041 3 | 0.041 3 | 0.041 3 | 0.041 3 | 0.041 3 | 0.041 3 |
camel-1.2 | 1.027 5 | 1.098 1 | 1.064 3 | 1.157 3 | 1.147 3 | 1.079 9 | 1.121 0 | 1.020 7 |
camel-1.4 | 0.466 7 | 0.435 2 | 0.480 5 | 0.526 3 | 0.441 5 | 0.466 5 | 0.469 2 | 0.462 9 |
camel-1.6 | 0.689 6 | 0.663 9 | 0.694 3 | 0.694 9 | 0.658 0 | 0.661 4 | 0.691 7 | 0.731 6 |
xerces-1.2 | 0.313 6 | 0.322 7 | 0.306 8 | 0.306 8 | 0.311 4 | 0.356 4 | 0.350 0 | 0.354 5 |
xerces-1.3 | 0.463 3 | 0.483 7 | 0.558 3 | 0.528 4 | 0.526 4 | 0.524 1 | 0.464 8 | 0.464 3 |
ivy-1.1 | 1.973 8 | 2.012 7 | 1.972 7 | 2.048 5 | 1.905 3 | 2.054 5 | 2.045 5 | 1.891 7 |
ivy-1.4 | 0.111 8 | 0.078 3 | 0.103 5 | 0.103 5 | 0.099 2 | 0.103 5 | 0.103 5 | 0.103 5 |
ivy-2.0 | 0.204 3 | 0.202 9 | 0.187 3 | 0.187 9 | 0.210 4 | 0.195 9 | 0.190 2 | 0.198 9 |
mean | 0.504 1 | 0.510 4 | 0.514 6 | 0.549 7 | 0.508 7 | 0.514 7 | 0.513 1 | 0.496 6 |
Win/Draw/Loss | 9/1/4 | 9/1/4 | 8/2/4 | 8/2/4 | 8/1/5 | 8/2/4 | 7/2/5 |
Table 7 ARE of the decision tree regression model
Release | full | ReliefF | info | chi2 | HFSNFP | FSCR | FSDNP | MIC-TSFS |
ant-1.3 | 0.201 4 | 0.199 4 | 0.254 4 | 0.222 5 | 0.219 1 | 0.203 8 | 0.159 0 | 0.194 1 |
ant-1.4 | 0.248 2 | 0.195 8 | 0.235 3 | 0.191 3 | 0.230 7 | 0.206 9 | 0.247 2 | 0.230 7 |
ant-1.5 | 0.062 9 | 0.086 9 | 0.061 3 | 0.089 2 | 0.087 4 | 0.086 9 | 0.097 1 | 0.071 6 |
ant-1.6 | 0.255 2 | 0.281 3 | 0.249 3 | 0.366 6 | 0.284 7 | 0.287 0 | 0.270 7 | 0.223 6 |
ant-1.7 | 0.246 7 | 0.297 9 | 0.257 3 | 0.412 0 | 0.244 2 | 0.239 4 | 0.225 4 | 0.224 4 |
camel-1.0 | 0.019 7 | 0.019 7 | 0.019 7 | 0.019 7 | 0.019 7 | 0.019 7 | 0.019 7 | 0.019 7 |
camel-1.2 | 0.588 9 | 0.663 8 | 0.596 9 | 0.732 5 | 0.709 6 | 0.650 3 | 0.684 4 | 0.590 8 |
camel-1.4 | 0.255 1 | 0.217 2 | 0.253 2 | 0.280 7 | 0.226 1 | 0.244 4 | 0.256 8 | 0.233 5 |
camel-1.6 | 0.343 3 | 0.315 4 | 0.354 9 | 0.358 9 | 0.310 3 | 0.322 5 | 0.338 8 | 0.383 5 |
xerces-1.2 | 0.194 8 | 0.191 1 | 0.173 6 | 0.155 2 | 0.183 9 | 0.199 8 | 0.207 8 | 0.219 8 |
xerces-1.3 | 0.244 9 | 0.243 1 | 0.296 9 | 0.253 0 | 0.280 6 | 0.262 7 | 0.230 4 | 0.242 6 |
ivy-1.1 | 0.805 8 | 0.789 8 | 0.791 2 | 0.841 9 | 0.781 9 | 0.795 0 | 0.855 6 | 0.742 5 |
ivy-1.4 | 0.071 5 | 0.038 0 | 0.063 2 | 0.063 2 | 0.058 9 | 0.063 2 | 0.063 2 | 0.063 2 |
ivy-2.0 | 0.132 9 | 0.133 9 | 0.110 9 | 0.106 5 | 0.131 7 | 0.124 3 | 0.112 5 | 0.126 3 |
mean | 0.262 2 | 0.262 4 | 0.265 6 | 0.292 4 | 0.269 2 | 0.264 7 | 0.269 2 | 0.254 7 |
Win/Draw/Loss | 9/1/4 | 8/1/5 | 9/2/3 | 8/2/4 | 8/2/4 | 8/2/4 | 7/2/5 |
Table 8 G-mean of the Bayesian ridge regression model
Release | full | ReliefF | info | chi2 | HFSNFP | FSCR | FSDNP | MIC-TSFS |
ant-1.3 | 0.460 8 | 0.509 9 | 0.430 0 | 0.535 4 | 0.601 0 | 0.393 9 | 0.486 8 | 0.460 6 |
ant-1.4 | 0.183 2 | 0.259 1 | 0.259 1 | 0.172 5 | 0.213 1 | 0.330 7 | 0.235 0 | 0.237 2 |
ant-1.5 | 0.324 2 | 0.251 0 | 0.251 0 | 0.433 6 | 0.274 0 | 0.267 9 | 0.057 7 | 0.312 7 |
ant-1.6 | 0.778 2 | 0.781 5 | 0.792 5 | 0.784 5 | 0.730 9 | 0.771 6 | 0.773 5 | 0.787 4 |
ant-1.7 | 0.769 4 | 0.666 8 | 0.653 1 | 0.731 4 | 0.737 9 | 0.739 0 | 0.750 9 | 0.746 5 |
camel-1.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
camel-1.2 | 0.539 8 | 0.426 6 | 0.521 2 | 0.468 3 | 0.291 7 | 0.390 7 | 0.527 3 | 0.538 8 |
camel-1.4 | 0.588 8 | 0.567 5 | 0.606 8 | 0.572 8 | 0.515 9 | 0.627 5 | 0.579 2 | 0.568 9 |
camel-1.6 | 0.598 5 | 0.579 0 | 0.572 6 | 0.594 9 | 0.564 1 | 0.454 4 | 0.596 8 | 0.599 3 |
xerces-1.2 | 0.123 9 | 0.140 0 | 0.058 0 | 0.123 9 | 0.083 0 | 0.116 3 | 0.074 4 | 0.124 5 |
xerces-1.3 | 0.636 4 | 0.757 6 | 0.605 2 | 0.539 2 | 0.759 1 | 0.641 4 | 0.564 3 | 0.647 5 |
ivy-1.1 | 0.333 2 | 0.305 3 | 0.396 7 | 0.292 3 | 0.477 5 | 0.323 6 | 0.292 3 | 0.514 9 |
ivy-1.4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
ivy-2.0 | 0.372 6 | 0.364 1 | 0.345 2 | 0.405 4 | 0.311 3 | 0.300 4 | 0.385 8 | 0.348 5 |
mean | 0.407 8 | 0.400 6 | 0.392 2 | 0.403 9 | 0.397 1 | 0.382 7 | 0.380 3 | 0.420 5 |
Table 9 G-mean of the linear regression model
Release | full | ReliefF | info | chi2 | HFSNFP | FSCR | FSDNP | MIC-TSFS |
ant-1.3 | 0.487 0 | 0.626 2 | 0.297 8 | 0.373 8 | 0.769 9 | 0.493 9 | 0.653 7 | 0.532 0 |
ant-1.4 | 0.397 3 | 0.290 8 | 0.057 7 | 0.407 7 | 0.261 2 | 0.412 2 | 0 | 0.425 9 |
ant-1.5 | 0.319 1 | 0.197 6 | 0.496 1 | 0.422 7 | 0.348 7 | 0.239 2 | 0.098 6 | 0.376 7 |
ant-1.6 | 0.753 9 | 0.782 9 | 0.594 0 | 0.689 2 | 0.720 5 | 0.769 9 | 0.773 5 | 0.784 7 |
ant-1.7 | 0.633 5 | 0.676 5 | 0.552 9 | 0.691 3 | 0.737 6 | 0.748 3 | 0.752 8 | 0.739 2 |
camel-1.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
camel-1.2 | 0.548 4 | 0.488 8 | 0.284 3 | 0.446 7 | 0.373 6 | 0.444 9 | 0.525 7 | 0.529 3 |
camel-1.4 | 0.543 1 | 0.584 7 | 0.517 5 | 0.606 5 | 0.637 7 | 0.636 7 | 0.589 6 | 0.600 0 |
camel-1.6 | 0.626 2 | 0.587 0 | 0.578 8 | 0.559 1 | 0.583 4 | 0.558 6 | 0.593 4 | 0.602 5 |
xerces-1.2 | 0.176 0 | 0.110 2 | 0.113 6 | 0.112 2 | 0.123 7 | 0.115 7 | 0.187 1 | 0.188 5 |
xerces-1.3 | 0.635 2 | 0.726 4 | 0.522 6 | 0.705 0 | 0.749 2 | 0.624 5 | 0.647 9 | 0.662 8 |
ivy-1.1 | 0.438 6 | 0.614 9 | 0.589 9 | 0.507 1 | 0.652 0 | 0.518 9 | 0.484 3 | 0.637 9 |
ivy-1.4 | 0.168 5 | 0.070 7 | 0.070 7 | 0 | 0 | 0.070 7 | 0 | 0.070 7 |
ivy-2.0 | 0.409 4 | 0.318 4 | 0.328 9 | 0.450 7 | 0.277 0 | 0.499 0 | 0.385 8 | 0.313 6 |
mean | 0.438 3 | 0.433 9 | 0.357 5 | 0.426 6 | 0.445 3 | 0.438 0 | 0.406 6 | 0.461 7 |
Table 10 G-mean of the decision tree regression model
Release | full | ReliefF | info | chi2 | HFSNFP | FSCR | FSDNP | MIC-TSFS |
ant-1.3 | 0.545 7 | 0.491 8 | 0.533 0 | 0.536 2 | 0.511 5 | 0.425 5 | 0.573 8 | 0.547 5 |
ant-1.4 | 0.272 5 | 0.369 3 | 0.320 4 | 0.235 1 | 0.284 7 | 0.352 4 | 0.214 5 | 0.292 9 |
ant-1.5 | 0.360 4 | 0.292 6 | 0.055 6 | 0.347 6 | 0.319 4 | 0.292 6 | 0.112 2 | 0.330 0 |
ant-1.6 | 0.727 7 | 0.738 1 | 0.574 8 | 0.668 9 | 0.683 8 | 0.733 1 | 0.739 9 | 0.729 2 |
ant-1.7 | 0.676 8 | 0.636 8 | 0.573 7 | 0.652 2 | 0.690 5 | 0.672 1 | 0.669 7 | 0.662 5 |
camel-1.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
camel-1.2 | 0.532 0 | 0.497 1 | 0.463 2 | 0.490 5 | 0.473 1 | 0.528 5 | 0.475 6 | 0.517 8 |
camel-1.4 | 0.583 1 | 0.533 4 | 0.401 1 | 0.492 4 | 0.552 5 | 0.379 9 | 0.530 5 | 0.497 9 |
camel-1.6 | 0.474 2 | 0.484 5 | 0.500 0 | 0.505 3 | 0.512 8 | 0.508 5 | 0.453 9 | 0.486 3 |
xerces-1.2 | 0.372 7 | 0.389 3 | 0.256 7 | 0.503 1 | 0.543 4 | 0.432 1 | 0.409 3 | 0.46 9 |
xerces-1.3 | 0.452 1 | 0.408 8 | 0.577 3 | 0.590 2 | 0.454 8 | 0.601 9 | 0.593 6 | 0.592 6 |
ivy-1.1 | 0.515 5 | 0.582 6 | 0.600 7 | 0.536 9 | 0.499 9 | 0.490 2 | 0.553 3 | 0.597 6 |
ivy-1.4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
ivy-2.0 | 0.281 8 | 0.402 6 | 0.257 9 | 0.289 0 | 0.264 0 | 0.412 6 | 0.365 7 | 0.290 4 |
mean | 0.413 9 | 0.416 2 | 0.365 3 | 0.417 7 | 0.413 6 | 0.416 4 | 0.406 6 | 0.429 6 |
[1] GONG L N, JIANG S J, JIANG L. Research progress of software defect prediction[J]. Journal of Software, 2019, 30(10): 3090-3114.
[2] LIU W S, CHEN X, GU Q, et al. A cluster-analysis-based feature-selection method for software defect prediction[J]. SCIENTIA SINICA Informationis, 2016, 46(9): 1298.
[3] GRAVES T L, KARR A F, MARRON J S, et al. Predicting fault incidence using software change history[J]. IEEE Transactions on Software Engineering, 2000, 26(7): 653-661.
[4] OSTRAND T J, WEYUKER E J, BELL R M. Predicting the location and number of faults in large software systems[J]. IEEE Transactions on Software Engineering, 2005, 31(4): 340-355.
[5] CHEN M, MA Y. An empirical study on predicting defect numbers[C]// Proceedings of the 27th International Conference on Software Engineering and Knowledge Engineering. [S.l.:s.n.], 2015: 397-402.
[6] RATHORE S S, KUMAR S. Predicting number of faults in software system using genetic programming[J]. Procedia Computer Science, 2015: 303-311.
[7] RATHORE S S, KUMAR S. A decision tree regression based approach for the number of software faults prediction[J]. ACM Sigsoft Software Engineering Notes, 2016, 41(1): 1-6.
[8] RATHORE S S, KUMAR S. An empirical study of some software fault prediction techniques for the number of faults prediction[J]. Soft Computing, 2017, 21(24): 7417-7434.
[9] CHEN X, ZHANG D, ZHAO Y, et al. Software defect number prediction: unsupervised vs supervised methods[J]. Information and Software Technology, 2019(106): 161-181.
[10] RATHORE S S, KUMAR S. Towards an ensemble based system for predicting the number of software faults[J]. Expert Systems with Applications, 2017(82): 357-382.
[11] RATHORE S S, KUMAR S. Linear and non-linear heterogeneous ensemble methods to predict the number of faults in software systems[J]. Knowledge-Based Systems, 2017(119): 232-256.
[12] YU X, LIU J, YANG Z, et al. Learning from imbalanced data for predicting the number of software defects[C]// Proceedings of the International Symposium on Software Reliability Engineering. [S.l.:s.n.], 2017: 78-89.
[13] LIU M X, CHEN J, WANG Q Y. Research on text sentiment classification based on improved feature selection method[J]. Telecommunications Science, 2018, 34(10): 85-95.
[14] LI Y F, GUAN G F, GE C H. FSDNP: feature selection method for software defect number prediction[J]. Computer Engineering and Applications, 2019, 55(14): 61-68.
[15] YU X, MA Z, MA C, et al. FSCR: a feature selection method for software defect prediction[C]// Proceedings of the 29th International Conference on Software Engineering and Knowledge Engineering. [S.l.:s.n.], 2017: 351-356.
[16] MA Z Y, MA C X, LIU R Q. Hybrid feature selection method for number of software faults prediction[J]. Application Research of Computers, 2018, 35(2): 487-502.
[17] RESHEF D N, RESHEF Y A, FINUCANE H K, et al. Detecting novel associations in large data sets[J]. Science, 2011, 334(6062): 1518-1524.
[18] GAO K, KHOSHGOFTAAR T M, WANG H, et al. An empirical investigation of filter attribute selection techniques for software quality classification[C]// Proceedings of the 10th IEEE International Conference on Information Reuse & Integration. Piscataway: IEEE Press, 2009: 272-277.
[19] JURECZKO M. Significance of different software metrics in defect prediction[J]. Software Engineering, 2011, 1(1): 86-95.