Cite this article: HAN Min, ZHU Xin-rong. Hybrid algorithm for classification of unbalanced datasets[J]. Control Theory & Applications, 2011, 28(10): 1485-1489.
Hybrid algorithm for classification of unbalanced datasets
Received: 2010-06-25; Revised: 2010-11-02
DOI: 10.7641/j.issn.1000-8152.2011.10.CCTA100742
Keywords: imbalanced data; random forest; radial basis function neural network (RBFNN); receiver operating characteristic (ROC)
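Funding: Supported by the National Natural Science Foundation of China (61074096), the National Key Technology R&D Program of China (2006BAB14B05), and the National Basic Research Program of China (2006CB403405).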
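Authors, affiliations, and e-mail:
HAN Min*, Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology; minhan@dlut.edu.cn
ZHU Xin-rong, Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology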
Abstract
When traditional classification algorithms are applied to unbalanced data, the classification accuracy on the minority (small) class is often very low. To address this, a hybrid classification algorithm that integrates a radial basis function neural network (RBFNN) with the random forest is proposed. First, the class distribution is balanced by randomly interpolating new samples between minority-class samples. Next, redundant features are removed using the area under the receiver operating characteristic (ROC) curve at a 95% confidence level as the criterion. The input data are then perturbed with the Bagging technique, RBFNNs serve as the base classifiers of the random forest, and the individual decisions are fused into the final output by majority voting. The algorithm is applied to UCI datasets and evaluated by the G-mean and the area under the ROC curve (AUC); the results show that it effectively improves classification accuracy on moderately and highly unbalanced datasets.
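The abstract states only that new minority samples are created by random interpolation between minority-class samples. The dependency-free Python sketch below shows one way to do this, assuming a SMOTE-style rule: each synthetic point lies at a uniformly random position on the segment between a randomly chosen minority sample and its nearest minority neighbour. The neighbour rule and the helper names `interpolate_minority` and `balance` are hypothetical, not the authors' code.

```python
import random


def interpolate_minority(minority, n_new, seed=0):
    """Generate n_new synthetic minority samples by random linear interpolation.

    Assumed rule (SMOTE-style, one neighbour): pick a random minority sample x,
    find its nearest other minority sample z, and return x + u * (z - x), u ~ U[0, 1).
    """
    if n_new > 0 and len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Nearest minority neighbour of x, excluding x itself.
        j = min((k for k in range(len(minority)) if k != i),
                key=lambda k: sq_dist(x, minority[k]))
        u = rng.random()  # random interpolation factor
        synthetic.append([xi + u * (zi - xi) for xi, zi in zip(x, minority[j])])
    return synthetic


def balance(majority, minority, seed=0):
    """Oversample the minority class until it is as large as the majority class."""
    extra = max(0, len(majority) - len(minority))
    return minority + interpolate_minority(minority, extra, seed)
```

For example, with 10 majority samples and 3 minority samples, `balance` returns the 3 original minority samples followed by 7 interpolated ones, each lying between two existing minority points.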
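The abstract does not spell out how "the AUC at a 95% confidence level" is turned into a feature-removal rule. A minimal sketch under one plausible reading: score each feature on its own by its AUC (computed with the Mann-Whitney statistic), estimate the AUC's standard error with the Hanley-McNeil formula, and keep the feature only if its 95% confidence interval excludes 0.5 (chance level). The helpers `auc`, `hanley_mcneil_se`, and `select_features` are hypothetical, and the paper's exact test may differ.

```python
import math


def auc(scores, labels):
    """AUC via the Mann-Whitney statistic (ties count 1/2); label 1 = minority class.

    Both classes must be present. Returns (auc, n_pos, n_neg).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg)), len(pos), len(neg)


def hanley_mcneil_se(a, n_pos, n_neg):
    """Standard error of an AUC estimate (Hanley & McNeil approximation)."""
    q1 = a / (2 - a)
    q2 = 2 * a * a / (1 + a)
    var = (a * (1 - a) + (n_pos - 1) * (q1 - a * a)
           + (n_neg - 1) * (q2 - a * a)) / (n_pos * n_neg)
    return math.sqrt(max(var, 0.0))


def select_features(X, y, z=1.96):
    """Return indices of features whose AUC is significantly above 0.5 at ~95% confidence."""
    keep = []
    for j in range(len(X[0])):
        a, n_pos, n_neg = auc([row[j] for row in X], y)
        a = max(a, 1 - a)  # a feature that ranks the classes in reverse is equally informative
        if a - z * hanley_mcneil_se(a, n_pos, n_neg) > 0.5:
            keep.append(j)
    return keep
```

Under this reading, a constant feature (AUC exactly 0.5) is always dropped, while a feature that perfectly separates the classes in either direction is kept.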
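The ensemble stage described in the abstract (Bagging to perturb the inputs, RBFNNs as the base classifiers of a random forest, and majority-vote fusion) can be sketched in dependency-free Python. Several details are assumptions not given in the abstract: each member also draws a random feature subset (random-subspace sampling, as in a conventional random forest), RBF centres are sampled from the member's training data, the Gaussian width is fixed, and output weights are fitted by ridge-regularised least squares. `solve`, `RBFNet`, and `BaggedRBFEnsemble` are hypothetical names.

```python
import math
import random


def solve(A, b):
    """Solve the linear system A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][k] * w[k] for k in range(r + 1, n))) / M[r][r]
    return w


class RBFNet:
    """Gaussian RBF network: centres sampled from the data, linear output layer fitted by ridge least squares."""

    def __init__(self, n_centres=5, width=1.0, ridge=1e-3, rng=None):
        self.n_centres = n_centres
        self.width = width
        self.ridge = ridge
        self.rng = rng or random.Random(0)

    def _phi(self, x):
        """Hidden-layer activations plus a constant bias unit."""
        acts = [math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2 * self.width ** 2))
                for c in self.centres]
        return acts + [1.0]

    def fit(self, X, y):
        self.centres = self.rng.sample(X, min(self.n_centres, len(X)))
        Phi = [self._phi(x) for x in X]
        m = len(Phi[0])
        # Normal equations: (Phi^T Phi + ridge * I) w = Phi^T y
        A = [[sum(r[i] * r[j] for r in Phi) + (self.ridge if i == j else 0.0) for j in range(m)]
             for i in range(m)]
        b = [sum(r[i] * t for r, t in zip(Phi, y)) for i in range(m)]
        self.w = solve(A, b)
        return self

    def predict(self, x):
        """Threshold the regression output at 0.5 to obtain a 0/1 label."""
        score = sum(wi * fi for wi, fi in zip(self.w, self._phi(x)))
        return 1 if score >= 0.5 else 0


class BaggedRBFEnsemble:
    """Bootstrap samples + random feature subsets + RBF base classifiers, fused by majority vote."""

    def __init__(self, n_members=11, n_features=None, seed=0, **rbf_kwargs):
        self.n_members = n_members
        self.n_features = n_features
        self.rng = random.Random(seed)
        self.rbf_kwargs = rbf_kwargs

    def fit(self, X, y):
        d = len(X[0])
        k = self.n_features or d
        self.members = []
        for _ in range(self.n_members):
            idx = [self.rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
            feats = sorted(self.rng.sample(range(d), k))               # random feature subset
            Xb = [[X[i][j] for j in feats] for i in idx]
            yb = [y[i] for i in idx]
            net = RBFNet(rng=random.Random(self.rng.randrange(2 ** 32)), **self.rbf_kwargs)
            self.members.append((feats, net.fit(Xb, yb)))
        return self

    def predict(self, x):
        votes = sum(net.predict([x[j] for j in feats]) for feats, net in self.members)
        return 1 if 2 * votes > len(self.members) else 0
```

An odd number of members (11 by default here) avoids voting ties; the small ridge term keeps the normal equations solvable even when a bootstrap sample yields duplicate centres.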
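The G-mean used for evaluation is commonly defined as the geometric mean of sensitivity (recall on the minority class) and specificity (recall on the majority class). Assuming the paper uses this standard definition, it can be computed as below; `g_mean` is a hypothetical helper name.

```python
import math


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (minority recall) and specificity (majority recall)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)
```

Unlike plain accuracy, the G-mean exposes a classifier that ignores the minority class: predicting the majority class for all 100 samples of a 95:5 dataset gives 95% accuracy but a G-mean of 0.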