A Feature Selection Method Based on Feature Clustering

Wang Lianxi (王连喜); Jiang Shengyi (蒋盛益)

[Journal] Application Research of Computers (《计算机应用研究》)

[Year (Volume), Issue] 2015(000)005

[Abstract] Feature selection is a widely used data preprocessing technique in data mining and machine learning. For the unsupervised learning setting, this paper defines a measure of average feature correlation (a mean-similarity measure) and, building on it, proposes a feature selection method based on feature clustering, named FSFC. The method applies a clustering algorithm to the full feature set to search for clusters in different subspaces, so that strongly dependent (that is, redundant) features fall into the same cluster. It then selects a representative subset of features from each cluster, and these subsets together form the final feature subset, thereby removing irrelevant and redundant features. Experimental results on UCI datasets show that the dimensionality reduction and the classification performance (with C4.5 and naive Bayes) achieved by FSFC are comparable to those of several classic supervised feature selection methods.
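The general idea of clustering features and keeping one representative per cluster can be illustrated with a minimal sketch. The abstract does not specify the correlation measure, the clustering algorithm, or the rule for picking representatives, so everything below is an assumption made for illustration: absolute Pearson correlation stands in for the similarity measure, a simple greedy "leader" clustering stands in for the clustering step, and the names `pearson`, `cluster_features`, `select_representatives`, and the `threshold` parameter are hypothetical. The sketch only addresses redundancy and is not the FSFC algorithm itself.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def cluster_features(columns, threshold=0.8):
    """Greedy leader clustering (an assumed stand-in for the paper's clustering step).

    A feature joins the first cluster whose leader (first member) it correlates
    with at or above `threshold` in absolute value; otherwise it starts a new cluster.
    """
    m = len(columns)
    sim = [[abs(pearson(columns[i], columns[j])) for j in range(m)] for i in range(m)]
    clusters = []
    for j in range(m):
        for members in clusters:
            if sim[j][members[0]] >= threshold:
                members.append(j)
                break
        else:
            clusters.append([j])
    return clusters, sim


def select_representatives(columns, threshold=0.8):
    """Keep, per cluster, the feature with the highest mean similarity to its cluster mates."""
    clusters, sim = cluster_features(columns, threshold)
    selected = []
    for members in clusters:
        best = max(
            members,
            key=lambda i: sum(sim[i][k] for k in members if k != i) / max(len(members) - 1, 1),
        )
        selected.append(best)
    return sorted(selected), clusters


# Toy data: each inner list is one feature (column) observed on five samples.
columns = [
    [1, 2, 3, 4, 5],        # feature 0
    [2, 4, 6, 8, 10.5],     # feature 1: almost a rescaled copy of feature 0
    [5, 3, 6, 1, 4],        # feature 2
    [10, 6, 12, 2, 8],      # feature 3: exact rescaled copy of feature 2
]
selected, clusters = select_representatives(columns)
print("clusters:", clusters)
print("selected feature indices:", selected)
```

On this toy data, the two near-duplicate pairs end up in the same clusters and one feature from each pair is kept. A fuller treatment would also need a criterion for dropping irrelevant features, which the abstract mentions but does not detail.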