Particle Swarm Optimization for Feature Selection and Weighting in High-Dimensional Clustering
Abstract
Over the past decades, advances in computing and database technologies have driven rapid growth in high-dimensional datasets. Such data is often described by numerous features, many of which may be unnecessary for a given data mining application and can reduce the performance of machine learning algorithms. Effective feature selection is therefore essential.
This article proposes a novel variant of the standard particle swarm optimization (PSO) algorithm, augmented with crossover and mutation operators to enhance its exploration and search capabilities. The proposed algorithm is combined with feature clustering in the Hadoop framework to provide a new feature selection method. The final clusters are determined from the graph structure of the features and their relationships, which allows a more comprehensive analysis of feature relevance in large datasets.
The proposed method is compared with two feature selection methods, GCPSO_Random and GCPSO_Score, as well as with several recent methods that use evolutionary algorithms for feature selection. Datasets from the UCI repository, chosen for their comprehensive feature sets, are used for evaluation and comparison. The results show that the proposed method outperforms the other methods on most test datasets, providing comparable or higher classification accuracy with shorter execution time.
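To make the idea of a PSO augmented with crossover and mutation more concrete, the following is a minimal sketch of a binary PSO over feature masks. It is not the article's implementation: the sigmoid transfer function, uniform crossover with the global best, bit-flip mutation, all parameter values, and the names pso_select, toy_fitness, and RELEVANT are assumptions chosen for illustration. The Hadoop-based feature clustering and the graph structure of the features are not modeled here.

```python
import math
import random

# Hypothetical ground truth for the toy objective: indices of useful features.
RELEVANT = {0, 3, 5}


def toy_fitness(mask):
    """Illustrative objective: reward relevant features, penalize extra ones."""
    hits = sum(mask[j] for j in RELEVANT)
    extras = sum(mask) - hits
    return hits - 0.5 * extras


def pso_select(fitness, n_features, n_particles=20, n_iters=50,
               w=0.7, c1=1.5, c2=1.5, p_cross=0.5, p_mut=0.02, seed=0):
    """Binary PSO with crossover and mutation (a sketch, maximizes fitness)."""
    rng = random.Random(seed)
    # Positions are 0/1 feature masks; velocities are real-valued.
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[rng.uniform(-1, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            do_cross = rng.random() < p_cross
            for j in range(n_features):
                # Standard PSO velocity update (inertia, cognitive, social).
                vel[i][j] = (w * vel[i][j]
                             + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                             + c2 * rng.random() * (gbest[j] - pos[i][j]))
                # Sigmoid transfer: velocity becomes a selection probability.
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))
                bit = 1 if rng.random() < prob else 0
                # Uniform crossover: copy some bits from the global best.
                if do_cross and rng.random() < 0.5:
                    bit = gbest[j]
                # Bit-flip mutation keeps some diversity in the swarm.
                if rng.random() < p_mut:
                    bit = 1 - bit
                pos[i][j] = bit
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit
```

Calling pso_select(toy_fitness, n_features=10) returns a 0/1 mask and its score. In practice the fitness function would be replaced by something like the cross-validated accuracy of a classifier trained on the selected features, possibly with a penalty on subset size. Crossover pulls particles toward the best known mask, while mutation counteracts premature convergence.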