A Hybrid Feature Selection Optimization Model for High Dimension Data Classification

Qaraad M.
Amjad S.
Manhrawy I.I.M.
Fathi H.
Hassan B.A.
Kafrawy P.E.

Feature selection is an NP-hard combinatorial problem: the number of possible feature subsets grows exponentially with the number of features. For high-dimensional data, the goal of feature selection is to find the smallest subset of features that is still the most informative. In this paper, we propose a hybrid feature selection optimization model for cancer classification, called ENSVM. The model uses the Elastic Net (EN) method, which performs both regularization and variable selection, to select genes from genomic microarray data. We apply three optimization techniques, namely Social Ski-Driver (SSD), RandomizedSearchCV (RS), and ElasticNetCV (ENCV), to tune the Elastic Net, and use a traditional Support Vector Machine (SVM) for classification. To evaluate the model, we compare the results of applying ENSVM to seven genomic microarray datasets against the SSD-SVM model and against an SVM with a radial basis function (RBF) kernel without any feature selection. The comparison shows that ENSVM is effective at selecting a feature subset that maximizes classification performance. Accordingly, minimizing the number of features is significant when analyzing high-dimensional data, for performance as well as accuracy. Moreover, the ENSVM model outperforms the SSD-SVM model. © 2013 IEEE.
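
The pipeline described above can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes scikit-learn, covers only the ElasticNetCV variant (SSD and randomized search are omitted), and replaces the microarray data with a synthetic dataset from make_classification. The selection threshold, the l1_ratio grid, and the fold counts are illustrative choices and do not come from the paper.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for microarray data: few samples, many features (assumption).
X, y = make_classification(
    n_samples=80, n_features=500, n_informative=10, n_redundant=0,
    n_clusters_per_class=1, class_sep=2.0, random_state=0,
)

# Elastic Net with cross-validated regularization strength; features whose
# coefficient magnitude stays above the threshold are kept.
selector = SelectFromModel(
    ElasticNetCV(l1_ratio=[0.5, 0.7, 0.9], cv=3, max_iter=5000, random_state=0),
    threshold=1e-5,
)

# Elastic Net selection followed by an RBF-kernel SVM, and a baseline SVM
# trained on all features without any selection.
en_svm = make_pipeline(StandardScaler(), selector, SVC(kernel="rbf"))
baseline = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Selection is refit inside every fold, so the accuracy estimate is not
# biased by choosing features on the held-out samples.
en_svm_acc = cross_val_score(en_svm, X, y, cv=5).mean()
baseline_acc = cross_val_score(baseline, X, y, cv=5).mean()

# Refit on all data only to report how many features survive selection.
en_svm.fit(X, y)
n_selected = int(en_svm.named_steps["selectfrommodel"].get_support().sum())

print(f"features kept: {n_selected} of {X.shape[1]}")
print(f"EN + SVM accuracy: {en_svm_acc:.3f}")
print(f"SVM only accuracy: {baseline_acc:.3f}")
```

Placing the selector inside the pipeline is a deliberate choice: with so few samples, selecting features on the full dataset before cross-validation would leak information into the test folds.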