Postgraduate Projects

Chameleon: A Python Workflow Toolkit for Feature Selection

Classification problems on high-dimensional data sets, such as biological data sets, call for effective dimensionality reduction through feature selection. Feature selection has been shown to significantly decrease computational cost and to yield classification models that are easier to interpret. This project presents Chameleon, a Python-based toolkit that integrates every step of a feature selection evaluation pipeline, from splitting data for cross-validation to visualising classification results using various metrics. The toolkit is unique in that it streamlines feature selection for classification and evaluates the resulting classification performance. Chameleon implements six existing feature selection methods, six common classification methods, and two classification performance metrics. It also implements an ensemble feature selection method that keeps only the features common to the different methods evaluated. The toolkit was tested on four high-dimensional data sets; the common-features method achieved improved or similar classification performance compared with the individual feature selection algorithms while using smaller, and thus more computationally efficient, subsets of features.
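The common-features idea described above can be illustrated with a minimal sketch. It is not Chameleon's actual code or API: the function name `common_features`, the method labels, and the feature names are all hypothetical, and the sketch assumes each feature selection method has already returned a list of selected feature names.

```python
from functools import reduce


def common_features(selections):
    """Return the features chosen by every method, sorted by name.

    `selections` maps a method label to the features that method selected.
    (Hypothetical helper for illustration; not part of Chameleon.)
    """
    feature_sets = [set(chosen) for chosen in selections.values()]
    if not feature_sets:
        return []
    # Intersect the sets so only features picked by all methods remain.
    return sorted(reduce(set.intersection, feature_sets))


# Made-up output of three feature selection methods (illustrative only).
selections = {
    "method_a": ["g1", "g2", "g3", "g7"],
    "method_b": ["g2", "g3", "g5", "g7"],
    "method_c": ["g2", "g7", "g9"],
}

common = common_features(selections)
print(common)  # ['g2', 'g7']
```

Because the intersection can only shrink as more methods are added, the resulting subset is never larger than the smallest individual selection, which is consistent with the smaller feature subsets reported above.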

Developed By: 
Diviya Thilakeswaran