Comparison of unsupervised feature selection methods for high-dimensional regression problems in prediction of peptide binding affinity

Sarac, Ferdi, Uslan, Volkan, Seker, Huseyin and Bouridane, Ahmed (2015) Comparison of unsupervised feature selection methods for high-dimensional regression problems in prediction of peptide binding affinity. Proceedings of the 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC 2015). pp. 8173-8176. ISSN 1557-170X

Full text not available from this repository.

Official URL: http://dx.doi.org/10.1109/EMBC.2015.7320291

Abstract

Identification of a robust set of predictive features is one of the most important steps in the construction of clustering, classification and regression models from many thousands of features. Although there have been various attempts to select predictive feature sets from high-dimensional data sets in classification and clustering, there have been only limited attempts to study the problem in regression. As semi-supervised and supervised feature selection methods tend to identify noisy features in addition to discriminative variables, unsupervised feature selection methods (USFSMs) are generally regarded as a more unbiased approach. Therefore, in this study, along with the entire feature set, four different USFSMs are considered for the quantitative prediction of peptide binding affinities, which is one of the most challenging post-genome regression problems because of its very high dimensionality compared to the extremely small number of samples. As USFSMs are independent of any predictive method, support vector regression was then utilised to assess the quality of prediction. Given three different peptide binding affinity data sets, the results suggest that the regression performance of USFSMs generally depends on the data set. No single method yields the best performance, in contrast to their performance in classification problems. However, a closer investigation of the results appears to suggest that the spectral regression-based approach yields slightly better performance. To the best of our knowledge, this is the first study that presents a comprehensive comparison of USFSMs in such high-dimensional regression problems, particularly in the biological domain with an application in the prediction of peptide binding affinity, and provides a number of practical suggestions for future practitioners.
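
The evaluation design described in the abstract (select features without looking at the binding affinities, then judge the selection with support vector regression) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: simple variance ranking stands in for the four USFSMs, which the abstract does not name; the data are synthetic; and the function names `top_variance_features`, `evaluate` and `make_toy_data`, as well as all parameter values, are hypothetical.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def top_variance_features(X, k):
    """Indices of the k highest-variance columns; the target y is never used."""
    variances = X.var(axis=0)
    return np.sort(np.argsort(variances)[::-1][:k])


def evaluate(X, y, k=None, n_splits=5, seed=0):
    """Cross-validated R^2 of SVR on all features (k=None) or on the top-k by variance."""
    preds = np.zeros_like(y, dtype=float)
    folds = KFold(n_splits, shuffle=True, random_state=seed)
    for train, test in folds.split(X):
        # Select features on the training fold only, so the test fold stays unseen.
        if k is None:
            cols = np.arange(X.shape[1])
        else:
            cols = top_variance_features(X[train], k)
        scaler = StandardScaler().fit(X[train][:, cols])
        model = SVR(kernel="linear", C=10.0)
        model.fit(scaler.transform(X[train][:, cols]), y[train])
        preds[test] = model.predict(scaler.transform(X[test][:, cols]))
    return r2_score(y, preds)


def make_toy_data(n_samples=60, n_features=2000, n_informative=10, seed=0):
    """Synthetic 'few samples, many features' data (an assumption, not peptide data)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, n_features))
    # By construction, the informative columns are also the high-variance ones.
    X[:, :n_informative] *= 3.0
    w = rng.normal(size=n_informative)
    y = X[:, :n_informative] @ w + 0.1 * rng.normal(size=n_samples)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print("all features:", evaluate(X, y))
    print("top 10 by variance:", evaluate(X, y, k=10))
```

The toy data are built so that variance ranking succeeds; on real peptide descriptors nothing guarantees this, which is consistent with the abstract's finding that performance depends on the data set.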

Item Type:	Article
Uncontrolled Keywords:	biological domain, classification models, clustering models, discriminative variables
Subjects:	H900 Others in Engineering
Department:	Faculties > Engineering and Environment > Mathematics, Physics and Electrical Engineering
Depositing User:	Ay Okpokam
Date Deposited:	20 Jan 2016 16:52
Last Modified:	12 Oct 2019 22:54
URI:	http://nrl.northumbria.ac.uk/id/eprint/25577
