Abstract
High-dimensional data encountered in genomic and proteomic studies often have a limited sample size but a large number of predictor variables. Selecting the variables most strongly correlated with the outcome variable is therefore a crucial step. This paper describes an approach for selecting an optimal set of variables to achieve a classification model with high predictive accuracy. The work is described using a biological classifier published elsewhere, but it can be generalized to any application.
Cite this article
Rodrigo, L.M., Polpitiya, A.D. Improving the classification accuracy using biomarkers selected from machine learning methods. Control Theory Technol. 19, 538–543 (2021). https://doi.org/10.1007/s11768-021-00071-x