Feature Selection Approaches In Antibody Display

Authors

  • Inese Polaka, Riga Technical University

DOI:

https://doi.org/10.17770/etr2011vol2.998

Keywords:

antibody display, classification, data mining, feature selection, ranking

Abstract

Molecular diagnostic tools produce data that are high-dimensional, because many factors are analyzed in a single experiment, yet contain few records, because the experiments are costly. This study addresses the dimensionality problem in antibody display data from melanoma patients by applying data mining feature selection techniques. The article describes ranking and subset selection approaches to feature selection and compares the performance of various methods by evaluating the selected feature subsets with the classification algorithms C4.5, Random Forest, SVM and Naïve Bayes, which must differentiate between cancer patient data and healthy donor data. The feature selection methods include correlation-based, consistency-based and wrapper subset selection algorithms, as well as single-feature evaluators used for ranking that are based on statistical measures, information measures, the prediction potential of rules, and SVM feature selection.
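
As a rough illustration of the ranking approach summarized in the abstract, the sketch below scores each feature individually, keeps the top k, and estimates classification accuracy with a Gaussian Naïve Bayes classifier under leave-one-out cross-validation. It is not the article's implementation: the data are synthetic, the mean-separation score stands in for the statistical evaluators, and every function name and parameter here (make_toy_data, rank_features, fit_gaussian_nb, predict, loo_accuracy, k) is a hypothetical choice made for this example.

```python
import math
import random
from statistics import mean, pvariance


def make_toy_data(n_per_class=20, n_features=200, n_informative=5, seed=0):
    """Synthetic stand-in for 'many features, few records' data (an assumption)."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            for j in range(n_informative):
                row[j] += 3.0 * label  # only the first features carry class signal
            X.append(row)
            y.append(label)
    return X, y


def rank_features(X, y):
    """Filter ranking: order features by |mean_1 - mean_0| / sqrt(var_1 + var_0)."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        spread = math.sqrt(pvariance(a) + pvariance(b)) or 1e-12
        scores.append((abs(mean(a) - mean(b)) / spread, j))
    return [j for _, j in sorted(scores, reverse=True)]


def fit_gaussian_nb(X, y):
    """Per class: log prior plus (mean, variance) of every feature."""
    model = {}
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        stats = []
        for j in range(len(X[0])):
            col = [r[j] for r in rows]
            stats.append((mean(col), pvariance(col) + 1e-9))  # avoid zero variance
        model[label] = (math.log(len(rows) / len(X)), stats)
    return model


def predict(model, row):
    """Return the class with the highest log posterior under the Gaussian model."""
    def log_posterior(label):
        log_prior, stats = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            for x, (m, v) in zip(row, stats)
        )
    return max(model, key=log_posterior)


def loo_accuracy(X, y, k):
    """Leave-one-out accuracy; features are re-ranked inside every fold."""
    hits = 0
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        top = rank_features(X_train, y_train)[:k]
        model = fit_gaussian_nb([[r[j] for j in top] for r in X_train], y_train)
        hits += predict(model, [X[i][j] for j in top]) == y[i]
    return hits / len(X)


if __name__ == "__main__":
    X, y = make_toy_data()
    print("top-ranked features:", rank_features(X, y)[:5])
    print("leave-one-out accuracy with k=5:", loo_accuracy(X, y, k=5))
```

The ranking is recomputed inside each cross-validation fold so that the held-out record never influences which features are selected. With so few records, selecting features on the full data set first would tend to inflate the accuracy estimate.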

Author Biography

  • Inese Polaka, Riga Technical University
    Institute of Information Technology

Published

2015-08-05

How to Cite

[1]
I. Polaka, “Feature Selection Approaches In Antibody Display”, ETR, vol. 2, pp. 16–23, Aug. 2015, doi: 10.17770/etr2011vol2.998.