Journal of Machine Learning Research 3 (2003) 1157-1182
Submitted 11/02; Published 3/03
An Introduction to Variable and Feature Selection
ISABELLE@CLOPINET.COM
955 Creston Road
Berkeley, CA 94708-1501, USA
ANDRE@TUEBINGEN.MPG.DE
Empirical Inference for Machine Learning and Perception Department
Max Planck Institute for Biological Cybernetics
72076 Tübingen, Germany
Editor: Leslie Pack Kaelbling
Variable and feature selection have become the focus of much research in areas of application for
which datasets with tens or hundreds of thousands of variables are available. These areas include
text processing of internet documents, gene expression array analysis, and combinatorial chemistry.
The objective of variable selection is three-fold: improving the prediction performance of the predictors, providing faster and more cost-effective predictors, and providing a better understanding of
the underlying process that generated the data. The contributions of this special issue cover a wide
range of aspects of such problems: providing a better definition of the objective function, feature
construction, feature ranking, multivariate feature selection, efficient search methods, and feature
validity assessment methods.
Keywords: Variable selection, feature selection, space dimensionality reduction, pattern discovery, filters, wrappers, clustering, information theory, support vector machines, model selection,
statistical testing, bioinformatics, computational biology, gene expression, microarray, genomics,
proteomics, QSAR, text classification, information retrieval.
As of 1997, when a special issue on relevance including several papers on variable and feature
selection was published (Blum and Langley, 1997; Kohavi and John, 1997), few domains explored
used more than 40 features. The situation has changed considerably in the past few...