Revision with unchanged content. In many predictive modeling tasks, one has a fixed set of observations from which a vast, or even infinite, set of potentially predictive features can be computed. Of these features, often only a small number are expected to be useful in a predictive model. Models that use the entire set of features will almost certainly overfit the training data and generalize poorly to future data sets. The book presents streamwise feature selection, which interleaves the process of generating new features with that of testing them. Streamwise feature selection scales well to large feature sets. The book also describes how to use streamwise feature selection in multivariate regressions. It includes a review of traditional feature selection methods in a general framework based on information theory, and compares these methods with streamwise feature selection on various real and synthetic data sets. This book is intended for researchers in machine learning, data mining, and knowledge discovery.