In general, you should use feature selection and/or feature transformation to reduce a large set of features to a few useful ones. This task often requires domain expertise. Chapter 3 gives an example: constructing intensity and symmetry features for digit recognition reduces the 256 "pixel features" to 2 useful features.
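
To make the digit example concrete, here is a rough sketch of how two such features could be computed from a grayscale image. The exact definitions are my assumptions for illustration (intensity as the average pixel value, symmetry as the negated average absolute difference between the image and its left-right mirror), and the function names and the toy 4x4 images are hypothetical; the book's own construction may differ in detail.

```python
def intensity(image):
    """Average pixel value over the whole image (assumed definition)."""
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)


def symmetry(image):
    """Negated mean absolute difference between the image and its
    left-right mirror (assumed definition). 0 means perfectly symmetric;
    more negative means less symmetric."""
    total = 0.0
    count = 0
    for row in image:
        mirrored = row[::-1]
        for p, q in zip(row, mirrored):
            total += abs(p - q)
            count += 1
    return -(total / count)


def extract_features(image):
    """Map a 2-D list of pixel values to the pair (intensity, symmetry)."""
    return (intensity(image), symmetry(image))


# Hypothetical toy images (4x4 instead of 16x16) to keep the example short.
symmetric_image = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
lopsided_image = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]

features_symmetric = extract_features(symmetric_image)
features_lopsided = extract_features(lopsided_image)
```

The two toy images contain the same amount of "ink", so they share the same intensity, but the symmetry feature tells them apart. Each image, however many pixels it has, ends up as a point in 2 dimensions.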

There are at least two good reasons to construct a few useful features. The first is that useful features usually simplify the problem, so a linear separator may work where a nonlinear one would otherwise be needed. The second is that the hypothesis set will be simpler, so you will be able to generalize better out of sample with less data.
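
As a rough illustration of the second point, assume the model is a linear separator (perceptron) on d input features, whose VC dimension is d + 1. The factor of 10 below is only a rule of thumb for the number of examples N, not a guarantee.

```latex
d_{\mathrm{VC}} = d + 1
\qquad\Longrightarrow\qquad
\begin{cases}
d = 256 \text{ (raw pixels)}: & d_{\mathrm{VC}} = 257,\quad N \approx 10\, d_{\mathrm{VC}} = 2570 \\
d = 2 \text{ (intensity, symmetry)}: & d_{\mathrm{VC}} = 3,\quad N \approx 10\, d_{\mathrm{VC}} = 30
\end{cases}
```

Under these assumptions, going from 256 features to 2 cuts the amount of data suggested by the rule of thumb by nearly two orders of magnitude.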

Originally Posted by netweavercn
I followed Prof. Lin's open course on Coursera, which is great. Does the book mention how to choose features, or did I miss it? (Given that you have hundreds of features, do you need them all, or only some of them?)
