09-19-2012, 01:23 PM
htlin
Re: Cross validation and scaling?

Originally Posted by Andrs:
When using the SVM/RBF kernel provided by scikit-learn/LIBSVM, it is important that the data is scaled. My question is how we should scale (or standardize to zero mean and unit variance) the data when using cross-validation.
I have my training data D and I am dividing it for k-fold cross-validation. Here is a procedure:

1) First divide the data into "k-1 training folds" and "one test fold".
2) Perform a scaling operation on the training data (the k-1 folds). It could be standardization to (0, 1).
3) Perform the same scaling operation (based on the same parameters) on the test fold.
4) Train the classifier.
5) Evaluate the classifier on the test fold.
6) Go to (1) until every fold has been used as the test fold.
I would like to check the following statement:
Should we have separate scaling operations for the cv-training/test data (first split the data, then scale each data set separately)? Otherwise there is a risk of snooping and a too-optimistic E_cv. I think the Professor mentioned a subtle snooping case caused by scaling the training and test data together!
The other alternative is to scale the whole data set D and then perform cross-validation ---> snooping.
It is a tricky question, and the bottom line is: Is scaling considered part of the learning procedure, or just "pre-processing"?

If scaling is pre-processing, scaling the whole training set is legitimate, not snooping. The E_{cv} you get would reflect an estimate of the test performance in the special, pre-processed space.
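Under this pre-processing viewpoint, the scaling parameters are computed once from all of D before any fold split. A minimal sketch in plain Python (the one-dimensional toy data set is hypothetical, chosen just to make the scaling visible):

```python
from statistics import mean, pstdev

# Toy one-dimensional data set D (hypothetical values).
D = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# Pre-processing viewpoint: compute the scaling parameters once,
# from the whole set, before any cross-validation split.
m, s = mean(D), pstdev(D)
D_scaled = [(x - m) / s for x in D]

# Every fold of a later k-fold split reuses these global parameters,
# so each validation fold has (slightly) influenced its own scaling.
```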

On the other hand, if scaling is part of learning, scaling should be done on the sub-training part instead. The E_{cv} will then be the estimated performance of the whole procedure (scale, train, and then test).
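Under this part-of-learning viewpoint, the scaling parameters are recomputed inside each fold from the sub-training part only. A minimal sketch in plain Python (the toy data set and the simple every-k-th-point split are both hypothetical):

```python
from statistics import mean, pstdev

# Toy one-dimensional data set D and fold count (hypothetical values).
D = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
k = 3

for i in range(k):
    # Simple deterministic split: every k-th point forms the test fold.
    test_fold = D[i::k]
    sub_train = [x for j, x in enumerate(D) if j % k != i]

    # Scaling parameters come from the sub-training part only ...
    m, s = mean(sub_train), pstdev(sub_train)
    sub_train_scaled = [(x - m) / s for x in sub_train]
    # ... and the SAME parameters are applied to the test fold,
    # so the test fold never influences its own scaling.
    test_fold_scaled = [(x - m) / s for x in test_fold]

    # (train the classifier on sub_train_scaled, evaluate its error
    #  on test_fold_scaled, and accumulate E_cv over the k folds)
```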

There is no right or wrong between the two choices --- they are just different viewpoints. In my experience, the performance difference between them (on a locked test set) is often negligible in practice, and hence we often see people treat scaling as "pre-processing" for its simplicity of implementation.

Hope this helps.
When one teaches, two learn.