LFD Book Forum  

09-19-2012, 06:30 AM
Andrs
Cross validation and scaling?

When using the SVM with RBF kernel provided by scikit-learn/LIBSVM, it is important that the data is scaled. My question is how we should scale the data (or standardize it to zero mean and unit variance) when using cross validation.
I have my training data D and I am dividing it based on k-fold cross validation. Here is a procedure:

1) First divide the data into k-1 training folds and one test fold.
2) Perform a scaling operation on the training data (the k-1 folds). It could be standardized (zero mean, unit variance).
3) Perform a scaling operation on the test fold, based on the same parameters as in step 2.
4) Train the classifier.
5) CV-test.
6) Go to (1) until all folds have been used as the test fold.
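The steps above could be sketched roughly as follows with scikit-learn. This is only an illustration under assumptions: the synthetic data from make_classification stands in for D, and the choices k=5, C=1.0 and gamma="scale" are arbitrary placeholders, not values from this thread.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the data set D (an assumption for illustration).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Step 1: split D into k-1 training folds and one test fold (k=5 assumed).
kf = KFold(n_splits=5, shuffle=True, random_state=0)
fold_errors = []
for train_idx, test_idx in kf.split(X):
    # Step 2: learn mean and variance from the training folds only.
    scaler = StandardScaler().fit(X[train_idx])
    X_train = scaler.transform(X[train_idx])
    # Step 3: scale the test fold with the same parameters.
    X_test = scaler.transform(X[test_idx])
    # Step 4: train the RBF-kernel SVM on the scaled training folds.
    clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y[train_idx])
    # Step 5: record the classification error on the held-out fold.
    fold_errors.append(1.0 - clf.score(X_test, y[test_idx]))

# Step 6 is the loop itself; E_cv is the average error over all folds.
e_cv = float(np.mean(fold_errors))
print(f"E_cv estimate: {e_cv:.3f}")
```

The scaler is refit inside the loop, so the held-out fold never contributes to the mean and variance used for scaling.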
I would like to check the following statement:
Should we have different scaling operations for the CV training and test data (first split the data, then scale each data set separately)? Otherwise there is a risk of snooping and a too optimistic E_cv. I think the Professor mentioned a subtle snooping case due to scaling both training and test data!
The other alternative is to scale the whole data set D and then perform cross validation, which leads to snooping.
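To make the two alternatives concrete, here is a minimal sketch that runs both on the same folds. It assumes synthetic data in place of D and default SVC settings. It only shows where the scaler is fitted in each case, and makes no claim about how large the difference in E_cv would be on real data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the data set D (an assumption for illustration).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Alternative A: the scaler is part of the pipeline, so cross_val_score
# refits it on the training folds of every split (no test-fold information).
per_fold = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
acc_per_fold = cross_val_score(per_fold, X, y, cv=cv)

# Alternative B: scale the whole data set D first, then cross validate.
# The scaling parameters now depend on every test fold (the snooping case).
X_all_scaled = StandardScaler().fit_transform(X)
acc_global = cross_val_score(SVC(kernel="rbf"), X_all_scaled, y, cv=cv)

print("E_cv, scaling inside each fold:", 1.0 - acc_per_fold.mean())
print("E_cv, scaling all of D first:  ", 1.0 - acc_global.mean())
```

Putting the scaler inside a pipeline is one convenient way to get the per-fold behaviour without writing the loop by hand.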

Tags
cross-validation, snooping

The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.