#3 · 06-05-2012, 05:49 AM
dudefromdayton
Re: Cross validation and data snooping

Professor Magdon-Ismail's answer covers the most general case well, but I think you'll also want to consider what, specifically, you intend to do with machine learning.

If you're doing binary classification, and you define your data model with normalized outputs, such as +1 for peach marbles and -1 for plum marbles, and you encode your y's in this fashion, you haven't snooped.
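
For instance, here's a minimal sketch in Python (the peach/plum labels and the array contents are just made-up illustrations). Mapping the class names to +1/-1 uses only the label convention you chose, not anything about the data distribution:

[CODE]
import numpy as np

# Hypothetical raw labels for the two marble classes.
raw_labels = np.array(["peach", "plum", "peach", "peach", "plum"])

# Encode the outputs as +1 / -1. Only the labeling convention is used here,
# so no information about the data distribution is being consulted.
y = np.where(raw_labels == "peach", 1, -1)
print(y)  # [ 1 -1  1  1 -1]
[/CODE]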

And then if you normalize your x's for the same problem, but you only use these inputs for scaling and don't touch the y's, you still haven't snooped.
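
A minimal sketch of that kind of input scaling, again with made-up data; the y's never enter the computation:

[CODE]
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(100, 3))  # hypothetical raw inputs

# Standardize each input component using statistics computed from the x's only.
mu = X.mean(axis=0)
sigma = X.std(axis=0)
X_scaled = (X - mu) / sigma   # the y's are never touched
[/CODE]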

Where I see potential danger in scaling the input (and it's not related to snooping) is that if you don't scale isotropically across the input components, you may change the relative sensitivity among those components. If you then feed the result into an RBF kernel, the kernel's response to those separate components will change. So I'd add that caution, although I don't think it's snooping in the conventional sense. But it can be a data-dependent effect, so again, I'd be alert to it. I haven't noticed any mention of this possibility in any of our RBF lectures, notes, etc.
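
Here's a rough illustration of that point (the scale factors are made up): multiplying the components by different factors changes which component dominates the distance inside the RBF kernel, so the kernel's response changes too, whereas a single common factor just rescales all distances uniformly.

[CODE]
import numpy as np

def rbf(x, z, gamma=1.0):
    # Plain Gaussian/RBF kernel: exp(-gamma * ||x - z||^2)
    return np.exp(-gamma * np.sum((x - z) ** 2))

x = np.array([1.0, 1.0])
z = np.array([2.0, 3.0])

print(rbf(x, z))                   # original kernel value

iso = 2.0                          # isotropic: same factor on every component
print(rbf(iso * x, iso * z))       # distances rescale uniformly

aniso = np.array([10.0, 0.1])      # anisotropic: hypothetical per-component factors
print(rbf(aniso * x, aniso * z))   # the first component now dominates the distance
[/CODE]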