Old 08-05-2012, 03:44 AM
rainbow
Join Date: Jul 2012
Posts: 41
Data snooping (test vs. train data)

Do I understand data snooping correctly if I say it is only an issue concerning the test data, i.e. it occurs whenever inspecting the test data affects the learning in some way? For example:
- The test data has been used for estimation.
- The learning model is changed after evaluating its performance on the test data.

How does data snooping relate to the training data (if at all)? How much can you look into this data? Is it a data snooping violation to look at the target variable y when doing exploratory data analysis such as PCA, or when creating features? For example, what if you want to create a non-linear feature by cutting a continuous variable such as age into a discrete feature, choosing the cut points with respect to y?
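
To make the age example concrete, here is a minimal sketch of the kind of y-aware cut I have in mind. Everything in it is an assumption for illustration: the helper names fit_age_cut and apply_age_cut, the toy numbers, and the rule itself (a single threshold chosen to maximise the gap in mean y between the two sides). The cut is chosen from the training rows only and then applied unchanged to the test rows.

```python
import numpy as np

def fit_age_cut(age_train, y_train):
    """Hypothetical helper: choose one age threshold using training rows only.

    The threshold maximises the absolute difference in mean y between the
    rows below and above it. Returns None if all ages are identical.
    """
    age = np.asarray(age_train, dtype=float)
    y = np.asarray(y_train, dtype=float)
    order = np.argsort(age)
    age, y = age[order], y[order]

    best_cut, best_gap = None, -np.inf
    for i in range(1, len(age)):
        if age[i] == age[i - 1]:
            continue  # no threshold fits between two equal ages
        gap = abs(y[:i].mean() - y[i:].mean())
        if gap > best_gap:
            best_gap = gap
            best_cut = (age[i - 1] + age[i]) / 2  # midpoint between neighbours
    return best_cut

def apply_age_cut(age, cut):
    """Hypothetical helper: turn age into a 0/1 feature with a fixed threshold."""
    return (np.asarray(age, dtype=float) > cut).astype(int)

# Toy data (made up for illustration).
age_train = [20, 25, 30, 50, 55, 60]
y_train = [0, 0, 0, 1, 1, 1]
age_test = [22, 58]

cut = fit_age_cut(age_train, y_train)          # y of the training set is used here
age_test_binned = apply_age_cut(age_test, cut) # the test y is never looked at
```

The question is whether using y_train in fit_age_cut like this already counts as snooping, or whether it only becomes a problem once the test rows (or their y values) influence the choice of cut.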