
#1




data snooping
Is it data snooping even to look at the training data set? Assume that the test data set is completely unknown, of course.

#2




Re: data snooping
If you look at the training data set, you do some "learning" in your mind. You dramatically reduce the number of hypotheses by choosing a hypothesis set that seems to fit the training data. This means you cannot use the $d_{VC}$ of the reduced hypothesis set to calculate the generalization bound. Instead you need to use a higher $d_{VC}$, but it is unclear which one, since you don't know exactly the $d_{VC}$ of the full hypothesis set you had in mind before looking at the data.

However, if you have not looked at the test data and keep it safe until you have found the final hypothesis g(x), you can verify your final hypothesis on the test data. The result is $E_{test}$, and with the Hoeffding bound you can estimate your $E_{out}$ completely independently of the VC dimension.

Thus my answer is: yes, it is snooping if you look at the training data, so you cannot calculate the generalization bound from the VC dimension. But since you have not looked at the test data, you can calculate the Hoeffding bound instead, and the result is a valid estimate of the out-of-sample error. Keep in mind, however, that after this calculation your test data is also compromised, and you cannot simply repeat the procedure if the result is not as expected.
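
To make the test-set argument concrete, here is a minimal sketch of the Hoeffding calculation for a single final hypothesis that was fixed before the test data was touched. It assumes a binary classification error in [0, 1] and the two-sided form of the bound, |E_test - E_out| <= sqrt(ln(2/delta) / (2N)) with probability at least 1 - delta. The function names and the numbers in the usage lines are hypothetical and chosen only for illustration.

```python
import math


def hoeffding_epsilon(n_test: int, delta: float) -> float:
    """Two-sided Hoeffding tolerance for ONE fixed hypothesis.

    With probability at least 1 - delta:
        |E_test - E_out| <= sqrt(ln(2 / delta) / (2 * n_test))
    Assumes the per-point error is bounded in [0, 1].
    """
    if n_test <= 0:
        raise ValueError("n_test must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be strictly between 0 and 1")
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n_test))


def out_of_sample_interval(e_test: float, n_test: int, delta: float):
    """Interval that contains E_out with probability at least 1 - delta."""
    eps = hoeffding_epsilon(n_test, delta)
    # The error is a fraction, so clip the interval to [0, 1].
    return max(0.0, e_test - eps), min(1.0, e_test + eps)


# Hypothetical numbers: 10% test error measured on 1000 untouched test points.
low, high = out_of_sample_interval(e_test=0.10, n_test=1000, delta=0.05)
print(f"E_out lies in [{low:.4f}, {high:.4f}] with probability >= 0.95")
```

Note that the bound holds only for a single use of the test set. If the test result feeds back into choosing another hypothesis, the one-hypothesis assumption no longer holds, which is the "compromised" situation described above.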
#3




Re: data snooping
Ah, thanks a ton! It sounds like there are two kinds of data snooping: (i) looking at the training data and (ii) looking at the test data. I guess looking at the training data happens commonly and inevitably whenever we use learning algorithms that require a training process, such as neural networks, PLA, support vector machines, and so on.


