Thread: data snooping
12-26-2016, 11:29 PM
hidir
Re: data snooping

Originally Posted by CountVonCount
That is an interesting question and I don't know the exact answer. But I'll try to give an answer that fits my understanding.

If you look at the training data set, you do some "learning" in your mind. You thereby shrink the number of hypotheses dramatically by choosing a hypothesis set that seems to fit the training data.
This means you cannot use the d_{VC} of the reduced hypothesis set to calculate the generalization bound. Instead you need to use a higher d_{VC}, but it is unclear which one, since you don't know exactly the d_{VC} of the full hypothesis set you had in mind before looking at the data.
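
To see why the choice of d_{VC} matters here, a rough sketch may help. It assumes the common form of the VC bound, E_{out} <= E_{in} + sqrt((8/N) ln(4 m_H(2N) / delta)), together with the polynomial bound m_H(2N) <= (2N)^{d_{VC}} + 1. The function name, the sample size and the d_{VC} values are only illustrative assumptions, not something taken from the original question.

```python
import math

def vc_error_bar(n, d_vc, delta=0.05):
    """Error bar Omega in E_out <= E_in + Omega (assumed form of the VC bound).

    Uses the polynomial growth-function bound m_H(2N) <= (2N)^d_vc + 1.
    """
    growth = (2 * n) ** d_vc + 1  # exact Python integer, no overflow
    # ln(4 * growth / delta), split so the huge integer never becomes a float
    log_term = math.log(4.0 / delta) + math.log(growth)
    return math.sqrt(8.0 / n * log_term)

# Hypothetical values: the same N with a small d_vc (the reduced set)
# and larger ones (the unknown set you really searched in your mind).
for d_vc in (3, 10, 50):
    print(d_vc, round(vc_error_bar(10_000, d_vc), 3))
```

The error bar grows with d_{VC}, so plugging in the d_{VC} of the reduced hypothesis set would make the bound look tighter than it really is.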

However, if you have not looked at the test data and keep it locked away until you have found the final hypothesis g(x), you can verify that hypothesis on the test data. The result is E_{test}, and with the Hoeffding bound you can estimate E_{out} completely independently of the VC dimension.
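
As a minimal sketch of that last step: for a single hypothesis fixed before the test data is touched, and an error measure bounded in [0, 1] (e.g. binary classification error), Hoeffding gives P[|E_{test} - E_{out}| > eps] <= 2 exp(-2 eps^2 N_{test}). Solving for eps at confidence 1 - delta gives the half-width below. The function name and the numbers (E_{test} = 0.12, N_{test} = 1000) are made up for illustration.

```python
import math

def hoeffding_epsilon(n_test, delta=0.05):
    """Half-width eps with P[|E_test - E_out| > eps] <= delta.

    Valid for ONE hypothesis chosen without looking at the test set,
    and for a per-point error bounded in [0, 1].
    """
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n_test))

e_test = 0.12   # hypothetical measured test error
n_test = 1000   # hypothetical test-set size
eps = hoeffding_epsilon(n_test)
low, high = max(0.0, e_test - eps), min(1.0, e_test + eps)
print(f"E_out in [{low:.3f}, {high:.3f}] with probability >= 0.95")
```

Note that d_{VC} does not appear anywhere: only the test-set size and the chosen confidence level enter the bound.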

Thus my answer is: yes, it is snooping if you look at the training data, so you cannot calculate the generalization bound from the VC dimension. But since you have not looked at the test data, you can calculate the Hoeffding bound instead, and the result is a valid estimate of the out-of-sample error.
However, keep in mind that after this calculation your test data is also compromised, and you cannot simply repeat the procedure if the result is not as expected.
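
One way to picture the cost of reusing the test set is the union bound: if M candidate hypotheses are all scored on the same test data and the best one is reported, the guarantee becomes P[|E_{test} - E_{out}| > eps] <= 2 M exp(-2 eps^2 N_{test}). This sketch assumes the M candidates are fixed before any of them is tested; if each retry is chosen in reaction to the previous test result, things can be worse than this. The function name and numbers are again just illustrative.

```python
import math

def reuse_epsilon(n_test, m_tries, delta=0.05):
    """Half-width eps from Hoeffding plus a union bound over m_tries hypotheses.

    Assumes the m_tries candidates were fixed before any test evaluation.
    """
    return math.sqrt(math.log(2.0 * m_tries / delta) / (2.0 * n_test))

# Hypothetical test set of 1000 points, evaluated 1, 10 and 100 times.
for m in (1, 10, 100):
    print(m, round(reuse_epsilon(1000, m), 4))
```

With M = 1 this reduces to the plain Hoeffding bound; every extra try on the same test data loosens the estimate.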