#1
Is it data snooping to even look at the training data set? Assume that the test data set is completely unknown, of course.
#2
If you look at the training data set, you do some "learning" in your mind. You thereby reduce the number of hypotheses dramatically by choosing a hypothesis set that seems to fit the training data. This means you cannot work with the VC generalization bound for the hypothesis set you chose, because that choice was itself made from the data.

However, if you have not looked at the test data and keep it safe until you have found the final hypothesis g(x), you can verify your final hypothesis with the test data. The result is a test error that is a valid estimate of the out-of-sample error.

Thus my answer is: yes, it is snooping if you look at the training data, and you can no longer calculate the generalization bound from the VC dimension. But since you have not looked at the test data, you can instead calculate the Hoeffding bound, and the result is a valid estimate of the out-of-sample error. Keep in mind, however, that after this calculation your test data is also compromised, and you cannot simply repeat the procedure if the result is not as expected.
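As a rough illustration of the Hoeffding bound on a clean test set, here is a minimal Python sketch. It assumes a binary classification error in [0, 1], a single final hypothesis fixed before the test set is touched, and i.i.d. test points. The function names and the numbers in the example (`e_test=0.10`, `n_test=1000`, `delta=0.05`) are made up for illustration and do not come from the thread.

```python
import math


def hoeffding_epsilon(n_test: int, delta: float) -> float:
    """Half-width eps with P[|E_test - E_out| > eps] <= delta.

    Valid for ONE hypothesis chosen before looking at the test set,
    from Hoeffding: P[|E_test - E_out| > eps] <= 2 * exp(-2 * eps**2 * N).
    """
    if n_test <= 0:
        raise ValueError("n_test must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    # Solve 2 * exp(-2 * eps^2 * N) = delta for eps.
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n_test))


def out_of_sample_interval(e_test: float, n_test: int, delta: float = 0.05):
    """Interval that contains E_out with probability at least 1 - delta."""
    eps = hoeffding_epsilon(n_test, delta)
    # Clip to [0, 1] because a classification error cannot leave that range.
    return max(0.0, e_test - eps), min(1.0, e_test + eps)


if __name__ == "__main__":
    # Hypothetical numbers: 10% test error measured on 1000 held-out points.
    low, high = out_of_sample_interval(e_test=0.10, n_test=1000, delta=0.05)
    print(f"E_out lies in [{low:.3f}, {high:.3f}] with probability >= 0.95")
```

Note that the bound holds only for the first use of the test set. If the result is used to go back and pick a different hypothesis, the interval computed this way is no longer justified.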
#3
Ah, thanks a ton! It sounds like there are two kinds of data snooping: (i) looking at the training data and (ii) looking at the test data. I guess looking at the training data is common and unavoidable if we use learning algorithms that require a training process, such as neural networks, PLA, support vector machines, and so on.