LFD Book Forum: VC dimension independent of probability distribution
#1
01-30-2013, 12:36 PM
 Anne Paulson Senior Member Join Date: Jan 2013 Location: Silicon Valley Posts: 52
VC dimension independent of probability distribution

In Lecture 7, we learn, I think, that if we have a finite VC dimension, then whatever our error rate is on our chosen hypothesis g, that error rate will generalize to all of our input space X, subject to the bounds we know about. That is, with at least some probability that we can compute, the error rate on our training set will be close to the error rate on the whole input space.

And we further learn, I think, that this generalization is true independent of the probability distribution we used to choose our input set.

But now I'm confused. Are we assuming that we use the same probability distribution when computing the error rate on the whole input space? That is, we check the error on every single point, but the ones that were more likely to be in the training set get weighted more, so it's an expectation over all possible training sets picked using the probability distribution, rather than just an error rate over the entire input space with uniform distribution?

Otherwise it doesn't make sense to me. Seems like we could rig the training set to make our cockamamie hypothesis look good.
#2
01-30-2013, 01:09 PM
 yaser Caltech Join Date: Aug 2009 Location: Pasadena, California, USA Posts: 1,477
Re: VC dimension independent of probability distribution

Quote:
Originally Posted by Anne Paulson
But now I'm confused. Are we assuming that we use the same probability distribution when computing the error rate on the whole input space? That is, we check the error on every single point, but the ones that were more likely to be in the training set get weighted more, so it's an expectation over all possible training sets picked using the probability distribution, rather than just an error rate over the entire input space with uniform distribution? Otherwise it doesn't make sense to me. Seems like we could rig the training set to make our cockamamie hypothesis look good.
The source of randomization in the VC inequality is the choice of the training set D. The assumption is that this set is generated according to some probability distribution on the input space X, independently from one data point in D to the other. The same probability distribution is used to compute E_out, by averaging over the whole input space X.

Now, the statement that the VC inequality is independent of the probability distribution means that it holds for any probability distribution. Any training set you pick, according to any probability distribution (or even a rigged training set for that matter) will have at most as many dichotomies as the value of the growth function, since the growth function is defined as a maximum. Since that value is what is needed for the proof of the VC inequality, the inequality will always hold (more loosely for certain probability distributions, but will nonetheless hold).
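To illustrate the growth-function point concretely (a small sketch, not from the course material; it uses the "positive rays" hypothesis set, whose growth function is N + 1): no matter which distribution generates the N points, even a heavily skewed one, the number of distinct dichotomies the hypothesis set can produce on them never exceeds the growth function's value.

```python
import numpy as np

rng = np.random.default_rng(1)

def dichotomies_positive_rays(points):
    """All distinct dichotomies that positive-ray hypotheses
    h_a(x) = sign(x - a) can realize on the given points."""
    points = np.sort(points)
    # One threshold below all points, one between each adjacent pair,
    # and one above all points covers every possible labeling.
    thresholds = np.concatenate(([points[0] - 1],
                                 (points[:-1] + points[1:]) / 2,
                                 [points[-1] + 1]))
    return {tuple(np.where(points >= a, 1, -1)) for a in thresholds}

N = 10
# Try a uniform sample and a "rigged" (skewed) sample: the dichotomy
# count is the same, because the growth function is a maximum over
# all possible point sets, independent of how the points were chosen.
for sample in (rng.uniform(0, 1, N), rng.beta(5, 1, N)):
    d = dichotomies_positive_rays(sample)
    print(len(d))  # 11, i.e. N + 1, the growth function value
```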

Having said that, we are still not allowed to "rig" the choice of the training set D -- not because the number of dichotomies will be a problem (it won't be), but because the basic premise of Hoeffding, on which the VC inequality is built, is that the points in D are picked independently according to some probability distribution. You can rig the probability distribution if you want, but you still have to pick your data points independently from it, and use the same probability distribution to compute E_out.
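A quick numerical sketch of this last point (illustrative only; the target f, hypothesis g, and the skewed distribution are made up for the example): even under a "rigged" distribution P, as long as the training set is drawn i.i.d. from P and E_out is measured under the same P, the in-sample error tracks the out-of-sample error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup on X = [0, 1]: target f flips at 0.5, fixed
# hypothesis g flips at 0.6, so g disagrees with f on (0.5, 0.6].
f = lambda x: np.where(x > 0.5, 1, -1)
g = lambda x: np.where(x > 0.6, 1, -1)

# A "rigged" (non-uniform) distribution P: Beta(5, 2) puts most of its
# mass near x = 1, where g happens to agree with f.
def sample_P(n):
    return rng.beta(5, 2, size=n)

N = 1000
D = sample_P(N)                        # training set, i.i.d. from P
E_in = np.mean(g(D) != f(D))           # in-sample error on D

# E_out under the SAME distribution P, estimated on a large fresh sample.
X_big = sample_P(1_000_000)
E_out = np.mean(g(X_big) != f(X_big))

print(E_in, E_out)  # the two are close: generalization holds under P
```

Under the uniform distribution g's error would be 0.1 (the length of the interval (0.5, 0.6]); under this skewed P it comes out differently, but E_in still tracks E_out because both are governed by the same distribution.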
__________________
Where everyone thinks alike, no one thinks very much
#3
01-30-2013, 03:01 PM
 Anne Paulson Senior Member Join Date: Jan 2013 Location: Silicon Valley Posts: 52
Re: VC dimension independent of probability distribution

"You can rig the probability distribution if you want, but you still have to pick your data points independently from it, and use the same probability distribution to compute E_out."

Great, that's what I wanted to know. Thanks.
