LFD Book Forum (http://book.caltech.edu/bookforum/index.php)
-   Appendix and Notation (http://book.caltech.edu/bookforum/forumdisplay.php?f=142)
-   -   Discussion of the VC proof (http://book.caltech.edu/bookforum/showthread.php?t=4524)

 fgpancorbo 09-07-2012 11:03 AM

Discussion of the VC proof

Quote:
 Originally Posted by admin (Post 11768) For the courageous souls :) who read the VC proof in the Appendix carefully, please use this subforum to post any questions or comments. In particular, we will appreciate any comments that help us and other instructors decide whether to include the formal proof in our classroom presentations, or settle for the sketch of the proof that is given in Chapter 2.
After I am done with the final, I'll go through the proof in detail and will let you know what I think. As the course comes to an end, I have nothing but gratitude to you, the Caltech staff and Caltech for making this class available to everybody for free. In addition, it's obvious why you got all those awards for excellence in teaching. I wholeheartedly agree with your approach to teaching the class, especially the selection of the content.

 yaser 09-11-2012 08:08 PM

Re: The VC Proof

Quote:
 Originally Posted by fgpancorbo (Post 5016) After I am done with the final, I'll go through the proof in detail and will let you know what I think. As the course comes to an end, I have nothing but gratitude to you, the Caltech staff and Caltech for making this class available to everybody for free. In addition, it's obvious why you got all those awards for excellence in teaching. I wholeheartedly agree with your approach to teaching the class, especially the selection of the content.
Thank you kindly.

 gah44 09-18-2012 06:21 PM

Re: The VC Proof

I tend to be closer to an "experimental" scientist. Most often, I don't want to read the proof, though it is nice to know that it is there.

The level covered in the course was just about right for me. I believe I followed it well enough to do the problems, though.

Thanks!

 eakarahan 11-18-2012 03:29 AM

Re: The VC Proof

Quote:
 Originally Posted by admin (Post 11768) For the courageous souls :) who read the VC proof in the Appendix carefully, please use this subforum to post any questions or comments. In particular, we will appreciate any comments that help us and other instructors decide whether to include the formal proof in our classroom presentations, or settle for the sketch of the proof that is given in Chapter 2.
First comment about the proof in Appendix: I experienced a notation problem with P(x, y). This has two meanings. Joint probability or probability density. I would prefer another notation for density, since using one notation with two meanings is ambiguous.

For your question I would like to say: Due to the mixed mathematical background of any classroom, include the formal proof in your classroom presentations. Further, I suggest that you extract the proof from the appendix and include it in the text in Chapter 2, since I think it would be more natural that you develop VC-bound within the lecture formally.

I never like God send formulas, I like to understand the natural development of these ideas and learn how mathematicians transform these ideas into the formulas without any break. A deferred proof, break this sequence force us to remember what was referring what.

If you think the opposite, than enhance your Appendix so that the derivations are given step by step with helping explanations beyond the formulas. Current format of the proof requires the student create this sequence by her/himself. I want to know what is the natural historical development sequence of these ideas. Maybe I should refer the VC-book, Statistical Learning Theory. But, I need the help of an instructor like you, since reading a 750 pages book is infeasible; especially if you yearn for writing your first ML applications within the next 3 months.

For human being english is always better then notations, but notations are inevitable to avoid ambiguities; thus a combination of both is the ideal format. I see you try to achieve this ideal. This is good, so.

At the moment that I wrote this message, I've watched the 6. video and read the book until Chp 2, section 2.3 and I read the two pages of the proof in appendix. Until now, two thinks broke me disturbing my mind; how the Hoeffding found his from God send inequality, and how can I understand the long derivations of VC-Bound including the mathematical technicalities you mention in the book and videos. I'm still working on the second one. And try to accept Hoeffding's inequality as given (which disturbs my mind like a bug in my brain and decelerates my learning).

I know you want to show us the forest instead of dealing with the leaves of trees, but you can do this like you have done in the book between th pages 46-49 using "safe skip" blocks.

A last word, this is the first ML course that I could see the whole picture, the forest. As such this is the first successful attempt that I experienced.

Thank you all, for your effort.

 fgpancorbo 01-01-2013 09:26 PM

Re: The VC Proof

Well, although with a bit of delay :(, I did go over the proof. Sorry that it took this long but after the class ended over the summer, I became increasingly busy and I seemed to never find the time to read the proof.

I didn't understand each and every detail but I did understand the overall proof. I think that the level of rigor is about right, given that it's an optional section destined to those who are more mathematically inclined, so I wouldn't change that. Since the most ingenious trick is the idea of approximating with for a second data set, my only suggestion is to make the connection that that's exactly what we do in the homeworks, where we put that intuition to work. We estimated out of a second set of randomly generated samples, which, if I understood well, is exactly what the trick is (then we averaged out over several runs which is also something that the proof does). The rest of the proof involves the introduction of several technical steps to reach to the final result, but I think that a correct understanding of Lemma A.2 is the key to understanding the overall proof. Since by the end of the course, the student has put the trick to work many times, it relates beautifully a theoretical derivation with the practical work done in the homeworks.

PS1: Of course, I wish a Happy and Prosperous New Year to everyone!
PS2: I was in LA during Christmas and I paid a visit to the Caltech campus. I would have loved to say hello, but since the campus was pretty much empty I assumed that there would be nobody to say hello to :D.

 yaser 01-03-2013 09:09 AM

Re: The VC Proof

Quote:
 Originally Posted by fgpancorbo (Post 8414) I was in LA during Christmas and I paid a visit to the Caltech campus. I would have loved to say hello, but since the campus was pretty much empty I assumed that there would be nobody to say hello to :D.
Thank you for visiting. Indeed, the campus was closed between Christmas and New Year's, and I was traveling at the time.

 fgpancorbo 01-03-2013 08:44 PM

Re: The VC Proof

Quote:
 Originally Posted by yaser (Post 8424) Thank you for visiting. Indeed, the campus was closed between Christmas and New Year's, and I was traveling at the time.
Since I couldn't visit JPL this time, I will probably come back to Pasadena at some point, so maybe next time :D. Anyhow, it was great to put my feet on campus since studying at Caltech (instead of Stanford) is what I had dreamed of one day doing when I watched Carl Sagan's documentaries as a teen. I was able to take a very good Caltech class virtually, thanks to you. And you are such a terrific teacher that this was a great experience.

 Averroes 08-09-2013 12:27 PM

Re: The VC Proof

I am a machine learning practitioner currently applying machine learning to algorithmic trading, yet highly interested in the theoretical grounds of the field.

I have read your book "Learning from data" from cover to cover. I haven't solved the problems though. I however did go through the proof of the VC bound in the appendix. I succeeded to understand most of it except (A.4) in the bottom of page 189. I can understand that you have applied Hoeffding Inequality to h*, but your explanation on how this applies to h* conditioned to the sup_H event, is hard to grasp for me.

Can you please give more explanation on how using Hoeffding (A.4) holds ? Or give a reference that helps clarifying this result?

Thanks.

 yaser 08-11-2013 06:02 PM

Re: The VC Proof

Quote:
 Originally Posted by Averroes (Post 11313) I succeeded to understand most of it except (A.4) in the bottom of page 189. I can understand that you have applied Hoeffding Inequality to h*, but your explanation on how this applies to h* conditioned to the sup_H event, is hard to grasp for me.
Thank you for your post. The hypothesis is based on a fixed data set among the possible s. Once we know that such exists, the conditioned-upon statement does not matter. We have a fixed hypothesis and independently generated data sets , so we can apply Hoeffding to that in isolation.

 jokovc 11-03-2014 09:23 PM

Re: Discussion of the VC proof

Hi Prof. Yaser. I have a problem with the proof of Lemma A.2., page 190. I don't understand what this part means:
Quote:
 Finally, since h* depends on a particular D . . . , it holds for the weighted average
Thank you.

 yaser 11-03-2014 11:18 PM

Re: Discussion of the VC proof

Quote:
 Originally Posted by jokovc (Post 11806) Hi Prof. Yaser. I have a problem with the proof of Lemma A.2., page 190. I don't understand what this part means
Here is what it means. Let's say that you have for every . Then, regardless of what the probability distribution of , it will be true that since we can multiply both sides of by and integrate out.

 jokovc 11-04-2014 08:24 PM

Re: Discussion of the VC proof

Quote:
 Originally Posted by yaser (Post 11810) Here is what it means. Let's say that you have for every . Then, regardless of what the probability distribution of , it will be true that since we can multiply both sides of by and integrate out.
I think this is so much clearer than when it is put in sentences. Thank you.

 ilson 10-01-2015 07:32 PM

Re: Discussion of the VC proof

I was going through the proof in Appendix A and I just want to make sure that something written towards the bottom of pg. 189 is a typo.

Namely,

Quote:
 Inequality (A.3) folows because the events "" and "" (which is given) imply "".

Quote:
 Inequality (A.3) folows because the events "" and "" (which is given) imply "".
I'm 99.999999999999% sure this is indeed a typo since the latter case easily follows from reverse triangle inequality and it suffices to show the inequality in (A.3) and I cannot see how one can arrive at the implication in the former case nor how the former case implies the inequality (A.3), but it would ease my mind if I can get a verification that it is a typo. Thank you in advance!

 magdon 10-13-2015 12:04 PM

Re: Discussion of the VC proof

Yes, this is a typo. Thank you for pointing it out. You have it correct.

If A and C imply B, then

Quote:
 Originally Posted by ilson (Post 12078) I was going through the proof in Appendix A and I just want to make sure that something written towards the bottom of pg. 189 is a typo. Namely, should read I'm 99.999999999999% sure this is indeed a typo since the latter case easily follows from reverse triangle inequality and it suffices to show the inequality in (A.3) and I cannot see how one can arrive at the implication in the former case nor how the former case implies the inequality (A.3), but it would ease my mind if I can get a verification that it is a typo. Thank you in advance!

 CharlesL 09-26-2016 09:56 PM

Re: Discussion of the VC proof

Recently I have started reading the proof of the VC inequality in the appendix. On the bottom of page 190(Lemma A.3) , why does

sigma_S　P[S] x P[sup_h E_in - E_in' > ... |S ] <= sup_S P[sup_h E_in - E_in' >．．．|S ]?
(Sorry for the terrible notations, I don't know how I can input math symbols)

what does it mean by taking the supremum on S?

 htlin 10-05-2016 02:23 PM

Re: Discussion of the VC proof

Quote:
 Originally Posted by CharlesL (Post 12439) Recently I have started reading the proof of the VC inequality in the appendix. On the bottom of page 190(Lemma A.3) , why does sigma_S　P[S] x P[sup_h E_in - E_in' > ... |S ] <= sup_S P[sup_h E_in - E_in' >．．．|S ]? (Sorry for the terrible notations, I don't know how I can input math symbols) what does it mean by taking the supremum on S?
All complicated math aside, supremum on S carries the physical meaning of taking the "maximum" value over all possible S.

So this inequality simply says an expected value (of P[sup_...]) is less than or equal to the maximum value.

Hope this helps.

 tayfun29 10-08-2016 03:44 PM

Re: Discussion of the VC proof

Quote:
 Originally Posted by htlin (Post 12450) All complicated math aside, supremum on S carries the physical meaning of taking the "maximum" value over all possible S. So this inequality simply says an expected value (of P[sup_...]) is less than or equal to the maximum value. Hope this helps.
Thank...

 gamelover623 10-25-2016 07:39 AM

Re: Discussion of the VC proof

I am a machine learning practitioner currently applying machine learning to algorithmic trading, yet highly interested in the theoretical grounds of the field.

I have read your book "Learning from data" from cover to cover. I haven't solved the problems though. I however did go through the proof of the VC bound in the appendix. I succeeded to understand most of it except (A.4) in the bottom of page 189. I can understand that you have applied Hoeffding Inequality to h*, but your explanation on how this applies to h* conditioned to the sup_H event, is hard to grasp for me.

Can you please give more explanation on how using Hoeffding (A.4) holds ? Or give a reference that helps clarifying this result?

 CountVonCount 10-26-2016 03:23 AM

Re: Discussion of the VC proof

Hi,

I have a question about the sentence on page 190:
Quote:
 Note that we can assume e^(-0.5*N*eps^2) < 1/4, because otherwise the bound in Theorem A.1 is trivially true.
While I understand the argument here, I don't understand, why it is especially the value 1/4?
When set the above term to 1/4 I will receive -2*ln(1/4) as value for N*eps^2.
Now I can set N*eps^2 to that value in Theorem A.1 and I will get on the RHS (assuming the growth function is just 1) 4*0,707... so it is much more than 1.

A value of 1 in the RHS would be sufficient to say the bound in Theorem A.1 is trivially true. And this would assume, that the above term is less than 1/256.
With this in mind 1 - 2*e^(-0.5*N*eps^2) is greater than 0,99... and thus instead of a 2 in the lemmas outcome, I would receive a value around 1, which is a much better outcome.

So why is the value 1/4 chosen for the assumption?

Best regards,
André

 magdon 10-26-2016 01:38 PM

Re: Discussion of the VC proof

Suppose

Then, .

In which case and the bound in Theorem A.1 is trivial.

Quote:
 Originally Posted by CountVonCount (Post 12469) Hi, I have a question about the sentence on page 190: While I understand the argument here, I don't understand, why it is especially the value 1/4? When set the above term to 1/4 I will receive -2*ln(1/4) as value for N*eps^2. Now I can set N*eps^2 to that value in Theorem A.1 and I will get on the RHS (assuming the growth function is just 1) 4*0,707... so it is much more than 1. A value of 1 in the RHS would be sufficient to say the bound in Theorem A.1 is trivially true. And this would assume, that the above term is less than 1/256. With this in mind 1 - 2*e^(-0.5*N*eps^2) is greater than 0,99... and thus instead of a 2 in the lemmas outcome, I would receive a value around 1, which is a much better outcome. So why is the value 1/4 chosen for the assumption? Best regards, André

 CountVonCount 10-26-2016 02:28 PM

Re: Discussion of the VC proof

I have also another question on the same page (190):
At the end of the page there is the formula:

https://latex.codecogs.com/gif.latex...pace;S&space;]

I don't understand why the RHS is greater or equal to the LHS. The only legitimation I see for this is, that the distribution of P[S] is uniform, but this has not been stated in the text.
Or do I oversee here anything and this is also valid for all kinds of distribution?

 CountVonCount 10-26-2016 02:32 PM

Re: Discussion of the VC proof

Quote:
 Originally Posted by magdon (Post 12472) Suppose Then, . In which case and the bound in Theorem A.1 is trivial.
Hello,

thanks for the answer. I understand this argument, however this holds also for

or for

Thus the value 1/4 is somehow magic for me.

Edit: I think you choose 1/4 because it is so easy to see, that the RHS of Theorem A.1 gets 1. Nevertheless with a different value you would get a different outcome of the final formula.

 CountVonCount 10-28-2016 12:52 AM

Re: Discussion of the VC proof

Quote:
 Originally Posted by CountVonCount (Post 12473) ... I don't understand why the RHS is greater or equal to the LHS. The only legitimation I see for this is, that the distribution of P[S] is uniform, but this has not been stated in the text. Or do I oversee here anything and this is also valid for all kinds of distribution?
I got it by myself.
When we have a uniform distribution of P[S] the outcome of the product-sum

is simply the average of all P[A|S], since

And an average is of course less than or equal to the maximum of P[A|S].
If P[S] is not distributed uniformly we still have an average, but a weighted average. But also here the result is always less than or equal to the maximum. Because you cannot find weighting factors that are in sum 1 but will lead to a higher result as the maximum P[A|S].

 magdon 11-09-2016 05:21 AM

Re: Discussion of the VC proof

Correct.

Quote:
 Originally Posted by CountVonCount (Post 12475) I got it by myself. When we have a uniform distribution of P[S] the outcome of the product-sum is simply the average of all P[A|S], since And an average is of course less than or equal to the maximum of P[A|S]. If P[S] is not distributed uniformly we still have an average, but a weighted average. But also here the result is always less than or equal to the maximum. Because you cannot find weighting factors that are in sum 1 but will lead to a higher result as the maximum P[A|S].

 magdon 11-09-2016 05:23 AM

Re: Discussion of the VC proof

You are correct. We could have made other assumptions.

But there is a special reason why we DO want 1. Because we are bounding the probability, and so there is nothing to prove if we claim that a probability is less equal to 1. So, whenever the RHS (i.e. the bound) evaluates to 1 or bigger, there is nothing to prove. So we only need to consider the case when the bound evaluates to less than 1.

Quote:
 Originally Posted by CountVonCount (Post 12474) Hello, thanks for the answer. I understand this argument, however this holds also for or for Thus the value 1/4 is somehow magic for me. Edit: I think you choose 1/4 because it is so easy to see, that the RHS of Theorem A.1 gets 1. Nevertheless with a different value you would get a different outcome of the final formula.

 CountVonCount 11-11-2016 02:52 AM

Re: Discussion of the VC proof

Thank you very much for your answers. This helps me a lot to understand the intention of the single steps of this prove.

 sarafox 08-28-2017 09:13 AM

Re: Discussion of the VC proof

i had same question but thanks to your website my question just solved. thanks

 samyar1 04-23-2018 08:00 AM

Re: Discussion of the VC proof