 LFD Book Forum HW 4 question3 #11
 rohanag Invited Guest Join Date: Apr 2012 Posts: 94 Re: HW 4 question3

Thanks elkka, I don't know what I was thinking. #12
 silvrous Member Join Date: Apr 2012 Posts: 24 Re: HW 4 question3

Quote:
 Originally Posted by elkka I first thought the same thing about 1. But then, where do we see epsilon? It is the measure of the difference between E_in and E_out, which can be small or big depending on the experiment. Suppose you are talking about an experiment with very large numbers, like the number of minutes people use in a month on a cell phone (which, say, average 200). Then it is totally meaningful to consider a prediction that assures you that |E_in - E_out| <= 1 (or 5, or 10) with probability 0.95. So, it totally makes sense to rate the bounds even if they all are > 1
Except that E_in and E_out are ratios. I quote from HW2: "E_out (number of out-of-sample points misclassified / total number of out-of-sample points)".
Therefore, it is quite impossible for epsilon to ever exceed 1.
#13
 IamMrBB Invited Guest Join Date: Apr 2012 Posts: 107 Re: HW 4 question3

Quote:
 Originally Posted by elkka I first thought the same thing about 1. But then, where do we see epsilon? It is the measure of the difference between E_in and E_out, which can be small or big depending on the experiment. Suppose you are talking about an experiment with very large numbers, like the number of minutes people use in a month on a cell phone (which, say, average 200). Then it is totally meaningful to consider a prediction that assures you that |E_in - E_out| <= 1 (or 5, or 10) with probability 0.95. So, it totally makes sense to rate the bounds even if they all are > 1
I don't think you are right on this. E_in and E_out in the Vapnik-Chervonenkis Inequality (lecture 6), which is the basis for the VC bound, are fractions and not absolute numbers. I know that elsewhere in the course the professor has also used E_out for quantities that can be bigger than 1 (e.g. squared error, lecture 8), but when you look up the Vapnik-Chervonenkis Inequality, you'll see that E_in and E_out are probabilities/probability measures (i.e. the fraction incorrectly classified).

To see that your example probably doesn't make sense (IMHO): replace the minutes in your example with nanoseconds or, on the other hand, ages, and you would get very different numbers on the left side of the inequality (i.e. epsilon), while it would make no difference to the right side. This can't be right (it would, e.g., be unlikely that E_in and E_out are 60 seconds apart but likely that they are a minute apart?!): it would make the inequalities meaningless.

Also on the slides of lecture 6, it is fractions (in)correctly classified that are used for the Vapnik-Chervonenkis Inequality.

Disclaimer: I'm not an expert on the matter, and perhaps I'm missing the point somewhere, so I hope we'll get a verdict from the course staff.
#14
 elkka Invited Guest Join Date: Apr 2012 Posts: 57 Re: HW 4 question3

You know, I think you are right. We are indeed only talking about a classification problem, so E_in and E_out must be <= 1.
#15
 kkkkk Invited Guest Join Date: Mar 2012 Posts: 71 Re: HW 4 question3

Here is my view, which may be wrong. Refer to lecture 4, slides 7 onwards.

E_in and E_out are the average of the error measure per point, and it is up to the user to choose the error measure. So E_in and E_out are just numbers, not probabilities, and epsilon, which is the difference between the two, is also just a number.

Also see lecture 8, slides 15 and 20: E_out = bias + variance = 0.21 + 1.69 > 1
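A toy sketch can make this point concrete (the synthetic data and numbers below are my own, purely for illustration): with a squared-error measure, the average error per point can easily exceed 1, while binary classification error is always a fraction in [0, 1].

```python
import random

random.seed(0)
# Hypothetical targets with standard deviation 1.5, and a constant hypothesis h(x) = 0.
ys = [random.gauss(0.0, 1.5) for _ in range(10000)]
preds = [0.0] * len(ys)

# Squared error: average of (y - h(x))^2, roughly Var(y) = 2.25 here, so it can exceed 1.
squared = sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

# Classification error: fraction of points where sign(h(x)) != sign(y), always in [0, 1].
binary = sum(1 for y, p in zip(ys, preds) if (y > 0) != (p > 0)) / len(ys)

print(squared, binary)
```

So whether epsilon can exceed 1 really does depend on which error measure E_in and E_out are averaging.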
#16
 mikesakiandcp Member Join Date: Apr 2012 Posts: 31 Re: HW 4 question3

Quote:
 Originally Posted by silvrous I also got values larger than 1 for all of them, and therefore considered them to be equally meaningless for small N...
I also assumed this, since it is a classification problem. Since the bounds are all greater than one, we cannot infer anything about epsilon from any of them in this range of N, so they should all be equivalent.
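The claim that the bounds exceed 1 for small N is easy to check numerically. Here is a minimal sketch of the original VC bound, epsilon <= sqrt((8/N) ln(4 m_H(2N)/delta)), using the simple polynomial approximation m_H(2N) ~ (2N)^d_vc with d_vc = 50 and delta = 0.05 (my parameter choices, matching the homework setup as I read it):

```python
import math

def vc_bound(N, d_vc=50, delta=0.05):
    """Original VC bound on |E_in - E_out|, with m_H(2N) approximated by (2N)^d_vc."""
    log_growth = d_vc * math.log(2 * N)  # ln((2N)^d_vc), computed in log space
    return math.sqrt(8.0 / N * (math.log(4.0 / delta) + log_growth))

for N in (5, 100, 1000, 10000):
    print(N, vc_bound(N))
```

With these numbers the bound is about 13.8 at N = 5 and stays above 1 until N is a few thousand, so for small N the bound is indeed trivially greater than 1.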
#17
 silvrous Member Join Date: Apr 2012 Posts: 24 Re: HW 4 question3

Could someone from the course staff perhaps weigh in on this? There seem to be two equally valid theories....
#18 yaser Caltech Join Date: Aug 2009 Location: Pasadena, California, USA Posts: 1,477 Re: HW 4 question3

Quote:
 Originally Posted by silvrous Could someone from the course staff perhaps weigh in on this? There seem to be two equally valid theories....
If it is a probability then indeed bounds greater than 1 are trivial, but the question just asked about the quality of the bounds for what it's worth. In general, the behavior in practice is proportional to the VC bound, so the actual value (as opposed to the relative value) is not as critical.
__________________
Where everyone thinks alike, no one thinks very much
#19
 data_user Junior Member Join Date: Jul 2012 Posts: 6 Re: HW 4 question3

It is suggested to use the simple approximate bound N^d_vc for the growth function if N > d_vc. In Problem 3, N = 5 < d_vc = 50. Should we still use N^d_vc as an approximation for the growth function? Or maybe it is more reasonable to use 2^N, assuming that H is complex enough?
#20 yaser Caltech Join Date: Aug 2009 Location: Pasadena, California, USA Posts: 1,477 Re: HW 4 question3

Quote:
 Originally Posted by data_user It suggested to use the simple approximate bound N^d_vc for the growth function, if N > d_vc. In Problem 3, N=5
Indeed, if N <= d_vc, then the growth function is exactly 2^N. The fact that H is complex enough is already implied by the value of the VC dimension.
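For what it's worth, a quick numeric sketch (delta = 0.05 is my own choice) of how much the exact growth function differs from the polynomial approximation inside the original VC bound, at N = 5 and d_vc = 50:

```python
import math

def eps(N, log_m_2N, delta=0.05):
    # Original VC bound: sqrt((8/N) * ln(4 * m_H(2N) / delta)), growth term passed as a log
    return math.sqrt(8.0 / N * (math.log(4.0 / delta) + log_m_2N))

N, d_vc = 5, 50
exact = eps(N, 2 * N * math.log(2))       # m_H(2N) = 2^(2N), exact since 2N <= d_vc
approx = eps(N, d_vc * math.log(2 * N))   # m_H(2N) ~ (2N)^d_vc approximation
print(exact, approx)
```

With 2^(2N) the bound comes out around 4.3, versus around 13.8 with the polynomial approximation: both are greater than 1, consistent with the earlier discussion, but the exact growth function gives a noticeably tighter bound.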
__________________
Where everyone thinks alike, no one thinks very much

The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.