LFD Book Forum (http://book.caltech.edu/bookforum/index.php)
-   Homework 4 (http://book.caltech.edu/bookforum/forumdisplay.php?f=133)
-   -   HW 4 question3 (http://book.caltech.edu/bookforum/showthread.php?t=421)

markweitzman 05-01-2012 07:03 PM

HW 4 question3
 
I am confused about question 3 - aren't they all above 1, and therefore essentially equivalent, in the range 2-8? Or did I do the calculation incorrectly?

Mark Weitzman

Hillbilly 05-01-2012 07:18 PM

Re: HW 4 question3
 
I agree with the implication of your question -- isn't an epsilon greater than 1 essentially meaningless, making them all equivalent in that sense? Nevertheless, I got distinctly different curves for the four choices, strictly ordered, so I answered on that basis, and apparently that was the right perspective. Devroye gave me weird results at both extremes of N; N=2 was particularly bizarre, but I may have a bug in it I haven't found yet.

markweitzman 05-01-2012 07:32 PM

Re: HW 4 question3
 
Well, I thought like you did and calculated similar results, until I realized that "all equivalent, if all greater than 1" seems like the best answer.

Mark Weitzman

silvrous 05-01-2012 09:45 PM

Re: HW 4 question3
 
I also got values larger than 1 for all of them, and therefore considered them to be equally meaningless for small N...

rohanag 05-02-2012 12:52 AM

Re: HW 4 question3
 
How are the recursive bounds (parts c and d) to be plotted?

IamMrBB 05-02-2012 02:07 AM

Re: HW 4 question3
 
I have the same question/remark as silvrous and markweitzman. Since epsilon bounds the absolute difference of two probabilities/probability measures/frequencies (at least that is what I understood from the class and a quick Google lookup), a statement of epsilon < 3 (for example) is equivalent to the statement epsilon <= 1. Since all bounds gave numbers in the ballpark of 3, I reasoned they are all equivalent to the bound epsilon <= 1, i.e. with this small number of examples we cannot say anything about Eout, at least not with a delta of 0.05 as per the question.

I have to admit that I thought long and hard about what the intention of the question was: just to test whether we can calculate these scary-looking formulas, or to test our understanding of learning (in particular, that you need a minimum amount of data before you can make strong (delta = 5%) statements about the out-of-sample error). Since the calculation aspect was already tested in q2, I hoped and guessed that q3 was aiming at the other aspect.

In the end I therefore went for answer e ("they are all equivalent"), which I thought was the most correct, although there was indeed a chance the question was intended differently.

Professor, or any other expert on the subject, am I correct in my assumption that epsilon < 3 is equivalent to epsilon <= 1?
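
To spell out the equivalence I have in mind (a sketch, assuming E_{in} and E_{out} are misclassification fractions in [0,1]):

0 \le E_{in}, E_{out} \le 1 \;\Rightarrow\; |E_{in} - E_{out}| \le 1 \;\Rightarrow\; P[|E_{in} - E_{out}| > \epsilon] = 0 \text{ for every } \epsilon \ge 1,

so a bound of epsilon < 3 promises nothing beyond what epsilon <= 1 already does.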

lucifirm 05-02-2012 02:36 AM

Re: HW 4 question3
 
What I did was to create a vector \epsilon of the same size as N, but varying from 0 to 1. I don't know if this is the best approach, but it helped me plot the curves. The problem is that I thought I had to choose only one correct answer, so I could not choose c or d, because for me they were both correct for large N. A rough sketch of the idea in code is below.


I don't know if this will work, but here's a link to the figure.

https://docs.google.com/open?id=0By0...FRkclhvRnZMd2s
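
Roughly, the idea in code (a sketch in Python, not my exact script; the grid range and bound (c) are just for illustration, with \delta = 0.05 and d_vc = 50 as in the problem):

Code:

import numpy as np

# For an implicit bound epsilon <= f(epsilon, N), evaluate the right-hand
# side on a grid of epsilon values and keep the smallest epsilon that
# already satisfies the inequality.

def rhs_parrondo(eps, N, delta=0.05, dvc=50):
    # RHS of bound (c); ln m_H(2N) is exactly 2N ln 2 when 2N <= dvc
    # (the growth function is 2^N there), else the bound dvc ln(2N)
    ln_m = 2 * N * np.log(2.0) if 2 * N <= dvc else dvc * np.log(2.0 * N)
    return np.sqrt((2 * eps + np.log(6.0 / delta) + ln_m) / N)

N = 100
eps_grid = np.linspace(0.0, 5.0, 10001)       # candidate epsilon values
ok = eps_grid >= rhs_parrondo(eps_grid, N)    # where the bound holds
print(eps_grid[ok][0] if ok.any() else None)  # smallest such epsilon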

rohanag 05-02-2012 05:23 AM

Re: HW 4 question3
 
Thanks lucifirm - do you mean you tried different values of \epsilon and then compared the left-hand side and right-hand side of the equations?
You're right that both c and d are answers for large values of N, but for small values of N, c is the correct answer according to the key.

elkka 05-02-2012 05:34 AM

Re: HW 4 question3
 
I first thought the same thing about 1. But then, where do we see \varepsilon? It is the measure of the difference between E_in and E_out, which can be small or big depending on the experiment. Suppose you are talking about an experiment with very large numbers, like the number of minutes people use in a month on a cell phone (which, say, averages 200). Then it is totally meaningful to consider a prediction that assures you that |E_{in} - E_{out}| < 2 (or 5, or 10) with probability 0.95. So it totally makes sense to rate the bounds even if they are all > 1.

elkka 05-02-2012 05:38 AM

Re: HW 4 question3
 
rohanag, the recursive bounds can easily be solved for \varepsilon, because 1. they are essentially quadratic equations, and 2. only one root is of interest, as \varepsilon>0. Solving the equations gives

(c) \varepsilon = \frac{1}{N}+\sqrt{\frac{1}{N^2}+\frac{1}{N}\ln{\frac{6m_\mathcal{H}(2N)}{\delta}}};

(d) \varepsilon = \frac{1}{N-2}+\sqrt{\frac{1}{(N-2)^2}+\frac{1}{2(N-2)}\ln{\frac{4m_\mathcal{H}(N^2)}{\delta}}}.
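
In case it is useful, here is a quick numerical sketch of these closed forms (Python; assuming \delta = 0.05 and d_vc = 50 as in the problem, and the approximation m_\mathcal{H}(N) = 2^N for N \le d_vc and N^{d_vc} otherwise):

Code:

import math

DVC = 50      # VC dimension given in the problem
DELTA = 0.05  # confidence parameter given in the problem

def log_growth(N, dvc=DVC):
    # ln m_H(N): exactly N ln 2 when N <= dvc, else the bound dvc ln N;
    # working in logs avoids overflow for large N
    return N * math.log(2.0) if N <= dvc else dvc * math.log(N)

def eps_parrondo(N, delta=DELTA):
    # (c) Parrondo / Van den Broek, solved for epsilon
    t = math.log(6.0) + log_growth(2 * N) - math.log(delta)
    return 1.0 / N + math.sqrt(1.0 / N**2 + t / N)

def eps_devroye(N, delta=DELTA):
    # (d) Devroye, solved for epsilon; the closed form divides by N - 2,
    # so it breaks down at N = 2 (perhaps the bizarre N = 2 behaviour
    # mentioned earlier in the thread)
    t = math.log(4.0) + log_growth(N * N) - math.log(delta)
    return 1.0 / (N - 2) + math.sqrt(1.0 / (N - 2)**2 + t / (2.0 * (N - 2)))

for N in (5, 10, 100, 1000, 10000):
    print(N, eps_parrondo(N), eps_devroye(N))

For N = 5 this gives roughly 1.74 for (c) and 2.26 for (d) - both above 1, which is exactly the issue discussed above.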

rohanag 05-02-2012 05:49 AM

Re: HW 4 question3
 
Thanks elkka, I don't know what I was thinking :(

silvrous 05-02-2012 06:01 AM

Re: HW 4 question3
 
Quote:

Originally Posted by elkka (Post 1753)
I first thought the same thing about 1. But then, where do we see \varepsilon? It is the measure of the difference between E_in and E_out, which can be small or big depending on the experiment. Suppose you are talking about an experiment with very large numbers, like the number of minutes people use in a month on a cell phone (which, say, averages 200). Then it is totally meaningful to consider a prediction that assures you that |E_{in} - E_{out}| < 2 (or 5, or 10) with probability 0.95. So it totally makes sense to rate the bounds even if they are all > 1.

Except that Ein and Eout are ratios. I quote from HW2: "Eout (number of out-of-sample points misclassified / total number of out-of-sample points)".
Therefore, it is quite impossible for epsilon to ever exceed 1.

IamMrBB 05-02-2012 06:10 AM

Re: HW 4 question3
 
Quote:

Originally Posted by elkka (Post 1753)
I first thought the same thing about 1. But then, where do we see \varepsilon? It is the measure of the difference between E_in and E_out, which can be small or big depending on the experiment. Suppose you are talking about an experiment with very large numbers, like the number of minutes people use in a month on a cell phone (which, say, averages 200). Then it is totally meaningful to consider a prediction that assures you that |E_{in} - E_{out}| < 2 (or 5, or 10) with probability 0.95. So it totally makes sense to rate the bounds even if they are all > 1.

I don't think you are right on this. E_in and E_out in the Vapnik-Chervonenkis inequality (lecture 6), which is the basis for the VC bound, are fractions and not absolute numbers. I know that elsewhere in the course the professor has also used E_out for quantities that can be bigger than 1 (e.g. squared error, lecture 8); however, when you look up the Vapnik-Chervonenkis inequality, you'll see that E_in and E_out are probabilities/probability measures (i.e. the fraction incorrectly classified).

To see that your example probably doesn't make sense (IMHO): replace the minutes in your example with nanoseconds or, at the other extreme, ages, and you would get very different numbers on the left side of the inequality (i.e. epsilon) while it would make no difference to the right side. This can't be right (it would, e.g., be unlikely that E_in and E_out are 60 seconds apart but likely that they are a minute apart?!): it would make the inequalities meaningless.

Also, on the slides of lecture 6, it is the fractions (in)correctly classified that are used in the Vapnik-Chervonenkis inequality.

Disclaimer: I'm not an expert on the matter, and perhaps I'm missing a/the point somewhere, so I hope we'll get a verdict from the course staff.

elkka 05-02-2012 06:12 AM

Re: HW 4 question3
 
You know, I think you are right. We are indeed only talking about a classification problem, so E_in and E_out must be <= 1.

kkkkk 05-02-2012 08:21 AM

Re: HW 4 question3
 
Here is my view, which may be wrong. Refer to lecture 4, slides 7 onwards.

Ein and Eout are the average of the error measure per point, and it is up to the user to choose the error measure. So Ein and Eout are just numbers, not necessarily probabilities, and so epsilon, which is the difference between the two, is also just a number.

Also see lecture 8, slides 15 and 20: Eout = bias + variance = 0.21 + 1.69 > 1.

mikesakiandcp 05-02-2012 10:16 AM

Re: HW 4 question3
 
Quote:

Originally Posted by silvrous (Post 1723)
I also got values larger than 1 for all of them, and therefore considered them to be equally meaningless for small N...

I also assumed this, since it is a classification problem. Since the bounds are all greater than one, we cannot infer anything about epsilon from any of them in this range of N, so they should all be equivalent.

silvrous 05-03-2012 08:57 AM

Re: HW 4 question3
 
Could someone from the course staff perhaps weigh in on this? There seem to be two equally valid theories....

yaser 05-03-2012 02:57 PM

Re: HW 4 question3
 
Quote:

Originally Posted by silvrous (Post 1812)
Could someone from the course staff perhaps weigh in on this? There seem to be two equally valid theories....

If it is a probability, then indeed bounds greater than 1 are trivial, but the question just asked about the quality of the bounds for what it's worth. In general, the behavior in practice is proportional to the VC bound, so the actual value (as opposed to the relative value) is not as critical.

data_user 07-31-2012 06:35 PM

Re: HW 4 question3
 
It was suggested that we use the simple approximate bound N^d_vc for the growth function if N > d_vc. In Problem 3, N = 5 < d_vc = 50. Should we still use N^d_vc as an approximation for the growth function? Or maybe it is more reasonable to use 2^N, assuming that H is complex enough?

yaser 07-31-2012 10:24 PM

Re: HW 4 question3
 
Quote:

Originally Posted by data_user (Post 3773)
It was suggested that we use the simple approximate bound N^d_vc for the growth function if N > d_vc. In Problem 3, N = 5 < d_vc = 50. Should we still use N^d_vc as an approximation for the growth function? Or maybe it is more reasonable to use 2^N, assuming that H is complex enough?

Indeed, if N<d_{\rm vc}, then the growth function is exactly 2^N. The fact that {\cal H} is complex enough is already implied by the value of the VC dimension.
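
Concretely, for the numbers in this problem (N = 5, d_vc = 50), both arguments of the growth function stay below the VC dimension, so no approximation is needed:

m_\mathcal{H}(2N) = m_\mathcal{H}(10) = 2^{10}, \qquad m_\mathcal{H}(N^2) = m_\mathcal{H}(25) = 2^{25}.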

data_user 08-01-2012 12:06 PM

Re: HW 4 question3
 
Quote:

Originally Posted by yaser (Post 3775)
The fact that {\cal H} is complex enough is already implied by the value of the VC dimension.

Ah... of course! :)

