LFD Book Forum  

#11  05-02-2012, 05:49 AM
rohanag
Invited Guest
Join Date: Apr 2012
Posts: 94
Re: HW 4 question3

Thanks elkka, I don't know what I was thinking.
#12  05-02-2012, 06:01 AM
silvrous
Member
Join Date: Apr 2012
Posts: 24
Re: HW 4 question3

Quote:
Originally Posted by elkka
I first thought the same thing about 1. But then, where do we see epsilon? It is the measure of the difference between E_in and E_out, which can be small or large depending on the experiment. Suppose you are talking about an experiment with very large numbers, like the number of minutes people use in a month on a cell phone (which, say, averages 200). Then it is totally meaningful to consider a prediction that assures you that |E_in - E_out| < 2 (or 5, or 10) with probability 0.95. So it totally makes sense to rate the bounds even if they are all > 1.
Except that E_in and E_out are ratios. I quote from HW2: "E_out (number of out-of-sample points misclassified / total number of out-of-sample points)". Therefore, it is quite impossible for epsilon to ever exceed 1.
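
For what it's worth, a quick numeric sanity check of this point (a made-up sketch in Python; the labels are random and purely illustrative):

Code:
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical +/-1 labels and predictions, purely for illustration.
y_true = rng.choice([-1, 1], size=1000)
y_pred = rng.choice([-1, 1], size=1000)

# Binary classification error is the fraction of misclassified points,
# so it always lies in [0, 1].
E = np.mean(y_true != y_pred)
print(E)                  # around 0.5 for random labels
assert 0.0 <= E <= 1.0

# Hence epsilon = |E_in - E_out| can never exceed 1 for 0/1 error.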
#13  05-02-2012, 06:10 AM
IamMrBB
Invited Guest
Join Date: Apr 2012
Posts: 107
Re: HW 4 question3

Quote:
Originally Posted by elkka
I first thought the same thing about 1. But then, where do we see epsilon? It is the measure of the difference between E_in and E_out, which can be small or large depending on the experiment. Suppose you are talking about an experiment with very large numbers, like the number of minutes people use in a month on a cell phone (which, say, averages 200). Then it is totally meaningful to consider a prediction that assures you that |E_in - E_out| < 2 (or 5, or 10) with probability 0.95. So it totally makes sense to rate the bounds even if they are all > 1.
I don't think you are right on this. E_in and E_out in the Vapnik-Chervonenkis inequality (lecture 6), which is the basis for the VC bound, are fractions, not absolute numbers. I know that elsewhere in the course the professor has also used E_out for quantities that can be bigger than 1 (e.g. the squared error in lecture 8), but if you look up the Vapnik-Chervonenkis inequality you'll see that E_in and E_out there are probabilities/probability measures (i.e. the fraction incorrectly classified).

To see that your example probably doesn't make sense (IMHO): replace the minutes in your example with nanoseconds, or with ages, and you would get very different numbers on the left side of the inequality (i.e. epsilon), while nothing would change on the right side. This can't be right (it would, e.g., be unlikely that E_in and E_out are 60 seconds apart but likely that they are a minute apart?!): it would make the inequality meaningless.

Also, on the slides of lecture 6 it is the fractions (in)correctly classified that are used in the Vapnik-Chervonenkis inequality.

Disclaimer: I'm not an expert on the matter, and perhaps I'm missing a/the point somewhere, so I hope we'll get a verdict from the course staff.
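
To make that units argument concrete, here is a small sketch (my own, with made-up values of N and d_vc): it evaluates the right-hand side of the VC inequality, 4 m_H(2N) exp(-eps^2 N / 8), with the polynomial bound m_H(N) <= N^d_vc. Rescaling the error measure rescales eps on the left side of the inequality while the right side stays put, so the same physical event would get two very different "probability" bounds.

Code:
import math

def vc_rhs(N, d_vc, eps):
    """Right-hand side of the VC inequality: 4 * m_H(2N) * exp(-eps^2 * N / 8),
    using the polynomial bound m_H(N) <= N^d_vc (valid for N > d_vc)."""
    return 4 * (2 * N) ** d_vc * math.exp(-eps ** 2 * N / 8)

N, d_vc = 50, 10

# The same gap expressed in minutes (eps = 2) and in seconds (eps = 120):
print(vc_rhs(N, d_vc, 2.0))     # about 5.6e9 -- a vacuous bound
print(vc_rhs(N, d_vc, 120.0))   # essentially 0 -- an absurdly strong bound

# Two wildly different bounds on the identical event, which is only
# resolved if E_in and E_out are dimensionless fractions in [0, 1].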
#14  05-02-2012, 06:12 AM
elkka
Invited Guest
Join Date: Apr 2012
Posts: 57
Re: HW 4 question3

You know, I think you are right. We are indeed talking only about a classification problem, so E_in and E_out must be <= 1.
#15  05-02-2012, 08:21 AM
kkkkk
Invited Guest
Join Date: Mar 2012
Posts: 71
Re: HW 4 question3

Here is my view, which may be wrong. Refer to lecture 4, slides 7 onwards.

E_in and E_out are the average of the error measure per point, and it is up to the user to choose the error measure. So E_in and E_out are just numbers, not probabilities, and epsilon, which is the difference between the two, is also just a number.

Also see lecture 8, slides 15 and 20: E_out = bias + variance = 0.21 + 1.69 > 1.
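
Those lecture-8 numbers can be checked by simulation. Here is a rough Monte Carlo sketch (my own reconstruction of that experiment: target f(x) = sin(pi*x), data sets of two points drawn uniformly from [-1, 1], hypothesis set {h(x) = ax + b}, i.e. the line through the two points):

Code:
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: np.sin(np.pi * x)

# Many two-point data sets; the learned line passes through both points.
n_datasets = 20_000
x = rng.uniform(-1, 1, size=(n_datasets, 2))
y = f(x)
a = (y[:, 1] - y[:, 0]) / (x[:, 1] - x[:, 0])   # slope
b = y[:, 0] - a * x[:, 0]                       # intercept

# Evaluate every learned line on a test grid and average them.
xt = np.linspace(-1, 1, 401)
g = a[:, None] * xt + b[:, None]   # one row per data set
g_bar = g.mean(axis=0)             # the "average hypothesis"

bias = np.mean((g_bar - f(xt)) ** 2)   # (g_bar - f)^2 averaged over x
var = np.mean((g - g_bar) ** 2)        # spread of g around g_bar
print(bias, var, bias + var)           # roughly 0.21, 1.69, 1.9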
#16  05-02-2012, 10:16 AM
mikesakiandcp
Member
Join Date: Apr 2012
Posts: 31
Re: HW 4 question3

Quote:
Originally Posted by silvrous
I also got values larger than 1 for all of them, and therefore considered them to be equally meaningless for small N...
I also assumed this, since it is a classification problem. Since they are bounds and all greater than one, we cannot infer anything about epsilon from any of them in this range of N, so they should all be equivalent.
#17  05-03-2012, 08:57 AM
silvrous
Member
Join Date: Apr 2012
Posts: 24
Re: HW 4 question3

Could someone from the course staff perhaps weigh in on this? There seem to be two equally valid theories...
#18  05-03-2012, 02:57 PM
yaser
Caltech
Join Date: Aug 2009
Location: Pasadena, California, USA
Posts: 1,477
Re: HW 4 question3

Quote:
Originally Posted by silvrous
Could someone from the course staff perhaps weigh in on this? There seem to be two equally valid theories...
If it is a probability, then indeed bounds greater than 1 are trivial, but the question just asked about the quality of the bounds, for what it's worth. In general, the behavior in practice is proportional to the VC bound, so the actual value (as opposed to the relative value) is not as critical.
__________________
Where everyone thinks alike, no one thinks very much
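
To see that relative behavior, one can tabulate the original VC bound, eps <= sqrt((8/N) ln(4 m_H(2N)/delta)), as N grows. This is a quick sketch of mine, using d_vc = 50 and delta = 0.05 as in the homework, with m_H(N) = 2^N for N <= d_vc and the polynomial bound N^d_vc beyond that: the bound is trivial (> 1) for small N but eventually becomes meaningful.

Code:
import math

def mH(N, d_vc):
    """Growth-function bound: exactly 2^N when N <= d_vc, else N^d_vc."""
    return 2.0 ** N if N <= d_vc else float(N) ** d_vc

def vc_eps(N, d_vc=50, delta=0.05):
    """Original VC bound: eps <= sqrt((8/N) * ln(4 * m_H(2N) / delta))."""
    return math.sqrt(8.0 / N * math.log(4 * mH(2 * N, d_vc) / delta))

for N in (5, 50, 500, 5_000, 50_000, 500_000):
    print(N, round(vc_eps(N), 3))
# The bound starts far above 1 and only drops below 1 around N ~ 5000,
# but its shape as a function of N is what tracks behavior in practice.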
#19  07-31-2012, 06:35 PM
data_user
Junior Member
Join Date: Jul 2012
Posts: 6
Re: HW 4 question3

It is suggested to use the simple approximate bound N^d_vc for the growth function if N > d_vc. In Problem 3, N = 5 < d_vc = 50. Should we still use N^d_vc as an approximation for the growth function? Or is it more reasonable to use 2^N, assuming that H is complex enough?
#20  07-31-2012, 10:24 PM
yaser
Caltech
Join Date: Aug 2009
Location: Pasadena, California, USA
Posts: 1,477
Re: HW 4 question3

Quote:
Originally Posted by data_user
It is suggested to use the simple approximate bound N^d_vc for the growth function if N > d_vc. In Problem 3, N = 5 < d_vc = 50. Should we still use N^d_vc as an approximation for the growth function? Or is it more reasonable to use 2^N, assuming that H is complex enough?
Indeed, if N < d_vc, then the growth function is exactly 2^N. The fact that H is complex enough is already implied by the value of the VC dimension.
__________________
Where everyone thinks alike, no one thinks very much
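
Plugging that into the original VC bound for Problem 3's values (N = 5, d_vc = 50, delta = 0.05), a quick check of the difference the choice makes (my own sketch; the bound formula is the one from the lectures):

Code:
import math

N, d_vc, delta = 5, 50, 0.05

# N < d_vc, so the growth function is exactly 2^N; at 2N = 10 points
# it is 2^10 = 1024.
eps_exact = math.sqrt(8.0 / N * math.log(4 * 2.0 ** (2 * N) / delta))

# The polynomial form (2N)^d_vc = 10^50 is only meant for N > d_vc and
# vastly overstates the growth function here.
eps_poly = math.sqrt(8.0 / N * math.log(4 * (2.0 * N) ** d_vc / delta))

print(eps_exact)   # about 4.3 -- still a trivial (> 1) bound
print(eps_poly)    # about 13.9 -- even more trivial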