LFD Book Forum  

#1  05-11-2012, 02:42 AM
lucifirm (Member)
Hw 6 q1

The problem question is:
"In general, if we use H′ instead of H, how does deterministic noise behave?"

My question is:
Is this for a fixed N? A small N, or one large enough to get rid of overfitting?
#2  05-11-2012, 10:04 AM
mikesakiandcp (Member)
Re: Hw 6 q1

Quote:
Originally Posted by lucifirm
The problem question is:
"In general, if we use H′ instead of H, how does deterministic noise behave?"

My question is:
Is this for a fixed N? A small N, or one large enough to get rid of overfitting?
The size of the training set (N) is more related to overfitting stochastic noise. Deterministic noise reflects the inability of the hypothesis set to fit the target function.
#3  05-11-2012, 11:59 AM
elkka (Invited Guest)
Re: Hw 6 q1

I am also confused about the term "in general". Does it mean in absolutely any situation? In most situations? Or in all reasonable situations, excluding cases where we try to fit 10th-degree polynomials to 10 points, as in this lecture's example?

mikesakiandcp, I think N does have to do with deterministic noise, at least as described in the lecture. Yes, it measures how well the hypothesis set can fit the target function, as the expected squared difference between the "best" hypothesis and the target. But the way we defined the average hypothesis, as an expectation over an infinite number of data sets of a specific size N, depends on N very much. Slide 14 of Lecture 11 illustrates the connection.
#4  05-11-2012, 01:11 PM
AqibEjaz (Junior Member)
Re: Hw 6 q1

@elkka: Well, the "deterministic noise" is actually independent of N; refer to Lecture 8, Slide 20. You can see that the "bias" remains the same no matter how large N becomes. With increasing N, it is the variance that becomes smaller, and hence the overall E_out becomes smaller.

As I understand it, if you have infinitely many training sets, then it does not matter whether you have 10 points in each set or 10,000 points: the average hypothesis will remain the same. In the case of 10 points, the hypotheses we get from the individual training sets will be spread all over the place, but they will be "centered" around the same hypothesis (i.e., the average hypothesis). In the case of 10,000 points, the individual hypotheses will be less spread out, but again they will be centered around the same hypothesis as in the 10-point case. The "bias" depends only on the mismatch between the target function and the modelling hypothesis set.
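For reference, this is the decomposition from Lecture 8 that the argument above relies on, with \bar{g}(x) = \mathbb{E}_{\cal D}[g^{({\cal D})}(x)] the average hypothesis over data sets {\cal D} of size N:

\mathbb{E}_{\cal D}\left[E_{\text{out}}\left(g^{({\cal D})}\right)\right]
  = \underbrace{\mathbb{E}_x\left[\left(\bar{g}(x) - f(x)\right)^2\right]}_{\text{bias}}
  + \underbrace{\mathbb{E}_x\left[\mathbb{E}_{\cal D}\left[\left(g^{({\cal D})}(x) - \bar{g}(x)\right)^2\right]\right]}_{\text{var}}

The question in the thread is whether the bias term (the deterministic noise) moves with N at all, or only the variance term does.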
#5  05-11-2012, 01:39 PM
mikesakiandcp (Member)
Re: Hw 6 q1

Quote:
Originally Posted by elkka
I am also confused about the term "in general". Does it mean in absolutely any situation? In most situations? Or in all reasonable situations, excluding cases where we try to fit 10th-degree polynomials to 10 points, as in this lecture's example?

mikesakiandcp, I think N does have to do with deterministic noise, at least as described in the lecture. Yes, it measures how well the hypothesis set can fit the target function, as the expected squared difference between the "best" hypothesis and the target. But the way we defined the average hypothesis, as an expectation over an infinite number of data sets of a specific size N, depends on N very much. Slide 14 of Lecture 11 illustrates the connection.
You are right, N is related to the deterministic noise. What I meant to say is that N is fixed by the training set we are given, so we have no control over it. Given a fixed training set (and thus a fixed N), we are interested in how well the hypothesis set can approximate the target function.
#6  05-11-2012, 02:00 PM
gjtucker (Junior Member)
Re: Hw 6 q1

It seems like it depends on the definition of deterministic noise. If we define it as \mathbb{E}_x\left[\left(\bar{g}(x) - f(x)\right)^2\right] (as was done in the lecture slides) and we assume that \bar{g} is the best hypothesis in {\cal H}, then it is independent of N.

Where the finite N comes in is through the variance term. With small N, the more complicated model will have a harder time finding the best hypothesis and will have high variance, which is what we see in the plots in the lecture. But as N increases, my guess is that \mathbb{E}_x\left[\left(\bar{g}(x) - f(x)\right)^2\right] stays approximately the same, while the variance term goes down. I suppose this wouldn't be too hard to check numerically; see the sketch below.
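Here is a minimal numerical sketch of that check (Python/NumPy). The target f(x) = sin(pi x) on [-1, 1], the hypothesis set h(x) = ax, and the sample sizes are my assumptions for illustration, chosen to mirror the lecture's bias-variance example; they are not specified in the thread.

[code]
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Illustrative target, as in the lecture's bias-variance example.
    return np.sin(np.pi * x)

def bias_variance(N, num_datasets=2000, num_test=1000):
    """Estimate bias and variance for h(x) = a*x fit to data sets of size N."""
    x_test = rng.uniform(-1, 1, num_test)
    slopes = np.empty(num_datasets)
    for d in range(num_datasets):
        x = rng.uniform(-1, 1, N)          # one (noiseless) data set of size N
        slopes[d] = (x @ f(x)) / (x @ x)   # least-squares slope for h(x) = a*x
    a_bar = slopes.mean()                  # average hypothesis: g_bar(x) = a_bar * x
    bias = np.mean((a_bar * x_test - f(x_test)) ** 2)
    var = np.mean(((slopes[:, None] - a_bar) * x_test) ** 2)
    return bias, var

for N in (2, 10, 100):
    bias, var = bias_variance(N)
    print(f"N = {N:3d}: bias = {bias:.3f}, var = {var:.3f}")
[/code]

If the reasoning above is right, the bias column should stay roughly flat as N grows while the variance column shrinks.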
#7  05-11-2012, 03:55 PM
dudefromdayton (Invited Guest)
Re: Hw 6 q1

Heads up from the textbook: exercise 4.3 on page 125!
#8  05-11-2012, 05:35 PM
elkka (Invited Guest)
Re: Hw 6 q1

Thanks, but I don't have the book.
#9  05-12-2012, 02:37 AM
vasilism (Junior Member)
Re: Hw 6 q1

Some questions:
What does it mean for a function (H') to be a subset of another function (H)?
Is H' picked from the same data model we use for H?
#10  05-12-2012, 02:57 AM
yaser (Caltech)
Re: Hw 6 q1

Quote:
Originally Posted by vasilism
Some questions:
What does it mean for a function (H') to be a subset of another function (H)?
Is H' picked from the same data model we use for H?
{\cal H} and {\cal H}' are not functions, but rather sets of functions (the hypotheses h\in{\cal H} are functions).
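A concrete example (illustrative; the specific models are not from this thread): if {\cal H} is the set of all polynomials of degree at most 10 and {\cal H}' is the set of all polynomials of degree at most 2, then every h\in{\cal H}' is also in {\cal H}, so {\cal H}'\subseteq{\cal H}.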
__________________
Where everyone thinks alike, no one thinks very much