LFD Book Forum  

#1 | 04-25-2013, 01:25 PM | binchen.bin@gmail.com (Junior Member)
*ANSWER* Question 4 linear regression hypothesis

On question 4, I fit each two-point sample with (i) h = ax and (ii) h = ax+b. Hypothesis (i) gave me an average "a" quite different from any of the answer options, while hypothesis (ii) gave me an average "a" very close to one of the options, with the average of b very close to 0. If the average of b is 0 in (ii), why are the averages of a different in (i) and (ii)? Can anyone help me explain this?
#2 | 04-25-2013, 02:03 PM | yaser (Caltech; Pasadena, California, USA)
Re: Question 4 linear regression hypothesis

Let me address why they can be different. The model h(x)=ax+b can fit both points in the training set D perfectly, while the model h(x)=ax finds a compromise that minimizes the mean-squared error on those points. Because of this, the two models produce different fits, and those fits can have different averages. Symmetry dictates that b will average to zero for the model h(x)=ax+b, but that does not mean that its average a should be the same as the average a of the model h(x)=ax.

Having said that, you should get an answer that matches one of the 5 choices when you fit the h(x)=ax model properly.
__________________
Where everyone thinks alike, no one thinks very much
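
To make the reply above concrete, here is a short derivation sketch for a single two-point data set (x_1, y_1), (x_2, y_2), assuming x_1 and x_2 are distinct. The subscripted names a_(i), a_(ii) and b_(ii) are notation introduced here for the two fits in the original question; they do not appear in the thread.

```latex
\begin{align*}
\text{(ii) } h(x) = ax + b:\quad
  & a_{(ii)} = \frac{y_1 - y_2}{x_1 - x_2}, \qquad
    b_{(ii)} = y_1 - a_{(ii)}\, x_1, \\
\text{(i) } h(x) = ax:\quad
  & a_{(i)} = \arg\min_a \sum_{n=1}^{2} (a x_n - y_n)^2
            = \frac{x_1 y_1 + x_2 y_2}{x_1^2 + x_2^2}. \\
\intertext{Substituting $y_n = a_{(ii)} x_n + b_{(ii)}$ into $a_{(i)}$ gives}
  & a_{(i)} = a_{(ii)} + b_{(ii)}\, \frac{x_1 + x_2}{x_1^2 + x_2^2}.
\end{align*}
```

On each data set the two slopes therefore differ by b_(ii) times a factor that depends on the same inputs. The average of that product is not, in general, the product of the averages, so an average b of zero does not force the two average slopes to agree.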
#3 | 04-26-2013, 09:40 AM | binchen.bin@gmail.com (Junior Member)
Re: Question 4 linear regression hypothesis

Thank you, Professor, for the reply. My current answer is quite close to the one in a similar post from last year (g_hat = 1.42; I pasted that post below), but I cannot access the link suggested there for the discussion (http://book.caltech.edu/bookforum/showthread.php?t=424). Does anyone know what leads to this solution?

Thanks.


Re: Questions 4-6 of Hwk 4 (bias and variance)
Quote:
Originally Posted by gmathew
I solved this problem very carefully many times, but I am not getting the answers given in the homework solution for 5 and 6. These are the answers I am getting:

ghat = 1.4272x
bias = 0.5413
variance = 0.4725

Is anybody else getting similar answers? Can the instructor please verify the answers given in the homework solution?

Thanks
See this topic: http://book.caltech.edu/bookforum/showthread.php?t=424
There is a good discussion about these problems there.
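
For readers who want to check numbers like the ones quoted above, here is a minimal Monte Carlo sketch in Python, not the course's reference solution. It assumes the setup used in the MATLAB snippet later in this thread (target sin(pi*x), two inputs drawn uniformly from [-1, 1], least-squares fit of h(x) = ax) and the usual definitions bias = E_x[(g_bar(x) - f(x))^2] and variance = E_x[E_D[(g_D(x) - g_bar(x))^2]]. The function names and default trial counts are illustrative choices made here.

```python
import math
import random


def fit_slope(rng):
    """Least-squares slope of h(x) = a*x on one random two-point data set."""
    xs = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
    num = sum(x * math.sin(math.pi * x) for x in xs)
    den = sum(x * x for x in xs)
    return num / den


def bias_variance(trials=50_000, test_points=5_000, seed=0):
    """Estimate (average slope, bias, variance) by simulation."""
    rng = random.Random(seed)
    slopes = [fit_slope(rng) for _ in range(trials)]
    a_bar = sum(slopes) / trials
    var_a = sum((a - a_bar) ** 2 for a in slopes) / trials

    # Averages over x use fresh points drawn uniformly from [-1, 1].
    xs = [rng.uniform(-1.0, 1.0) for _ in range(test_points)]
    bias = sum((a_bar * x - math.sin(math.pi * x)) ** 2 for x in xs) / test_points
    # For h(x) = a*x, g_D(x) - g_bar(x) = (a - a_bar) * x, so the variance
    # factors into Var(a) * E[x^2].
    variance = var_a * sum(x * x for x in xs) / test_points
    return a_bar, bias, variance


if __name__ == "__main__":
    a_bar, bias, variance = bias_variance()
    print(f"a_bar = {a_bar:.3f}, bias = {bias:.3f}, variance = {variance:.3f}")
```

One detail worth keeping in mind when comparing against hand-computed integrals: an average over x uniform on [-1, 1] is the integral over that interval divided by its length, 2.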
#4 | 04-26-2013, 10:10 AM | binchen.bin@gmail.com (Junior Member)
Re: Question 4 linear regression hypothesis

According to this post, quite a few people apparently got the same wrong answer (g_hat = 1.4): http://book.caltech.edu/bookforum/showthread.php?t=430

I badly need suggestions here to figure out the reason. My approach to this question is pretty straightforward. Generate a two-point sample:
Xsamples = (rand(2, 1)-0.5)*2;   % two inputs, uniform on [-1, 1]
yn = sin(Xsamples.*pi);          % targets f(x) = sin(pi*x)
Then a is calculated by
a = regress(yn, Xsamples);       % least-squares slope, no intercept column
Repeat 1000 times and take the average of a (the inputs X are uniformly distributed), which gives me 1.42.

Any advice is appreciated.
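
The procedure described in this post can be sketched in plain Python as follows. This is a translation made here, not the poster's code: it replaces MATLAB's regress with the closed-form least-squares slope for a line through the origin, sum(x*y) / sum(x*x), and the function names, seed, and trial count are illustrative assumptions.

```python
import math
import random


def slope_through_origin(xs, ys):
    """Least-squares slope a for h(x) = a*x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def average_slope(trials=100_000, seed=0):
    """Average the fitted slope over many random two-point data sets."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Two inputs drawn uniformly from [-1, 1], targets sin(pi * x).
        xs = [rng.uniform(-1.0, 1.0) for _ in range(2)]
        ys = [math.sin(math.pi * x) for x in xs]
        total += slope_through_origin(xs, ys)
    return total / trials


if __name__ == "__main__":
    print(f"average a for h(x) = a*x: {average_slope():.3f}")
```

Using many more than 1000 trials reduces the run-to-run noise in the second decimal, which matters when the result has to be compared against answer options rounded to two digits.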
#5 | 04-26-2013, 10:45 AM | yaser (Caltech; Pasadena, California, USA)
Re: *ANSWER* Question 4 linear regression hypothesis

I added *ANSWER* to the thread title since you started discussing specific answers.
#6 | 04-27-2013, 06:28 PM | gmacaree (Junior Member)
Re: Question 4 linear regression hypothesis

Quote:
Originally Posted by binchen.bin@gmail.com View Post
Apparently quite some people got the same wrong answer (g_hat = 1.4) according to this post: http://book.caltech.edu/bookforum/showthread.php?t=430.

I am desperately in need of suggestion here to figure out the reason. My thought on this question is pretty straight forward: generate a two point sample:
Xsamples = (rand(2, 1)-0.5)*2;
yn = sin(Xsamples.*pi);
then
a is calculated by
regress(yn, Xsamples) (to minimize the squared error)
Repeat 1000 times and take the average of a (because of uniform distributed input X), that gives me 1.42.

Any advice is appreciated.
I am having exactly the same issue. I was using a lazy point-fitting technique and came up with an answer matching one of the choices, but as soon as I used a more rigorous solution for through-the-origin regression, I ended up with a slope of 1.42. That's obviously incorrect, but I'm slightly baffled as to why I get it right when I do it wrong and wrong when I do it right!
#7 | 04-29-2013, 12:05 AM | Ziad Hatahet (Member; San Francisco, CA)
Re: Question 4 linear regression hypothesis

Same here. I used the formula for linear regression through the origin and came up with a slope of ~1.42. The first time, I just used the slope of the line that passes through both points, and got 0.79.

What makes you say that 1.42 is obviously incorrect though?
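
The other procedure mentioned in this post, taking the slope of the line through both points, can be sketched the same way. This is an illustrative Python sketch under the same assumed setup as the MATLAB snippet earlier in the thread (target sin(pi*x), inputs uniform on [-1, 1]); the function names are made up here. Note that it fits the model h(x) = ax+b, not h(x) = ax, so it answers a different question than the through-the-origin fit.

```python
import math
import random


def line_through_two_points(x1, y1, x2, y2):
    """Slope and intercept of the line h(x) = a*x + b through both points."""
    a = (y2 - y1) / (x2 - x1)
    b = y1 - a * x1
    return a, b


def average_line(trials=100_000, seed=0):
    """Average slope and intercept over random two-point data sets."""
    rng = random.Random(seed)
    sum_a = sum_b = 0.0
    used = 0
    while used < trials:
        x1, x2 = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        if x1 == x2:
            continue  # degenerate sample, no unique line; draw again
        a, b = line_through_two_points(
            x1, math.sin(math.pi * x1), x2, math.sin(math.pi * x2)
        )
        sum_a += a
        sum_b += b
        used += 1
    return sum_a / trials, sum_b / trials


if __name__ == "__main__":
    a_avg, b_avg = average_line()
    print(f"h(x) = ax+b: average a = {a_avg:.3f}, average b = {b_avg:.3f}")
```

Running this next to a through-the-origin fit on the same kind of samples shows the two averages side by side, which is the comparison the original question asks about.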
The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.