LFD Book Forum *ANSWER* Question 4 linear regression hypothesis
#1
04-25-2013, 01:25 PM
 binchen.bin@gmail.com Junior Member Join Date: Apr 2013 Posts: 7
*ANSWER* Question 4 linear regression hypothesis

On question 4, I tried fitting each two-point sample with (i) h = ax and (ii) h = ax + b. Hypothesis (i) gave me an average "a" quite different from any of the answer options, but hypothesis (ii) gave me an average "a" very close to one of the options, with the average of b very close to 0. If the average of b is 0 in (ii), why are the averages of a different in (i) and (ii)? Can anyone help me explain this?
#2
04-25-2013, 02:03 PM
 yaser Caltech Join Date: Aug 2009 Location: Pasadena, California, USA Posts: 1,478
Re: Question 4 linear regression hypothesis

Quote:
 Originally Posted by binchen.bin@gmail.com On question 4, I tried fitting each two-point sample with (i) h = ax and (ii) h = ax + b. Hypothesis (i) gave me an average "a" quite different from any of the answer options, but hypothesis (ii) gave me an average "a" very close to one of the options, with the average of b very close to 0. If the average of b is 0 in (ii), why are the averages of a different in (i) and (ii)? Can anyone help me explain this?
Let me address why they can be different. The model h(x) = ax + b can fit both points in the training set perfectly, while the model h(x) = ax finds a compromise that minimizes the mean-squared error on those points. Because of this, the two models produce different fits, which can have different averages. Symmetry dictates that b will average to zero for the first model, but that does not mean that the average of a should be the same as for the second model.

Having said that, you should get an answer that matches one of the 5 choices when you fit the model properly.
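
As a sketch of why the two averages can differ, the two fits can be written out for a single data set (x_1, y_1), (x_2, y_2) with x_1 ≠ x_2. The symbols a_1 and a_2 are introduced here only for illustration and do not appear in the homework.

```latex
% Model h(x) = ax: minimize the squared error over a
\[
a_1 = \arg\min_a \sum_{n=1}^{2} (a x_n - y_n)^2
    = \frac{x_1 y_1 + x_2 y_2}{x_1^2 + x_2^2}
\]
% Model h(x) = ax + b: the line passes through both points
\[
a_2 = \frac{y_2 - y_1}{x_2 - x_1}, \qquad b = y_1 - a_2 x_1
\]
```

Since a_1 and a_2 are different functions of the same two points, their averages over data sets need not coincide, even when b averages to zero.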
__________________
Where everyone thinks alike, no one thinks very much
#3
04-26-2013, 09:40 AM
 binchen.bin@gmail.com Junior Member Join Date: Apr 2013 Posts: 7
Re: Question 4 linear regression hypothesis

Thank you, Professor, for the reply. My current answer is quite close to one in a similar post from last year (i.e., g_hat = 1.42; I have pasted that post below). But I could not access the link suggested there for the discussion (http://book.caltech.edu/bookforum/showthread.php?t=424). Does anyone know what leads to this result?

Thanks.

Re: Questions 4-6 of Hwk 4 (bias and variance)
Quote:
Originally Posted by gmathew
I solved this problem very carefully many times. But I am not getting the answers given in the homework solution for 5 and 6. These are the answers I am getting.

I am getting ghat = 1.4272x
bias = 0.5413
variance = 0.4725

Is anybody else getting similar answers? Can the instructor please verify the answers given in the homework solution?

thanks
Look at this topic: http://book.caltech.edu/bookforum/showthread.php?t=424
There is a good discussion about these problems.
#4
04-26-2013, 10:10 AM
 binchen.bin@gmail.com Junior Member Join Date: Apr 2013 Posts: 7
Re: Question 4 linear regression hypothesis

Apparently quite a few people got the same wrong answer (g_hat = 1.4), according to this post: http://book.caltech.edu/bookforum/showthread.php?t=430.

I am in desperate need of suggestions to figure out the reason. My approach to this question is pretty straightforward. Generate a two-point sample:
Xsamples = (rand(2, 1)-0.5)*2;
yn = sin(Xsamples.*pi);
Then a is calculated by
regress(yn, Xsamples) (to minimize the squared error; with no column of ones in Xsamples, this is a fit through the origin).
Repeat 1000 times and take the average of a (the inputs X are uniformly distributed); that gives me 1.42.
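
For readers without MATLAB, the procedure described above can be sketched in plain Python. This is a minimal re-expression under stated assumptions, not the poster's actual script: the function names, the seed, and the default of 100,000 data sets are choices made for this example, and the through-origin slope is computed from the closed form sum(x*y) / sum(x*x).

```python
import math
import random


def slope_through_origin(xs, ys):
    """Least-squares slope a for the model h(x) = a*x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def average_slope(num_datasets=100_000, seed=0):
    """Average the fitted slope over many two-point data sets.

    Each data set has two inputs drawn uniformly from [-1, 1] and
    targets y = sin(pi * x), as in the post above.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    total = 0.0
    for _ in range(num_datasets):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(2)]
        ys = [math.sin(math.pi * x) for x in xs]
        total += slope_through_origin(xs, ys)
    return total / num_datasets


if __name__ == "__main__":
    print(f"average slope over data sets: {average_slope():.2f}")
```

Using more data sets than the 1000 in the post only reduces the noise in the average; it does not change what is being estimated.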

Any advice is appreciated.
#5
04-26-2013, 10:45 AM
 yaser Caltech Join Date: Aug 2009 Location: Pasadena, California, USA Posts: 1,478
Re: *ANSWER* Question 4 linear regression hypothesis

#6
04-27-2013, 06:28 PM
 gmacaree Junior Member Join Date: Apr 2013 Posts: 2
Re: Question 4 linear regression hypothesis

Quote:
 Originally Posted by binchen.bin@gmail.com Apparently quite a few people got the same wrong answer (g_hat = 1.4), according to this post: http://book.caltech.edu/bookforum/showthread.php?t=430. I am in desperate need of suggestions to figure out the reason. My approach to this question is pretty straightforward. Generate a two-point sample: Xsamples = (rand(2, 1)-0.5)*2; yn = sin(Xsamples.*pi); Then a is calculated by regress(yn, Xsamples) (to minimize the squared error). Repeat 1000 times and take the average of a (the inputs X are uniformly distributed); that gives me 1.42. Any advice is appreciated.
I am having exactly the same issue. I was using a lazy point-fitting technique and came up with an answer matching one of the choices, but as soon as I used a more rigorous solution for through-the-origin regression, I ended up with a slope of 1.42. That's obviously incorrect, but I'm slightly baffled as to why I get it right when I do it wrong and wrong when I do it right!
#7
04-29-2013, 12:05 AM
 Ziad Hatahet Member Join Date: Apr 2013 Location: San Francisco, CA Posts: 23
Re: Question 4 linear regression hypothesis

Quote:
 Originally Posted by gmacaree I am having exactly the same issue. I was using a lazy point-fitting technique and came up with an answer matching one of the choices, but as soon as I used a more rigorous solution for through-the-origin regression, I ended up with a slope of 1.42. That's obviously incorrect, but I'm slightly baffled as to why I get it right when I do it wrong and wrong when I do it right!
Same here. I used the linear regression-through-the-origin formula and came up with a slope of . The first time I just used the slope from the line that passes through both points, and got .

What makes you say that is obviously incorrect though?
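
The two procedures mentioned in this post already give different slopes on a single data set, which a small worked example can show. The sketch below assumes that "the line that passes through both points" means the exact fit of h(x) = ax + b; the function names and the two input values are chosen only for illustration.

```python
import math


def slope_through_origin(xs, ys):
    """Least-squares slope for h(x) = a*x (regression through the origin)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def slope_through_both_points(xs, ys):
    """Slope of the line h(x) = a*x + b that passes through both points."""
    (x1, x2), (y1, y2) = xs, ys
    return (y2 - y1) / (x2 - x1)


# One illustrative data set with targets y = sin(pi * x).
xs = [0.5, 1.0]
ys = [math.sin(math.pi * x) for x in xs]

a_origin = slope_through_origin(xs, ys)
a_line = slope_through_both_points(xs, ys)
print(f"through origin: {a_origin:.3f}, through both points: {a_line:.3f}")
```

For these two points the through-origin slope is about 0.4, while the line through both points has slope about -2, so averaging one quantity over many data sets is not the same as averaging the other.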

All times are GMT -7. The time now is 09:36 PM.