LFD Book Forum (http://book.caltech.edu/bookforum/index.php)
-   Homework 4 (http://book.caltech.edu/bookforum/forumdisplay.php?f=133)
-   -   *ANSWER* Question 4 linear regression hypothesis (http://book.caltech.edu/bookforum/showthread.php?t=4245)

 binchen.bin@gmail.com 04-25-2013 02:25 PM

*ANSWER* Question 4 linear regression hypothesis

On question 4, I tried fitting each two-point sample with (i) h = ax and (ii) h = ax+b. I found that hypothesis (i) gave me an average of "a" quite different from any of the answers, but hypothesis (ii) gave me an average of "a" very close to one of the answer options, and the average of b is virtually 0. If the average of b is 0 in (ii), why are the averages of a different in (i) and (ii)? Can anyone help me explain this?

 yaser 04-25-2013 03:03 PM

Re: Question 4 linear regression hypothesis

Quote:
 Originally Posted by binchen.bin@gmail.com (Post 10598) On question 4, I tried fitting each two-point sample with (i) h = ax and (ii) h = ax+b. I found that hypothesis (i) gave me an average of "a" quite different from any of the answers, but hypothesis (ii) gave me an average of "a" very close to one of the answer options, and the average of b is virtually 0. If the average of b is 0 in (ii), why are the averages of a different in (i) and (ii)? Can anyone help me explain this?
Let me address why they can be different. The model h = ax+b can fit both points in the training set perfectly, while the model h = ax finds a compromise that minimizes the mean-squared error on those points. Because of this, we have different fits that can have different averages. Symmetry dictates that b will average to zero for the first model, but that does not mean that the average a should be the same as for the second model.

Having said that, you should get an answer that matches one of the 5 choices when you fit the model properly.
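
The point about different fits having different averages can be checked numerically. Below is a minimal Python sketch (not code from the thread) that assumes the setup described in the posts: two points drawn uniformly from [-1, 1], target sin(pi x), model (i) h = ax fitted by least squares and model (ii) h = ax+b passed exactly through both points. The function names, the seed, and the number of runs are illustrative choices.

```python
import math
import random


def fit_through_origin(xs, ys):
    """Least-squares slope for h(x) = a*x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def fit_line(xs, ys):
    """Slope and intercept of the line h(x) = a*x + b through two distinct points."""
    (x1, x2), (y1, y2) = xs, ys
    a = (y2 - y1) / (x2 - x1)
    return a, y1 - a * x1


def average_fits(runs=100_000, seed=0):
    """Average the fitted parameters of both models over many two-point samples."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    sum_a_origin = sum_a_line = sum_b_line = 0.0
    for _ in range(runs):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(2)]
        ys = [math.sin(math.pi * x) for x in xs]
        sum_a_origin += fit_through_origin(xs, ys)
        a, b = fit_line(xs, ys)
        sum_a_line += a
        sum_b_line += b
    return sum_a_origin / runs, sum_a_line / runs, sum_b_line / runs


mean_a_origin, mean_a_line, mean_b_line = average_fits()
print(f"h = ax     : mean a = {mean_a_origin:.3f}")
print(f"h = ax + b : mean a = {mean_a_line:.3f}, mean b = {mean_b_line:.3f}")
```

The mean intercept should come out close to zero by symmetry, while the two mean slopes differ, because each model is fitted to every sample separately before averaging.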

 binchen.bin@gmail.com 04-26-2013 10:40 AM

Re: Question 4 linear regression hypothesis

Thank you, professor, for the reply. I found that my current answer is quite close to one in a similar post from last year (i.e., g_hat = 1.42; I pasted that post below). But I could not access the suggested link for the discussion (http://book.caltech.edu/bookforum/showthread.php?t=424). Does anyone know what causes this result?

Thanks.

Re: Questions 4-6 of Hwk 4 (bias and variance)
Quote:
Originally Posted by gmathew
I solved this problem very carefully many times. But I am not getting the answers given in the homework solution for 5 and 6. These are the answers I am getting.

I am getting ghat = 1.4272x
bias = 0.5413
variance = 0.4725

Is anybody else getting similar answers? Can the instructor please verify the answers given in the homework solution?

thanks
Look at this topic: http://book.caltech.edu/bookforum/showthread.php?t=424
There is a good discussion of these problems there.
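
For anyone who wants to cross-check bias and variance figures like the ones quoted above, here is a rough Python sketch. It assumes the model h = ax, the target sin(pi x), two-point samples, and x uniform on [-1, 1], so expectations over x use the density 1/2. The closed-form moments in the comments follow from that assumption; the function names, seed, and run count are hypothetical.

```python
import math
import random


def slope_through_origin(x1, x2):
    """Least-squares slope of h(x) = a*x for the two points (x, sin(pi*x))."""
    y1, y2 = math.sin(math.pi * x1), math.sin(math.pi * x2)
    return (x1 * y1 + x2 * y2) / (x1 * x1 + x2 * x2)


def bias_variance(runs=100_000, seed=1):
    """Monte Carlo estimate of the average slope, bias, and variance."""
    rng = random.Random(seed)
    slopes = [
        slope_through_origin(rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        for _ in range(runs)
    ]
    a_bar = sum(slopes) / runs
    var_a = sum((a - a_bar) ** 2 for a in slopes) / runs

    # Moments for x uniform on [-1, 1] (density 1/2):
    e_x2 = 1.0 / 3.0      # E[x^2]
    e_xf = 1.0 / math.pi  # E[x * sin(pi x)]
    e_f2 = 0.5            # E[sin(pi x)^2]

    # bias = E_x[(a_bar*x - f(x))^2], variance = E_x E_D[((a - a_bar)*x)^2]
    bias = a_bar ** 2 * e_x2 - 2.0 * a_bar * e_xf + e_f2
    variance = var_a * e_x2
    return a_bar, bias, variance


a_bar, bias, variance = bias_variance()
print(f"a_bar = {a_bar:.3f}, bias = {bias:.3f}, variance = {variance:.3f}")
```

If the expectation over x is instead computed as a plain integral over [-1, 1] without the density 1/2, both bias and variance come out doubled, so that normalization is worth checking when numbers disagree.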

 binchen.bin@gmail.com 04-26-2013 11:10 AM

Re: Question 4 linear regression hypothesis

Apparently quite a few people got the same wrong answer (g_hat = 1.4), according to this post: http://book.caltech.edu/bookforum/showthread.php?t=430.

I badly need suggestions to figure out the reason. My approach to this question is pretty straightforward: generate a two-point sample:
Xsamples = (rand(2, 1)-0.5)*2;
yn = sin(Xsamples.*pi);
then calculate a with
a = regress(yn, Xsamples);   % least-squares fit with no intercept term
Repeat 1000 times and take the average of a (the input X is uniformly distributed), which gives me 1.42.

Any advice is appreciated.
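
The one-parameter fit described in this post has a closed form, which can be used to cross-check what regress(yn, Xsamples) returns. Here is a short derivation, assuming the squared-error objective stated in the post; the notation (x_n, y_n) for the two sample points is introduced here for illustration only.

```latex
E_{\text{in}}(a) = \frac{1}{2}\sum_{n=1}^{2} (a x_n - y_n)^2,
\qquad
\frac{dE_{\text{in}}}{da} = \sum_{n=1}^{2} x_n (a x_n - y_n) = 0
\;\Longrightarrow\;
a = \frac{x_1 y_1 + x_2 y_2}{x_1^2 + x_2^2}.
```

This slope generally differs from the slope of the line through the two points, (y_2 - y_1)/(x_2 - x_1), because that line also has a free intercept.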

 yaser 04-26-2013 11:45 AM

Re: *ANSWER* Question 4 linear regression hypothesis

 gmacaree 04-27-2013 07:28 PM

Re: Question 4 linear regression hypothesis

Quote:
 Originally Posted by binchen.bin@gmail.com (Post 10607) Apparently quite a few people got the same wrong answer (g_hat = 1.4), according to this post: http://book.caltech.edu/bookforum/showthread.php?t=430. I badly need suggestions to figure out the reason. My approach to this question is pretty straightforward: generate a two-point sample: Xsamples = (rand(2, 1)-0.5)*2; yn = sin(Xsamples.*pi); then calculate a with regress(yn, Xsamples) (to minimize the squared error). Repeat 1000 times and take the average of a (the input X is uniformly distributed), which gives me 1.42. Any advice is appreciated.
I am having exactly the same issue -- I was using a lazy point-fitting technique and came up with an answer matching one of the choices, but as soon as I used a more rigorous solution for through-the-origin regression, I ended up with a slope of 1.42. That's obviously incorrect, but I'm slightly baffled as to why I get it right when I do it wrong and wrong when I do it right!

 Ziad Hatahet 04-29-2013 01:05 AM

Re: Question 4 linear regression hypothesis

Quote:
 Originally Posted by gmacaree (Post 10615) I am having exactly the same issue -- I was using a lazy point-fitting technique and came up with an answer matching one of the choices, but as soon as I used a more rigorous solution for through-the-origin regression, I ended up with a slope of 1.42. That's obviously incorrect, but I'm slightly baffled as to why I get it right when I do it wrong and wrong when I do it right!
Same here. I used the linear regression-through-the-origin formula and came up with a slope of . The first time I just used the slope from the line that passes through both points, and got .

What makes you say that is obviously incorrect though?

 All times are GMT -7.