 LFD Book Forum (http://book.caltech.edu/bookforum/index.php)
-   Chapter 3 - The Linear Model (http://book.caltech.edu/bookforum/forumdisplay.php?f=110)
-   -   Exercise 3.4 (http://book.caltech.edu/bookforum/showthread.php?t=4484)

 tomaci_necmi 05-30-2014 05:21 AM

Exercise 3.4

I asked this question on Math Stack Exchange:

http://math.stackexchange.com/questi...to-a-dataset-d

Can anyone explain to me how it works?

 tomaci_necmi 05-30-2014 05:22 AM

Re: Exercise 3.4

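In my textbook, there is a statement on the topic of linear regression, followed by an exercise, which I quote:

Consider a noisy target, $y = \mathbf{w}^{*T}\mathbf{x} + \epsilon$, for generating the data, where $\epsilon$ is a noise term with zero mean and $\sigma^2$ variance, independently generated for every example $(\mathbf{x}, y)$. The expected error of the best possible linear fit to this target is thus $\sigma^2$.

For the data $\mathcal{D} = \{(\mathbf{x}_1, y_1), \dots, (\mathbf{x}_N, y_N)\}$, denote the noise in $y_n$ as $\epsilon_n$, and let $\boldsymbol{\epsilon} = [\epsilon_1, \epsilon_2, \dots, \epsilon_N]^T$; assume that $X^T X$ is invertible. By following the steps below, ***show that the expected in-sample error of linear regression with respect to $\mathcal{D}$ is given by***

$$\mathbb{E}_{\mathcal{D}}[E_{\text{in}}(\mathbf{w}_{\text{lin}})] = \sigma^2\left(1 - \frac{d+1}{N}\right).$$

Below is my methodology.

The book says that the in-sample error vector, $\hat{\mathbf{y}} - \mathbf{y}$, can be expressed as $(H - I)\boldsymbol{\epsilon}$, that is, the hat matrix, $H = X(X^T X)^{-1}X^T$, minus the identity, times the noise vector, $\boldsymbol{\epsilon}$.

So, I calculated the in-sample error, $E_{\text{in}}(\mathbf{w}_{\text{lin}})$, as

$$E_{\text{in}}(\mathbf{w}_{\text{lin}}) = \frac{1}{N}(\hat{\mathbf{y}} - \mathbf{y})^T(\hat{\mathbf{y}} - \mathbf{y}) = \frac{1}{N}\boldsymbol{\epsilon}^T (H - I)^T (H - I)\boldsymbol{\epsilon}.$$

Since it is given by the book that $(I - H)^K = I - H$ for any positive integer $K$, and $I - H$ is also symmetric, I got the following simplified expression:

$$E_{\text{in}}(\mathbf{w}_{\text{lin}}) = \frac{1}{N}\boldsymbol{\epsilon}^T (I - H)\boldsymbol{\epsilon} = \frac{1}{N}\boldsymbol{\epsilon}^T\boldsymbol{\epsilon} - \frac{1}{N}\boldsymbol{\epsilon}^T H \boldsymbol{\epsilon}.$$

Here, I see that

$$\mathbb{E}_{\mathcal{D}}\left[\frac{1}{N}\boldsymbol{\epsilon}^T\boldsymbol{\epsilon}\right] = \frac{N\sigma^2}{N} = \sigma^2.$$

Also, the term $-\frac{1}{N}\boldsymbol{\epsilon}^T H \boldsymbol{\epsilon}$ expands into the following sum:

$$-\frac{1}{N}\boldsymbol{\epsilon}^T H \boldsymbol{\epsilon} = -\frac{1}{N}\left(\sum_{i=1}^{N} H_{ii}\,\epsilon_i^2 + \sum_{i \neq j} H_{ij}\,\epsilon_i \epsilon_j\right).$$

I understand that

$$-\frac{1}{N}\,\mathbb{E}_{\mathcal{D}}\left[\sum_{i=1}^{N} H_{ii}\,\epsilon_i^2\right] = -\frac{\sigma^2}{N}\operatorname{trace}(H) = -\frac{(d+1)\,\sigma^2}{N}.$$

However, I don't understand why

$$-\frac{1}{N}\,\mathbb{E}_{\mathcal{D}}\left[\sum_{i \neq j} H_{ij}\,\epsilon_i \epsilon_j\right] = 0 \qquad \text{(eq1)}$$

should hold in order to satisfy the equation $\mathbb{E}_{\mathcal{D}}[E_{\text{in}}(\mathbf{w}_{\text{lin}})] = \sigma^2\left(1 - \frac{d+1}{N}\right)$. ***Can anyone explain to me why eq1 leads to a zero result?***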
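The result the exercise asks for, $\mathbb{E}_{\mathcal{D}}[E_{\text{in}}(\mathbf{w}_{\text{lin}})] = \sigma^2\left(1 - \frac{d+1}{N}\right)$, and the vanishing of the off-diagonal term can be checked numerically. The following is a minimal NumPy sketch, not taken from the book or the original post. The function name `simulate_in_sample_error`, the Gaussian inputs and noise, and the default values of `N`, `d`, and `sigma` are all assumptions chosen for illustration. The inputs `X` are held fixed across trials and only the noise is redrawn.

```python
import numpy as np

def simulate_in_sample_error(N=20, d=3, sigma=0.5, trials=20000, seed=0):
    """Monte Carlo estimate of E[E_in] for least squares on a noisy linear target.

    Hypothetical helper for illustration only; all defaults are assumptions.
    Returns (estimated E[E_in], sigma^2 * (1 - (d+1)/N), mean off-diagonal term).
    """
    rng = np.random.default_rng(seed)

    # Fixed design matrix with a leading column of ones (so it has d+1 columns).
    X = np.hstack([np.ones((N, 1)), rng.standard_normal((N, d))])
    w_star = rng.standard_normal(d + 1)

    # Hat matrix H = X (X^T X)^{-1} X^T; it depends on X only, not on the noise.
    H = X @ np.linalg.solve(X.T @ X, X.T)

    # One row of independent zero-mean noise per trial.
    eps = sigma * rng.standard_normal((trials, N))
    y = X @ w_star + eps

    # H is symmetric, so each row of y @ H is H applied to that trial's y.
    y_hat = y @ H
    e_in = np.mean((y_hat - y) ** 2, axis=1)

    # Off-diagonal part of (1/N) eps^T H eps: full quadratic form minus diagonal part.
    quad = np.sum((eps @ H) * eps, axis=1)
    diag = (eps ** 2) @ np.diag(H)
    cross = (quad - diag) / N

    theory = sigma ** 2 * (1 - (d + 1) / N)
    return e_in.mean(), theory, cross.mean()

if __name__ == "__main__":
    estimate, theory, cross_mean = simulate_in_sample_error()
    print("simulated E[E_in]:", estimate)
    print("sigma^2 (1 - (d+1)/N):", theory)
    print("mean off-diagonal term:", cross_mean)
```

With enough trials, the simulated average should land close to the formula and the off-diagonal average close to zero; the exact digits depend on the seed and the number of trials.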

 tomaci_necmi 05-30-2014 05:49 AM

Re: Exercise 3.4

Well, it all fits now, of course; my mind was just playing tricks on me.

 yaser 05-30-2014 11:46 AM

Re: Exercise 3.4

Thank you for the question and the answer.

 yongxien 08-03-2015 01:33 AM

Re: Exercise 3.4

Why is the last statement 0? I don't quite understand. Does the mean being zero imply that $E(e_i) = 0$ and $E(e_j) = 0$? I find it weird if that is the case, because that would mean $E(e_i) = 0$ but $E(e_i^2) = \sigma^2$. I understand $E(e_i^2) = \sigma^2$ from statistics, but not the first part.

If it is not the case, then what is the reason for the last statement to be 0?
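For reference, a zero mean and a nonzero second moment do not conflict; they are tied together by the definition of variance. A minimal worked identity, using this post's $e_i$ for the noise on example $i$ and the zero-mean, variance-$\sigma^2$ conditions given in the exercise:

```latex
\operatorname{Var}(e_i) = E(e_i^2) - \big(E(e_i)\big)^2 = E(e_i^2) - 0^2
\quad\Longrightarrow\quad E(e_i^2) = \operatorname{Var}(e_i) = \sigma^2
```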

 htlin 08-03-2015 03:22 PM

Re: Exercise 3.4

In the problem statement, I think "zero mean of the noise" is a given condition?

 zhout2 10-07-2016 07:13 PM

Re: Exercise 3.4

 zhout2 10-10-2016 02:53 PM

Re: Exercise 3.4

Quote:
 Originally Posted by zhout2 (Post 12451)
Never mind. It's just a typo in the original post. The answer is still correct.

 johnwang 10-13-2017 09:50 AM

Re: Exercise 3.4

I still don't understand why "eq1" leads to zero. I know that e_i and e_j are zero-mean independent variables. However, H_ij is dependent on both e_i and e_j, so I don't know how to prove that the sum of H_ij*e_i*e_j has an expected value of zero.


 johnwang 10-13-2017 09:55 AM

Re: Exercise 3.4

Is it because the noise is generated independently for each datapoint?
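A sketch of why independence is enough, under the assumption that the inputs are fixed (or generated independently of the noise), so that the hat matrix $H = X(X^T X)^{-1}X^T$ depends only on $X$ and its entries $H_{ij}$ can be pulled out of the expectation over the noise. For $i \neq j$, independence lets the expectation of the product factor, and each factor is zero because the noise has zero mean:

```latex
E\Big[\sum_{i \neq j} H_{ij}\, e_i e_j\Big]
  = \sum_{i \neq j} H_{ij}\, E[e_i e_j]
  = \sum_{i \neq j} H_{ij}\, E[e_i]\, E[e_j]
  = 0
```

Only the diagonal terms survive, since $E[e_i^2] = \sigma^2$ does not factor into a product of means.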

All times are GMT -7.