Thread: Exercise 3.4
#9, 10-13-2017, 10:50 AM
johnwang
Junior Member (Join Date: Oct 2017, Posts: 2)

Re: Exercise 3.4

I still don't understand why (eq 1) leads to zero. I know that \epsilon_i and \epsilon_j are zero-mean, independent random variables. However, H_{ij} is multiplied by both \epsilon_i and \epsilon_j, so I don't know how to prove that the sum of the H_{ij}\,\epsilon_i\,\epsilon_j terms has an expected value of zero.
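As a quick numerical sanity check of my own (a minimal sketch, not from the book: it assumes a fixed random Gaussian design matrix X with a column of ones, N = 20 and d = 3, and averages the off-diagonal part of \epsilon^T H \epsilon over many noise draws):

Code:
import numpy as np

rng = np.random.default_rng(0)
N, d, sigma = 20, 3, 1.0

# Fixed design matrix with a column of ones, so trace(H) = d + 1
X = np.hstack([np.ones((N, 1)), rng.standard_normal((N, d))])
H = X @ np.linalg.inv(X.T @ X) @ X.T

# Average the off-diagonal cross terms of eps^T H eps over many noise draws
trials = 50000
total = 0.0
for _ in range(trials):
    eps = rng.normal(0.0, sigma, size=N)
    quad = eps @ H @ eps                  # full quadratic form eps^T H eps
    diag = np.sum(np.diag(H) * eps**2)    # diagonal part sum_i H_ii eps_i^2
    total += quad - diag                  # leftover: sum_{i != j} H_ij eps_i eps_j

print(total / trials)   # should come out close to 0
print(np.trace(H))      # should equal d + 1 = 4

The average should come out near zero, which matches (eq 1), but I still don't see how to show it analytically.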

Quote:
Originally Posted by tomaci_necmi
In my textbook there is a statement on the topic of linear regression / machine learning, together with a question, which I quote below:

Consider a noisy target, y = (w^{*})^T \textbf{x} + \epsilon, for generating the data, where \epsilon is a noise term with zero mean and \sigma^2 variance, independently generated for every example (\textbf{x},y). The expected error of the best possible linear fit to this target is thus \sigma^2.

For the data D =  \{ (\textbf{x}_1,y_1), ..., (\textbf{x}_N,y_N)  \}, denote the noise in y_n as \epsilon_n, and let \mathbf{\epsilon} = [\epsilon_1, \epsilon_2, ..., \epsilon_N]^T; assume that X^TX is invertible. By following the steps below, ***show that the expected in-sample error of linear regression with respect to D is given by***,

\mathbb{E}_D[E_{in}( \textbf{w}_{lin} )] = \sigma^2 (1 - \frac{d+1}{N})


Below is my methodology,


The book says that the in-sample error vector, \hat{\textbf{y}} - \textbf{y}, can be expressed as (H-I)\epsilon, where H = X(X^TX)^{-1}X^T is the hat matrix and \epsilon is the noise vector.
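
Spelling this identity out (a standard step, using \textbf{y} = X\textbf{w}^* + \epsilon and the fact that HX = X):

\hat{\textbf{y}} = X\textbf{w}_{lin} = X(X^TX)^{-1}X^T\textbf{y} = H\textbf{y}

\hat{\textbf{y}} - \textbf{y} = (H-I)\textbf{y} = (H-I)(X\textbf{w}^* + \epsilon) = (HX - X)\textbf{w}^* + (H-I)\epsilon = (H-I)\epsilon,

since HX = X(X^TX)^{-1}X^TX = X.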

So, I calculated in-sample error, E_{in}( \textbf{w}_{lin} ), as,

E_{in}( \textbf{w}_{lin} ) = \frac{1}{N}(\hat{\textbf{y}} - \textbf{y})^T (\hat{\textbf{y}} - \textbf{y}) =  \frac{1}{N}  (\epsilon^T (H-I)^T (H-I) \epsilon)

Since it is given by the book that,

(I-H)^K = (I-H) for any integer K \geq 1, (I-H) is symmetric, and trace(H) = d+1,
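
(The trace identity can be checked with the cyclic property of the trace, since X is N \times (d+1):

trace(H) = trace(X(X^TX)^{-1}X^T) = trace((X^TX)^{-1}X^TX) = trace(I_{d+1}) = d+1.)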

I got the following simplified expression,

E_{in}( \textbf{w}_{lin} ) =\frac{1}{N}  (\epsilon^T (H-I)^T (H-I) \epsilon) = \frac{1}{N} \epsilon^T (I-H) \epsilon = \frac{1}{N} \epsilon^T \epsilon - \frac{1}{N} \epsilon^T H \epsilon
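
(Here I used (H-I)^T(H-I) = (I-H)^T(I-H) = (I-H)^2 = (I-H), which follows from the two properties above.)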


Here, I see that,

\mathbb{E}_D[\frac{1}{N} \epsilon^T \epsilon] = \frac {N \sigma^2}{N} = \sigma^2

Also, expanding the term - \frac{1}{N} \epsilon^T H \epsilon gives the following sum,

- \frac{1}{N} \epsilon^T H \epsilon = - \frac{1}{N} \left( \sum_{i=1}^{N} H_{ii} \, \epsilon_i^2 + \sum_{i \neq j} H_{ij} \, \epsilon_i \, \epsilon_j \right)

I understand that,

- \frac{1}{N} \mathbb{E}_D[\sum_{i=1}^{N} H_{ii} \epsilon_i^2] = - \frac{1}{N} \, trace(H) \, \sigma^2 = - \frac{(d+1) \, \sigma^2}{N}


However, I don't understand why,

- \frac{1}{N} \mathbb{E}_D \left[ \sum_{i \neq j} H_{ij} \, \epsilon_i \, \epsilon_j \right] = 0 \ \ \ \ \ \ \ \ \ \ \ \ (eq \ 1)


(eq 1) should be equal to zero in order to satisfy the equation,



\mathbb{E}_D[E_{in}( \textbf{w}_{lin} )] = \sigma^2 (1 - \frac{d+1}{N})


***Can anyone explain why (eq 1) leads to a zero result?***