LFD Book Forum

LFD Book Forum (http://book.caltech.edu/bookforum/index.php)
-   Chapter 3 - The Linear Model (http://book.caltech.edu/bookforum/forumdisplay.php?f=110)
-   -   Exercise 3.4 (http://book.caltech.edu/bookforum/showthread.php?t=4353)

xuewei4d 06-14-2013 10:08 AM

Exercise 3.4
 
:clueless:

I didn't get the correct answer to Exercise 3.4(c).

For Exercise 3.4(b), I think the answer would be E_{\text{in}}(\mathbf{w}_{\text{lin}})=\frac{1}{N}\epsilon^TH\epsilon

For Exercise 3.4(c), by the independence of the \epsilon_i, I get \mathbb{E}_{\mathcal{D}}[E_{\text{in}}(\mathbf{w}_{\text{lin}})] = \frac{1}{N} \sum_i H_{ii}\,\mathbb{E}[\epsilon_i^2] = \frac{\sigma^2}{N}(d+1).

Where am I wrong?
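
A quick way to sanity-check an answer like this is simulation. The sketch below is my own code, not from the book: it generates y = Xw^* + \epsilon with i.i.d. noise of variance \sigma^2, fits \mathbf{w}_{\text{lin}} by least squares, and averages E_{\text{in}} over many data sets. The empirical average comes out near \sigma^2(1-\frac{d+1}{N}) rather than \frac{\sigma^2}{N}(d+1), which suggests the error is upstream, in part (b).

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, sigma = 100, 5, 0.5
trials = 2000

errs = []
for _ in range(trials):
    # Inputs with a bias column, so X is N x (d+1).
    X = np.hstack([np.ones((N, 1)), rng.standard_normal((N, d))])
    w_star = rng.standard_normal(d + 1)
    eps = sigma * rng.standard_normal(N)
    y = X @ w_star + eps                          # y = X w* + eps
    w_lin = np.linalg.solve(X.T @ X, X.T @ y)     # least-squares fit
    errs.append(np.mean((X @ w_lin - y) ** 2))    # in-sample error E_in

print(np.mean(errs))                 # empirical E_D[E_in]
print(sigma**2 * (1 - (d + 1) / N))  # sigma^2 (1 - (d+1)/N)
```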

htlin 06-17-2013 08:27 PM

Re: Exercise 3.4
 
You can consider double-checking your answer of 3.4(b). Hope this helps.

i_need_some_help 10-06-2013 02:16 PM

Re: Exercise 3.4
 
I am not sure how to approach part (a). Are we supposed to explain why that in-sample estimate intuitively makes sense, or (algebraically) manipulate expressions given earlier into it?

magdon 10-06-2013 09:18 PM

Re: Exercise 3.4
 
Algebraically manipulate earlier expressions and you should get 3.4(a). It is essentially a restatement of \hat{\mathbf{y}}=X{\mathbf{w}}_{lin}.

Sweater Monkey 10-07-2013 12:00 AM

Re: Exercise 3.4
 
I'm not sure if I'm going about part (e) correctly.

I'm under the impression that E_{\text{test}}(\mathbf{w}_{\text{lin}})=\frac{1}{N}||X{\mathbf{w}}_{lin}-\mathbf{y'}||^2

where \hat{\mathbf{y}}=X{\mathbf{w}}_{lin}=X\mathbf{w}^*+H\mathbf{\epsilon} as derived earlier
and \mathbf{y'}=\mathbf{w}^{*T}\mathbf{x}+\mathbf{\epsilon'}=X\mathbf{w}^*+\mathbf{\epsilon'}

This led me to \frac{1}{N}||H\mathbf{\epsilon}-\mathbf{\epsilon'}||^2

I carried out the expansion of this expression and then simplified into the relevant terms but my final answer is \sigma^2(1+(d+1)) because the N term cancels out.

Am I starting out correctly up until this expansion or is my thought process off from the start? And if I am heading in the right direction is there any obvious reason that I may be expanding the expression incorrectly? Any help would be greatly appreciated.

ddas2 10-07-2013 01:46 AM

Re: Exercise 3.4
 
I got y^{\prime}=y-\epsilon+\epsilon^{\prime},
and so \hat{y}-y^{\prime}=H\epsilon-\epsilon^{\prime}.

magdon 10-07-2013 05:53 AM

Re: Exercise 3.4
 
You got it mostly right. Your error is in assuming that both terms, the H term and the one without the H, give an N to cancel the N in the denominator. One term gives an N and the other gives a (d+1).


Quote:

Originally Posted by Sweater Monkey (Post 11541)
I'm not sure if I'm going about part (e) correctly.

I'm under the impression that E_{\text{test}}(\mathbf{w}_{\text{lin}})=\frac{1}{N}||X{\mathbf{w}}_{lin}-\mathbf{y'}||^2

where \hat{\mathbf{y}}=X{\mathbf{w}}_{lin}=X\mathbf{w}^*+H\mathbf{\epsilon} as derived earlier
and \mathbf{y'}=\mathbf{w}^{*T}\mathbf{x}+\mathbf{\epsilon'}=X\mathbf{w}^*+\mathbf{\epsilon'}

This led me to \frac{1}{N}||H\mathbf{\epsilon}-\mathbf{\epsilon'}||^2

I carried out the expansion of this expression and then simplified into the relevant terms but my final answer is \sigma^2(1+(d+1)) because the N term cancels out.

Am I starting out correctly up until this expansion or is my thought process off from the start? And if I am heading in the right direction is there any obvious reason that I may be expanding the expression incorrectly? Any help would be greatly appreciated.
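
For readers following this subthread, the hint can be expanded as follows (my own sketch, using the facts from Exercise 3.3 that H is symmetric and idempotent with \text{trace}(H)=d+1):

\mathbb{E}\left[\frac{1}{N}\|H\epsilon-\epsilon'\|^2\right] = \frac{1}{N}\left(\mathbb{E}[\epsilon^TH^TH\epsilon] - 2\,\mathbb{E}[\epsilon'^TH\epsilon] + \mathbb{E}[\epsilon'^T\epsilon']\right) = \frac{1}{N}\left(\sigma^2\,\text{trace}(H) + 0 + N\sigma^2\right) = \sigma^2\left(1+\frac{d+1}{N}\right)

The cross term vanishes because \epsilon and \epsilon' are independent with zero mean, so the H term contributes the (d+1) while the \epsilon'^T\epsilon' term contributes the N.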


Sweater Monkey 10-07-2013 09:09 AM

Re: Exercise 3.4
 
Quote:

Originally Posted by magdon (Post 11544)
You got it mostly right. Your error is in assuming that both terms, the H term and the one without the H, give an N to cancel the N in the denominator. One term gives an N and the other gives a (d+1).

Yes I realize that only one term should have the N so the issue must be in how I'm expanding the expression.

I think my problem is how I'm looking at the trace of the \mathbf{\epsilon}\mathbf{\epsilon}^T matrix.

I'm under the impression that \mathbf{\epsilon}\mathbf{\epsilon}^T produces an N\times N matrix whose expectation has \sigma^2 along the diagonal and 0 elsewhere. I come to this conclusion because the \epsilon_i are all independent, so the expectation of any product \epsilon_i\epsilon_j with i\neq j is zero, while the expectation of \epsilon_i^2 is the variance \sigma^2. So the trace of this matrix should have an expected sum along the diagonal of N\sigma^2, shouldn't it? :clueless:

aaoam 10-07-2013 09:18 AM

Re: Exercise 3.4
 
I'm having a bit of difficulty with 3.4(b). I take \hat{y} - y and multiply by (X^TX)^{-1}X^TX, which ends up reducing the expression to just H\epsilon. However, then I can't use 3.3(c) in simplifying 3.4(c), which makes me think I did something wrong. Can somebody give me a pointer?

Also, it'd be great if there were instructions somewhere about how to post in math mode. Perhaps I just missed them?

magdon 10-07-2013 09:19 AM

Re: Exercise 3.4
 
Yes, that is right. You have to be more careful but use similar reasoning with

\epsilon^TH\epsilon

Quote:

Originally Posted by Sweater Monkey (Post 11545)
Yes I realize that only one term should have the N so the issue must be in how I'm expanding the expression.

I think my problem is how I'm looking at the trace of the \mathbf{\epsilon}\mathbf{\epsilon}^T matrix.

I'm under the impression that \mathbf{\epsilon}\mathbf{\epsilon}^T produces an N\times N matrix whose expectation has \sigma^2 along the diagonal and 0 elsewhere. I come to this conclusion because the \epsilon_i are all independent, so the expectation of any product \epsilon_i\epsilon_j with i\neq j is zero, while the expectation of \epsilon_i^2 is the variance \sigma^2. So the trace of this matrix should have an expected sum along the diagonal of N\sigma^2, shouldn't it? :clueless:


magdon 10-07-2013 09:27 AM

Re: Exercise 3.4
 
\hat{y}-y is not H\epsilon, but that is close. Recall y=Xw^*+\epsilon


Quote:

Originally Posted by aaoam (Post 11546)
I'm having a bit of difficulty with 3.4(b). I take \hat{y} - y and multiply by (X^TX)^{-1}X^TX, which ends up reducing the expression to just H\epsilon. However, then I can't use 3.3(c) in simplifying 3.4(c), which makes me think I did something wrong. Can somebody give me a pointer?

Also, it'd be great if there were instructions somewhere about how to post in math mode. Perhaps I just missed them?


Sweater Monkey 10-07-2013 07:48 PM

Re: Exercise 3.4
 
Quote:

Originally Posted by magdon (Post 11547)
Yes, that is right. You have to be more careful but use similar reasoning with

\epsilon^TH\epsilon

Ahhhh, yes I see now why \epsilon^TH\epsilon doesn't have a factor of N! The expectation of this quadratic form is just \sigma^2(d+1).

Thanks Professor :)

smiling_assassin 10-07-2013 10:54 PM

Re: Exercise 3.4
 
Quote:

Originally Posted by Sweater Monkey (Post 11551)
Ahhhh, yes I see now why \epsilon^TH\epsilon doesn't have a factor of N! The expectation of this quadratic form is just \sigma^2(d+1).

Thanks Professor :)


But isn't H an N \times N matrix? So wouldn't the trace be N instead of d+1? I know X is N\times(d+1). What am I missing?

magdon 10-08-2013 07:39 AM

Re: Exercise 3.4
 
You are right, H is an N\times N matrix. But its trace is not N. You may consider looking through Exercise 3.3, and in particular, part (d) should be helpful.


Quote:

Originally Posted by smiling_assassin (Post 11553)
But isn't H an N \times N matrix? So wouldn't the trace be N instead of d+1? I know X is N\times(d+1). What am I missing?
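
As a concrete check of this point, here is a short sketch of my own (not from the book): the hat matrix H=X(X^TX)^{-1}X^T is indeed N\times N, yet its trace is d+1, since \text{trace}(H)=\text{trace}((X^TX)^{-1}X^TX)=\text{trace}(I_{d+1}).

```python
import numpy as np

rng = np.random.default_rng(2)
N, d = 200, 7
X = np.hstack([np.ones((N, 1)), rng.standard_normal((N, d))])  # N x (d+1)
H = X @ np.linalg.solve(X.T @ X, X.T)                          # hat matrix, N x N

print(H.shape)      # (200, 200)
print(np.trace(H))  # d + 1 = 8, up to floating-point error
```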


meixingdg 10-09-2013 02:05 PM

Re: Exercise 3.4
 
For part (c), would the result of \hat{\mathbf{y}}-\mathbf{y} (from part (b)) be E_{\text{in}}(\mathbf{w}_{\text{lin}}) in terms of \epsilon, since \hat{\mathbf{y}}-\mathbf{y} is the in-sample error?

magdon 10-10-2013 09:11 AM

Re: Exercise 3.4
 
\mathbf{y} and \hat{\mathbf{y}} are vectors. The norm-squared of \hat{\mathbf{y}}-\mathbf{y}, divided by N, is the in-sample error: E_{\text{in}}(\mathbf{w}_{\text{lin}})=\frac{1}{N}\|\hat{\mathbf{y}}-\mathbf{y}\|^2.

Quote:

Originally Posted by meixingdg (Post 11559)
For part (c), would the result of \hat{\mathbf{y}}-\mathbf{y} (from part (b)) be E_{\text{in}}(\mathbf{w}_{\text{lin}}) in terms of \epsilon, since \hat{\mathbf{y}}-\mathbf{y} is the in-sample error?


jamesclyeh 11-10-2013 04:27 PM

Re: Exercise 3.4
 
Hi,

For part (a), in one of the last steps I did:
{\bf{y}}=H{\bf{y}}-H\epsilon
Rearrange: \hat{\bf{y}}={\bf{y}}+H\epsilon
Since {\bf{y}}=Xw^*, \hat{\bf{y}}=Xw^*+H\epsilon

Are these steps correct?
I found subbing {\bf{y}}=Xw^* back in a bit recursive because I previously solved for w^* and plugged that in to get {\bf{y}}=H{\bf{y}}-H\epsilon.

Also for (b)
Is the answer \hat{\bf{y}}-{\bf{y}} = (H-I)\epsilon? <-- I'll delete this once it's confirmed.

Thanks,
James

yaser 11-17-2013 03:22 AM

Re: Exercise 3.4
 
Hi James,

I am slow in responding this term as I am attending to the edX forum, but here are my quick comments:

For part (a), why is {\bf{y}}=Xw^* (what happened to the added noise)?

For part (b), your formula is correct.
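
To close the loop on part (b), the confirmed identity is easy to verify numerically. This is my own sketch, not from the book:

```python
import numpy as np

rng = np.random.default_rng(3)
N, d, sigma = 30, 4, 0.2
X = np.hstack([np.ones((N, 1)), rng.standard_normal((N, d))])
w_star = rng.standard_normal(d + 1)
eps = sigma * rng.standard_normal(N)
y = X @ w_star + eps                          # y = X w* + eps

H = X @ np.linalg.solve(X.T @ X, X.T)         # hat matrix
w_lin = np.linalg.solve(X.T @ X, X.T @ y)     # least-squares solution
y_hat = X @ w_lin

# Part (b): y_hat - y = (H - I) eps
print(np.allclose(y_hat - y, (H - np.eye(N)) @ eps))  # True
```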



The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.