LFD Book Forum (http://book.caltech.edu/bookforum/index.php)
-   Homework 5 (http://book.caltech.edu/bookforum/forumdisplay.php?f=134)
-   -   Question 1 (http://book.caltech.edu/bookforum/showthread.php?t=4273)

Humble 05-07-2013 08:23 AM

Question 1
 
What does the outer expected value in E[E(Wlin)] mean, in words?

yaser 05-07-2013 09:29 AM

Re: Question 1
 
First, just to make sure: the inside 'E' is not an expectation but the value of the in-sample error that corresponds to the weight vector $\mathbf{w}_{\rm lin}$. The outer expected value is taken with respect to the training data set. It is the average value of the in-sample error you would get if you trained on many different data sets.
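As a rough illustration of "averaging over data sets", here is a minimal simulation sketch. Everything in it is an assumption made for illustration and is not taken from the homework: the noisy linear target, the Gaussian inputs, the parameter values, and the helper names `in_sample_error` and `average_in_sample_error`.

```python
import numpy as np


def in_sample_error(X, y):
    """Fit linear regression by least squares and return the mean squared
    error on the same points used for the fit (the in-sample error)."""
    w_lin, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = X @ w_lin - y
    return float(np.mean(residuals ** 2))


def average_in_sample_error(n_points=25, d=8, sigma=0.5, n_datasets=2000, seed=0):
    """Estimate the outer expectation by drawing many training sets,
    fitting each one, and averaging the resulting in-sample errors."""
    rng = np.random.default_rng(seed)
    # Assumed target: linear in x plus Gaussian noise of standard deviation sigma.
    w_true = rng.standard_normal(d + 1)
    errors = []
    for _ in range(n_datasets):
        # Each pass through the loop is one hypothetical training set
        # on the same d input variables (plus a constant coordinate).
        X = np.column_stack([np.ones(n_points), rng.standard_normal((n_points, d))])
        y = X @ w_true + sigma * rng.standard_normal(n_points)
        errors.append(in_sample_error(X, y))
    return float(np.mean(errors))


if __name__ == "__main__":
    print(average_in_sample_error())
```

Each call to `in_sample_error` plays the role of the inside 'E' (a single number for a single data set), and the mean over the loop approximates the outer expected value.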

hsolo 07-15-2013 05:30 AM

Re: Question 1
 
The training data has d dimensions in the x's. If one ignored some of the dimensions and ran linear regression with a reduced number d' of dimensions, one would presumably get a larger in-sample error than when considering all d dimensions?

Why, then, does the expected in-sample error, averaged over all data sets, increase with the number of dimensions?

yaser 07-15-2013 12:38 PM

Re: Question 1
 
To answer the first question: if you choose to omit some of the input variables, you will indeed get a larger (or at least not smaller) in-sample error. I am not sure I understand the second question, but having different training sets does not change the number of input variables. The expectation refers to a hypothetical situation in which you assume the availability of different data sets on the same variables.
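The "larger, or at least not smaller" claim can be checked numerically. The sketch below is only an illustration under assumed settings (synthetic Gaussian data, a hypothetical helper `compare_full_vs_reduced`, arbitrary parameter values). The reduced fit is the full fit with some weights forced to zero, so its minimum squared error cannot be lower than that of the full fit.

```python
import numpy as np


def in_sample_error(X, y):
    """Least-squares fit followed by mean squared error on the training points."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.mean((X @ w - y) ** 2))


def compare_full_vs_reduced(n_points=50, d=6, d_reduced=3, sigma=0.3, seed=1):
    """Return (E_in with all d inputs, E_in with only the first d_reduced inputs)
    on one synthetic data set; the constant coordinate is kept in both fits."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones(n_points), rng.standard_normal((n_points, d))])
    y = X @ rng.standard_normal(d + 1) + sigma * rng.standard_normal(n_points)
    e_full = in_sample_error(X, y)
    # Dropping columns restricts the hypothesis set, so the error cannot go down.
    e_reduced = in_sample_error(X[:, : d_reduced + 1], y)
    return e_full, e_reduced


if __name__ == "__main__":
    e_full, e_reduced = compare_full_vs_reduced()
    print(f"all inputs: {e_full:.4f}   reduced inputs: {e_reduced:.4f}")
```

Note that this comparison is made on one fixed data set, which is a different operation from averaging over data sets.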

hsolo 07-15-2013 09:35 PM

Re: Question 1
 
My bad on the second question: I had a typo in my handwritten expression for the expectation. The correct expression does have the expected in-sample error decreasing as d increases.
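For reference, the standard result for linear regression on a noisy linear target is written out below. The thread itself does not spell the expression out, so the symbols here are assumptions: $\sigma^2$ is the noise variance, $N$ is the number of training points, $d$ is the number of input variables, and $\mathcal{D}$ is the training data set.

```latex
\mathbb{E}_{\mathcal{D}}\!\left[E_{\rm in}(\mathbf{w}_{\rm lin})\right]
  = \sigma^2\left(1 - \frac{d+1}{N}\right),
\qquad
\left.\mathbb{E}_{\mathcal{D}}[E_{\rm in}]\right|_{d+1}
  - \left.\mathbb{E}_{\mathcal{D}}[E_{\rm in}]\right|_{d}
  = -\frac{\sigma^2}{N} < 0 .
```

With $\sigma$ and $N$ fixed, each additional input variable lowers the expected in-sample error by $\sigma^2/N$, which agrees with the earlier observation that omitting input variables cannot reduce the in-sample error.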


The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.