LFD Book Forum

sakumar 04-15-2012 11:37 PM

HW 2.8: Seeking clarification on simulated noise
 
First we generate a training set of 1000 points. We also generate a vector y using the target function given.

Now we are directed to randomly flip the sign of 10% of the training set.

The training set has 4000 numbers at this point. Should we randomly choose 400 of these numbers and flip their signs, including the y values? Does that also include the 1000 x0 values, which we initialized to 1.0?

jsarrett 04-16-2012 12:42 AM

Re: HW 2.8: Seeking clarification on simulated noise
 
I'm pretty sure we only flip the sign on the ys. That's what I did and got reasonable results. That corresponds to noise in your sample of the target function.

-James

yaser 04-16-2012 01:03 AM

Re: HW 2.8: Seeking clarification on simulated noise
 
You are correct.
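
A minimal sketch of the label flipping confirmed above. The target function, the uniform sampling range, and the names make_noisy_set and target are assumptions made for illustration; substitute the ones given in the homework. It flips exactly 10% of the labels, which is one reading of "10% of the training set".

```python
import numpy as np

def target(X):
    # Assumed target for illustration; replace with the function from the homework.
    return np.sign(X[:, 1] ** 2 + X[:, 2] ** 2 - 0.6)

def make_noisy_set(n=1000, noise=0.10, seed=0):
    rng = np.random.default_rng(seed)
    # Columns: x0 = 1 (bias term), then x1 and x2 drawn uniformly from [-1, 1].
    X = np.column_stack([np.ones(n), rng.uniform(-1, 1, size=(n, 2))])
    y_clean = target(X)
    # Choose 10% of the points (not 10% of the 4000 numbers), without replacement.
    flip = rng.choice(n, size=int(round(noise * n)), replace=False)
    y_noisy = y_clean.copy()
    y_noisy[flip] *= -1  # only the labels change; X, including x0, is untouched
    return X, y_clean, y_noisy

X, y_clean, y_noisy = make_noisy_set()
print("labels flipped:", int((y_clean != y_noisy).sum()))
```

The inputs are never modified, so the x0 column stays at 1.0 for every point.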

sakumar 04-16-2012 08:18 AM

Re: HW 2.8: Seeking clarification on simulated noise
 
Thank you both for that clarification. I believe I am inching closer to understanding noise.

I have some follow-up questions: How is E_in defined? Do you compare the linear regression results (i.e. sign(w'x), where w is obtained by linear regression using the "noisy" y) to the true value of y or to the noisy value from the training data?

In the real world, since the target function is unknown, the best one can do is E_in_estimated by comparing sign(w'x) to the "noisy" y. But in this instance we actually do have the target function. So if we are asked to compute E_in should we use the original y?

Edit: I tried both and the closest answer didn't change, but I'd still like to understand the correct definition of E_in.

htlin 04-16-2012 02:55 PM

Re: HW 2.8: Seeking clarification on simulated noise
 
You should compare to the noisy y that you have on hand. Hope this helps.
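
A rough sketch of that computation, assuming the same illustrative target and data setup as the homework describes (the target formula, seed, and the helper name classification_error are assumptions, not part of the assignment text). E_in is the fraction of training points where sign(w'x) disagrees with the noisy y; the comparison against the clean labels is printed only for contrast.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Training inputs with x0 = 1, and labels from an assumed target function.
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, size=(n, 2))])
y_clean = np.sign(X[:, 1] ** 2 + X[:, 2] ** 2 - 0.6)

# Flip the labels of 10% of the points to simulate noise.
y_noisy = y_clean.copy()
y_noisy[rng.choice(n, size=n // 10, replace=False)] *= -1

# Linear regression fitted to the noisy labels (least-squares solution).
w, *_ = np.linalg.lstsq(X, y_noisy, rcond=None)

def classification_error(w, X, y):
    # Fraction of points where sign(w'x) disagrees with the given labels.
    return float(np.mean(np.sign(X @ w) != y))

e_in = classification_error(w, X, y_noisy)        # E_in: against the data on hand
e_vs_clean = classification_error(w, X, y_clean)  # for contrast only
print("E_in (noisy labels):", e_in)
print("error vs. clean labels:", e_vs_clean)
```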

markweitzman 04-16-2012 07:07 PM

Re: HW 2.8: Seeking clarification on simulated noise
 
What about E_out? Do we also compare with the noisy y, or with y without noise?

jsarrett 04-16-2012 08:41 PM

Re: HW 2.8: Seeking clarification on simulated noise
 
Here's how I think about E_{in} and E_{out}.

The in-sample performance E_{in} is how well your solution fits the given data (X in our notation). The given data is a *sample* of the real-world domain on which f is defined. Therefore the in-sample performance of h is how well it works on the data set (how closely h(x) \approx y).

The out-of-sample performance E_{out} is how well h works on the rest of the world (not just our tiny sample). We have seen in several of the problems that we can estimate it by generating a whole new data set (often with many more sample points) and comparing the predictions of our h with the values of the made-up f.
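
One way to sketch that estimate, assuming an illustrative target function and a hypothetical helper make_data. The thread does not settle whether the fresh labels should carry noise, so the sketch reports both numbers rather than picking one.

```python
import numpy as np

def make_data(n, rng, noise=0.10):
    # x0 = 1, x1 and x2 uniform on [-1, 1]; the target below is an assumption.
    X = np.column_stack([np.ones(n), rng.uniform(-1, 1, size=(n, 2))])
    y_clean = np.sign(X[:, 1] ** 2 + X[:, 2] ** 2 - 0.6)
    y_noisy = y_clean.copy()
    y_noisy[rng.choice(n, size=int(round(noise * n)), replace=False)] *= -1
    return X, y_clean, y_noisy

rng = np.random.default_rng(2)

# Fit on a noisy training set.
X_train, _, y_train = make_data(1000, rng)
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

# Estimate E_out on a fresh, independently generated set of points.
X_test, y_test_clean, y_test_noisy = make_data(1000, rng)
pred = np.sign(X_test @ w)
e_out_vs_f = float(np.mean(pred != y_test_clean))      # against the made-up f
e_out_vs_noisy = float(np.mean(pred != y_test_noisy))  # against noisy labels
print("E_out estimate vs. f:", e_out_vs_f)
print("E_out estimate vs. noisy y:", e_out_vs_noisy)
```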

Of course, in a real situation we won't have f, only the knowledge (belief!?) that it exists. That's why we need the Hoeffding inequality, so we can at least bound E_{out}.
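
For reference, the bound mentioned above in its single-hypothesis form. The symbols N (number of sample points) and \epsilon (tolerance) are introduced here and do not appear in the post. This form holds for a hypothesis h fixed before the data is seen; a hypothesis chosen from several candidates needs a looser bound.

```latex
\mathbb{P}\left[\, \left| E_{in}(h) - E_{out}(h) \right| > \epsilon \,\right] \le 2 e^{-2 \epsilon^{2} N}
\quad\Longrightarrow\quad
E_{out}(h) \le E_{in}(h) + \epsilon
\ \text{ with probability at least } 1 - 2 e^{-2 \epsilon^{2} N}
```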

itooam 07-18-2012 03:33 PM

Re: HW 2.8: Seeking clarification on simulated noise
 
ttt

Think this thread will help others as it did me. The "flip" comment confused me as well.


The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.