LFD Book Forum (http://book.caltech.edu/bookforum/index.php)

 sakumar 04-15-2012 10:37 PM

HW 2.8: Seeking clarification on simulated noise

First we generate a training set of 1000 points. We also generate a vector y using the given target function.

Now we are directed to randomly flip the sign of 10% of the training set.

The training set has 4000 numbers at this point. Should we randomly choose 400 of these numbers and flip their signs, including the y values? Should we also include the 1000 x0 values, which we initialized to 1.0?
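A minimal NumPy sketch of the setup, to show where the 4000 numbers come from: 1000 rows of (x0, x1, x2) plus 1000 labels. The sampling region [-1, 1] x [-1, 1] and the target f(x1, x2) = sign(x1^2 + x2^2 - 0.6) are assumptions for illustration, since the post only says "the given target function". The helper names `target` and `make_data` are hypothetical.

```python
import numpy as np

def target(x1, x2):
    # Assumed target for illustration only: +1 outside the circle x1^2 + x2^2 = 0.6, -1 inside
    return np.sign(x1 ** 2 + x2 ** 2 - 0.6)

def make_data(n, rng):
    # Draw n points uniformly from the (assumed) square [-1, 1] x [-1, 1]
    pts = rng.uniform(-1.0, 1.0, size=(n, 2))
    # Prepend the constant coordinate x0 = 1.0 to every point
    X = np.column_stack([np.ones(n), pts])
    y = target(pts[:, 0], pts[:, 1])
    return X, y

rng = np.random.default_rng(0)
X, y = make_data(1000, rng)
print(X.shape, y.shape, X.size + y.size)  # (1000, 3) (1000,) 4000
```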

 jsarrett 04-15-2012 11:42 PM

Re: HW 2.8: Seeking clarification on simulated noise

I'm pretty sure we only flip the sign of the y values. That's what I did, and I got reasonable results. That corresponds to noise in your sample of the target function.
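A minimal sketch of flipping only the labels, leaving x0, x1, x2 untouched. It assumes y is a NumPy array of +1/-1 values; `flip_labels` is a hypothetical helper, and the circular target is again only an illustrative assumption. This version flips exactly 10% of the labels (100 of 1000); flipping each label independently with probability 0.1 is a common alternative that flips about 100.

```python
import numpy as np

def flip_labels(y, fraction, rng):
    """Return a copy of y with the sign flipped on a random `fraction` of its entries."""
    y_noisy = y.copy()
    n_flip = int(round(fraction * len(y)))
    # Pick distinct indices so exactly n_flip labels change
    idx = rng.choice(len(y), size=n_flip, replace=False)
    y_noisy[idx] = -y_noisy[idx]
    return y_noisy

rng = np.random.default_rng(1)
pts = rng.uniform(-1.0, 1.0, size=(1000, 2))
y = np.sign(pts[:, 0] ** 2 + pts[:, 1] ** 2 - 0.6)  # assumed target, illustration only
y_noisy = flip_labels(y, 0.10, rng)                   # the x values are never modified
print(np.sum(y_noisy != y))  # 100
```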

-James

 yaser 04-16-2012 12:03 AM

Re: HW 2.8: Seeking clarification on simulated noise

Quote:
 Originally Posted by jsarrett (Post 1320) I'm pretty sure we only flip the sign on the ys. That's what I did and got reasonable results. That corresponds to noise in your sample of the target function. -James
You are correct.

 sakumar 04-16-2012 07:18 AM

Re: HW 2.8: Seeking clarification on simulated noise

Thank you both for that clarification. I believe I am inching closer to understanding noise.

I have some follow-up questions: How is E_in defined? Do you compare the linear regression results (i.e., sign(w'x), where w is obtained by linear regression using the "noisy" y) to the true value of y or to the noisy value from the training data?

In the real world, since the target function is unknown, the best one can do is compute an estimated E_in by comparing sign(w'x) to the "noisy" y. But in this instance we actually do have the target function, so if we are asked to compute E_in, should we use the original y?

Edit: I tried both and the closest answer choice didn't change, but I'd still like to understand the correct definition of E_in.

 htlin 04-16-2012 01:55 PM

Re: HW 2.8: Seeking clarification on simulated noise

Quote:
 Originally Posted by sakumar (Post 1325) Thank you both for that clarification. I believe I am inching closer to understanding noise. I have some follow up questions: How is E_in defined? Do you compare the linear regression results (i.e. sign(w'x) where w is obtained by linear regression using the "noisy" y) to the true value of y or to the noisy value from the training data? In the real world, since the target function is unknown, the best one can do is E_in_estimated by comparing sign(w'x) to the "noisy" y. But in this instance we actually do have the target function. So if we are asked to compute E_in should we use the original y? Edit: I tried both and the closest answer didn't change, but I'd still like to understand the correct definition of E_in.
You should compare to the noisy y that you have on hand. Hope this helps.
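A minimal sketch of what that means in code: fit linear regression on the noisy labels with the pseudo-inverse, then count disagreements between sign(w'x) and those same noisy labels. The data generation (uniform points in [-1, 1] x [-1, 1], target sign(x1^2 + x2^2 - 0.6)) is an assumption for illustration. The disagreement with the noise-free labels is printed only for comparison and is not E_in; since exactly 10% of the labels differ, the two numbers can differ by at most 0.1.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
pts = rng.uniform(-1.0, 1.0, size=(n, 2))
X = np.column_stack([np.ones(n), pts])              # rows are (x0 = 1, x1, x2)
y = np.sign(pts[:, 0] ** 2 + pts[:, 1] ** 2 - 0.6)  # assumed target f, illustration only

# Simulated noise: flip the sign of y on a random 10% of the points
y_noisy = y.copy()
idx = rng.choice(n, size=n // 10, replace=False)
y_noisy[idx] = -y_noisy[idx]

# Linear regression for classification, trained on the noisy labels
w = np.linalg.pinv(X) @ y_noisy

# E_in: fraction of training points where sign(w'x) disagrees with the noisy y on hand
e_in = np.mean(np.sign(X @ w) != y_noisy)

# For comparison only: disagreement with the noise-free labels (this is not E_in)
disagree_f = np.mean(np.sign(X @ w) != y)
print(f"E_in = {e_in:.3f}, disagreement with f = {disagree_f:.3f}")
```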

 markweitzman 04-16-2012 06:07 PM

Re: HW 2.8: Seeking clarification on simulated noise

What about E_out? Do we also compare with the noisy y, or with the y without noise?

 jsarrett 04-16-2012 07:41 PM

Re: HW 2.8: Seeking clarification on simulated noise

Here's how I think about E_in and E_out.

The in-sample performance, E_in, is how well you have converged on a solution to your given data, D in our notation. The given data is a *sample* of the real-world domain X on which f is defined. Therefore the in-sample performance of g is how well it works on the data set (how often g agrees with the labels y_n in D).
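Written out as a formula (a sketch in the book's notation; N, x_n and w are not spelled out in the post itself), the in-sample error of the linear-regression classifier g(x) = sign(w'x) is the fraction of the N training points it misclassifies, where y_n are the labels as stored in D, i.e. after the 10% flip:

```latex
E_{\text{in}}(g) = \frac{1}{N} \sum_{n=1}^{N}
  \big[\!\big[\, \operatorname{sign}(\mathbf{w}^{\mathsf{T}} \mathbf{x}_n) \neq y_n \,\big]\!\big],
\qquad N = 1000
```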

The out-of-sample performance, E_out, is how well g works on the rest of the world (the points not in our tiny sample). We have seen in several of the problems that we can estimate it by generating a whole new data set (often with many more sample points) and comparing the output of our g with the output of the made-up f.
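A minimal sketch of that Monte Carlo estimate: train on 1000 noisy points, then compare sign(w'x) with f on a fresh, larger set of points. The region, the target sign(x1^2 + x2^2 - 0.6), and the helper names `target`, `add_bias`, and `estimate_e_out` are assumptions for illustration. The optional `noise_rate` argument (a hypothetical parameter) flips fresh labels as well, in case a problem asks for E_out against noisy data; the default compares against f, as described above.

```python
import numpy as np

def target(pts):
    # Assumed target f for illustration: sign(x1^2 + x2^2 - 0.6)
    return np.sign(pts[:, 0] ** 2 + pts[:, 1] ** 2 - 0.6)

def add_bias(pts):
    # Prepend the constant coordinate x0 = 1.0
    return np.column_stack([np.ones(len(pts)), pts])

def estimate_e_out(w, n_test, rng, noise_rate=0.0):
    """Estimate E_out by comparing sign(w'x) with labels on fresh points."""
    pts = rng.uniform(-1.0, 1.0, size=(n_test, 2))
    y = target(pts)  # noise-free labels from f by default
    if noise_rate > 0:
        # Optional: also flip a random subset of the fresh labels
        k = int(round(noise_rate * n_test))
        idx = rng.choice(n_test, size=k, replace=False)
        y[idx] = -y[idx]
    return np.mean(np.sign(add_bias(pts) @ w) != y)

rng = np.random.default_rng(3)
train = rng.uniform(-1.0, 1.0, size=(1000, 2))
y_train = target(train)
flip = rng.choice(1000, size=100, replace=False)
y_train[flip] = -y_train[flip]                 # 10% simulated noise on training labels
w = np.linalg.pinv(add_bias(train)) @ y_train  # linear regression on the noisy data
print("E_out estimate vs f:", estimate_e_out(w, 10_000, rng))
```

A larger `n_test` gives a tighter estimate at the cost of more computation.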

Of course, in a real situation we won't have f, only the knowledge (belief!?) that it exists. That's why we need the Hoeffding inequality: so we can at least bound E_out.
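For reference, a small sketch that evaluates the Hoeffding bound in the form the LFD book states it, P[|E_in(g) - E_out(g)| > eps] <= 2M exp(-2 eps^2 N), where M is the size of a finite hypothesis set (M = 1 for a single hypothesis fixed before seeing the data). The function names are hypothetical. Linear regression picks g from infinitely many hypotheses, so this finite-M form does not apply to it directly.

```python
import math

def hoeffding_bound(eps, n, m=1):
    """Right-hand side of P[|E_in - E_out| > eps] <= 2*M*exp(-2*eps^2*N)."""
    return 2 * m * math.exp(-2 * eps ** 2 * n)

def eps_for_confidence(delta, n, m=1):
    """Tolerance eps at which the bound above equals delta."""
    return math.sqrt(math.log(2 * m / delta) / (2 * n))

print(hoeffding_bound(0.05, 1000))     # 2*exp(-5), roughly 0.0135
print(eps_for_confidence(0.05, 1000))  # roughly 0.043
```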

 itooam 07-18-2012 02:33 PM

Re: HW 2.8: Seeking clarification on simulated noise

I think this thread will help others as it helped me. The "flip" instruction confused me as well.

 All times are GMT -7.