#1




HW 2.8: Seeking clarification on simulated noise
First we generate a training set of 1000 points. We also generate a vector y using the target function given.
Now we are directed to randomly flip the sign of 10% of the training set. The training set has 4000 numbers at this point. Should we randomly choose 400 of these numbers and flip their signs? Does that include the y values? Does it also include the 1000 x0 values, which we initialized to 1.0?
#2




Re: HW 2.8: Seeking clarification on simulated noise
I'm pretty sure we only flip the sign on the ys. That's what I did and got reasonable results. That corresponds to noise in your sample of the target function.
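For anyone who wants to see the "flip only the ys" reading spelled out, here is a minimal sketch in Python/NumPy. The target function, the [-1, 1] input range, and the helper names `target` and `make_noisy_set` are assumptions made for illustration; substitute the target given in the assignment.

```python
import numpy as np

def target(X):
    # ASSUMED target for illustration only; use the one given in the homework.
    return np.sign(X[:, 1] ** 2 + X[:, 2] ** 2 - 0.6)

def make_noisy_set(n=1000, noise_frac=0.10, seed=0):
    rng = np.random.default_rng(seed)
    # Each row is (x0, x1, x2) with x0 fixed at 1.0; the inputs are never flipped.
    X = np.column_stack([np.ones(n), rng.uniform(-1, 1, size=(n, 2))])
    y_clean = target(X)
    # Choose 10% of the points (not 10% of all 4000 numbers) without replacement
    # and flip the sign of their labels only.
    flip_idx = rng.choice(n, size=int(round(noise_frac * n)), replace=False)
    y_noisy = y_clean.copy()
    y_noisy[flip_idx] *= -1
    return X, y_clean, y_noisy

X, y_clean, y_noisy = make_noisy_set()
```

With 1000 points this flips exactly 100 labels and leaves x0, x1 and x2 untouched.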
James 
#3




Re: HW 2.8: Seeking clarification on simulated noise
You are correct.
__________________
Where everyone thinks alike, no one thinks very much 
#4




Re: HW 2.8: Seeking clarification on simulated noise
Thank you both for that clarification. I believe I am inching closer to understanding noise.
I have some follow-up questions: How is E_in defined? Do you compare the linear regression results (i.e. sign(w'x), where w is obtained by linear regression using the "noisy" y) to the true value of y or to the noisy value from the training data? In the real world, since the target function is unknown, the best one can do is estimate E_in by comparing sign(w'x) to the "noisy" y. But in this instance we actually do have the target function. So if we are asked to compute E_in, should we use the original y? Edit: I tried both and the closest answer didn't change, but I'd still like to understand the correct definition of E_in.
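Here is roughly what "trying both" can look like, as a sketch rather than the official solution. The target function and the [-1, 1] input range are assumptions; the weights come from the pseudo-inverse, fitted on the noisy labels.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Training inputs with x0 = 1.0, plus clean and noisy labels.
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, size=(n, 2))])
y_clean = np.sign(X[:, 1] ** 2 + X[:, 2] ** 2 - 0.6)  # ASSUMED target
y_noisy = y_clean.copy()
flip_idx = rng.choice(n, size=n // 10, replace=False)
y_noisy[flip_idx] *= -1

# Linear regression on the noisy labels: w = pinv(X) y.
w = np.linalg.pinv(X) @ y_noisy
pred = np.sign(X @ w)

# Fraction of training points misclassified, measured two ways.
e_in_noisy = np.mean(pred != y_noisy)  # against the labels actually trained on
e_in_clean = np.mean(pred != y_clean)  # against the noiseless target values
print(e_in_noisy, e_in_clean)
```

Since only 100 of the 1000 labels differ between the two vectors, the two error rates can differ by at most 0.1.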
#5




Re: HW 2.8: Seeking clarification on simulated noise
Quote:
__________________
When one teaches, two learn. 
#6




Re: HW 2.8: Seeking clarification on simulated noise
What about with E_out? Do we also compare with noisy y or with y without noise?

#7




Re: HW 2.8: Seeking clarification on simulated noise
Here's how I think about E_in and E_out.
The in-sample performance E_in is how well you have converged on a solution to your given data (g in our notation). The given data is a *sample* of the real-world domain on which f is defined. Therefore the in-sample performance of g is how well it works on the data set (how large E_in is). The out-of-sample performance, E_out, is how well g works on the rest of the world (not in our tiny sample). We have seen in several of the problems that we can estimate it by generating a whole new data set (often with many more sample points) and comparing the performance of our g with the performance of the made-up f. Of course in a real situation we won't have f, only the knowledge (belief!?) that it exists. That's why we need the Hoeffding inequality, so we can at least bound E_out.
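The "generate a whole new data set" idea can be sketched like this. The target function, the input range, the fresh-sample size of 10000, and the helper names `f`, `sample` and `g` are all assumptions for illustration. The fresh points are compared against the made-up target directly, as described above.

```python
import numpy as np

def f(X):
    # ASSUMED made-up target; in a real problem this is unknown.
    return np.sign(X[:, 1] ** 2 + X[:, 2] ** 2 - 0.6)

def sample(n, rng):
    # n points (x0, x1, x2) with x0 = 1.0 and x1, x2 uniform in [-1, 1].
    return np.column_stack([np.ones(n), rng.uniform(-1, 1, size=(n, 2))])

rng = np.random.default_rng(2)

# Training set with 10% of the labels flipped.
X_train = sample(1000, rng)
y_train = f(X_train)
flip = rng.choice(1000, size=100, replace=False)
y_train[flip] *= -1

# Our hypothesis g: sign of the linear-regression output.
w = np.linalg.pinv(X_train) @ y_train

def g(X):
    return np.sign(X @ w)

# In-sample: how well g does on the data it was fitted to.
e_in = np.mean(g(X_train) != y_train)

# Out-of-sample estimate: a much larger fresh sample, g compared with f.
X_test = sample(10000, rng)
e_out_est = np.mean(g(X_test) != f(X_test))
print(e_in, e_out_est)
```

The estimate gets tighter as the fresh sample grows, which is the same Hoeffding-style reasoning applied to a test set.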
#8




Re: HW 2.8: Seeking clarification on simulated noise
ttt
Think this thread will help others as it did me. The "flip" comment confused me as well. 

