LFD Book Forum HW 2.8: Seeking clarification on simulated noise
#1
04-15-2012, 11:37 PM
 sakumar Member Join Date: Apr 2012 Posts: 40
HW 2.8: Seeking clarification on simulated noise

First we generate a training set of 1000 points. We also generate a vector y using the target function given.

Now we are directed to randomly flip the sign of 10% of the training set.

The training set has 4000 numbers at this point. We should randomly choose 400 of these numbers and flip the sign? Including the y values? Also include the values for the 1000 x0 which we initialized to 1.0?
#2
04-16-2012, 12:42 AM
 jsarrett Member Join Date: Apr 2012 Location: Sunland, CA Posts: 13
Re: HW 2.8: Seeking clarification on simulated noise

I'm pretty sure we only flip the sign on the ys. That's what I did and got reasonable results. That corresponds to noise in your sample of the target function.

-James
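
For anyone who wants to see this concretely, here is a minimal sketch of flipping only the ys. The thread does not restate the target function or the input range, so `target` (sign(x1^2 + x2^2 - 0.6)) and the [-1, 1] x [-1, 1] input square are assumptions for illustration, and `make_noisy_training_set` is a hypothetical helper name.

```python
import random

def target(x1, x2):
    # Assumed target for illustration; the thread only says "the target function given".
    return 1.0 if x1 * x1 + x2 * x2 - 0.6 >= 0 else -1.0

def make_noisy_training_set(n=1000, noise_fraction=0.10, seed=0):
    rng = random.Random(seed)
    # Each input is (x0, x1, x2) with x0 fixed at 1.0; the inputs are never altered.
    # The [-1, 1] range for x1 and x2 is an assumption.
    X = [(1.0, rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    y_clean = [target(x1, x2) for _, x1, x2 in X]
    # Choose 10% of the indices without replacement and flip only those labels.
    flip_idx = rng.sample(range(n), round(noise_fraction * n))
    y_noisy = list(y_clean)
    for i in flip_idx:
        y_noisy[i] = -y_noisy[i]
    return X, y_clean, y_noisy

X, y_clean, y_noisy = make_noisy_training_set()
flipped = sum(1 for a, b in zip(y_clean, y_noisy) if a != b)
print(flipped)  # number of labels that differ from the noiseless ones
```

With 1000 points, 100 labels change sign, and x0, x1, x2 stay exactly as generated.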
#3
04-16-2012, 01:03 AM
 yaser Caltech Join Date: Aug 2009 Location: Pasadena, California, USA Posts: 1,478
Re: HW 2.8: Seeking clarification on simulated noise

Quote:
 Originally Posted by jsarrett I'm pretty sure we only flip the sign on the ys. That's what I did and got reasonable results. That corresponds to noise in your sample of the target function. -James
You are correct.
__________________
Where everyone thinks alike, no one thinks very much
#4
04-16-2012, 08:18 AM
 sakumar Member Join Date: Apr 2012 Posts: 40
Re: HW 2.8: Seeking clarification on simulated noise

Thank you both for that clarification. I believe I am inching closer to understanding noise.

I have some follow-up questions: How is E_in defined? Do you compare the linear regression results (i.e. sign(w'x), where w is obtained by linear regression using the "noisy" y) to the true value of y or to the noisy value from the training data?

In the real world, since the target function is unknown, the best one can do is E_in_estimated by comparing sign(w'x) to the "noisy" y. But in this instance we actually do have the target function. So if we are asked to compute E_in should we use the original y?

Edit: I tried both and the closest answer didn't change, but I'd still like to understand the correct definition of E_in.
#5
04-16-2012, 02:55 PM
 htlin NTU Join Date: Aug 2009 Location: Taipei, Taiwan Posts: 601
Re: HW 2.8: Seeking clarification on simulated noise

Quote:
 Originally Posted by sakumar Thank you both for that clarification. I believe I am inching closer to understanding noise. I have some follow-up questions: How is E_in defined? Do you compare the linear regression results (i.e. sign(w'x), where w is obtained by linear regression using the "noisy" y) to the true value of y or to the noisy value from the training data? In the real world, since the target function is unknown, the best one can do is E_in_estimated by comparing sign(w'x) to the "noisy" y. But in this instance we actually do have the target function. So if we are asked to compute E_in should we use the original y? Edit: I tried both and the closest answer didn't change, but I'd still like to understand the correct definition of E_in.
You should compare to the noisy y that you have on hand. Hope this helps.
__________________
When one teaches, two learn.
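
A minimal sketch of that definition, for reference: E_in is the fraction of training points where sign(w'x) disagrees with the (noisy) label on hand. The data and the weight vector below are made-up numbers for illustration only, and `classification_error` is a hypothetical helper name; in the homework, w would come from linear regression on the noisy y.

```python
def sign(v):
    return 1.0 if v >= 0 else -1.0

def classification_error(w, X, y):
    """Fraction of points where sign(w'x) disagrees with the label y."""
    mistakes = sum(
        1
        for x, label in zip(X, y)
        if sign(sum(wi * xi for wi, xi in zip(w, x))) != label
    )
    return mistakes / len(y)

# Tiny hand-made example (hypothetical numbers, not from the homework).
X = [(1.0, 0.5, 0.2), (1.0, -0.3, 0.8), (1.0, -0.7, -0.1), (1.0, 0.9, -0.4)]
y_noisy = [1.0, -1.0, -1.0, -1.0]  # labels as they appear in the training data
w = (0.0, 1.0, 0.0)                # stand-in for a weight vector from linear regression

e_in = classification_error(w, X, y_noisy)
print(e_in)  # 0.25: only the last point is misclassified
```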
#6
04-16-2012, 07:07 PM
 markweitzman Invited Guest Join Date: Apr 2012 Location: Las Vegas Posts: 69
Re: HW 2.8: Seeking clarification on simulated noise

What about with Eout? Do we also compare with noisy y or with y without noise?
#7
04-16-2012, 08:41 PM
 jsarrett Member Join Date: Apr 2012 Location: Sunland, CA Posts: 13
Re: HW 2.8: Seeking clarification on simulated noise

Here's how I think about E_in and E_out.

The in-sample performance, E_in in our notation, is how well you have converged on a solution to your given data. The given data is a *sample* of the real-world domain on which f is defined. Therefore the in-sample performance of g is how well it works on the data set (how often g(x) matches the given y).

The out-of-sample performance, E_out, is how well g works on the rest of the world (not in our tiny sample). We have seen in several of the problems that we can estimate it by generating a whole new data set (often with many more sample points) and comparing the performance of our g with the performance of the made-up f.

Of course, in a real situation we won't have f, only the knowledge (belief!?) that it exists. That's why we need the Hoeffding inequality, so we can at least bound E_out.
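
A rough sketch of the fresh-data estimate described in this post, under stated assumptions: `target` (sign(x1^2 + x2^2 - 0.6)) and the [-1, 1] x [-1, 1] input square are assumed because the thread does not restate them, `estimate_e_out` is a hypothetical helper name, and the weight vector is a stand-in. Whether the fresh labels should also carry noise is the question raised earlier in the thread; this sketch simply compares g against f directly, as the post describes.

```python
import random

def sign(v):
    return 1.0 if v >= 0 else -1.0

def target(x1, x2):
    # Assumed target f for illustration only.
    return sign(x1 * x1 + x2 * x2 - 0.6)

def estimate_e_out(w, n_fresh=5000, seed=1):
    """Estimate E_out as the disagreement rate between g and f on fresh points."""
    rng = random.Random(seed)
    mistakes = 0
    for _ in range(n_fresh):
        # Fresh point (x0, x1, x2), drawn independently of any training set.
        x = (1.0, rng.uniform(-1, 1), rng.uniform(-1, 1))
        g = sign(sum(wi * xi for wi, xi in zip(w, x)))
        if g != target(x[1], x[2]):
            mistakes += 1
    return mistakes / n_fresh

w = (1.0, 0.0, 0.0)  # stand-in hypothesis that always predicts +1
e_out = estimate_e_out(w)
print(e_out)
```

For this stand-in w, g is wrong exactly where f = -1, i.e. inside the circle of radius sqrt(0.6), so the estimate should land near pi * 0.6 / 4, roughly 0.47.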
#8
07-18-2012, 03:33 PM
 itooam Senior Member Join Date: Jul 2012 Posts: 100
Re: HW 2.8: Seeking clarification on simulated noise

ttt

Think this thread will help others as it did me. The "flip" comment confused me as well.

All times are GMT -7.