LFD Book Forum  

#1
04-15-2012, 11:37 PM
sakumar
Member
Join Date: Apr 2012
Posts: 40
HW 2.8: Seeking clarification on simulated noise

First we generate a training set of 1000 points. We also generate a vector y using the target function given.

Now we are directed to randomly flip the sign of 10% of the training set.

The training set has 4000 numbers at this point. Should we randomly choose 400 of these numbers and flip their signs? Does that include the y values? Does it also include the 1000 x0 values, which we initialized to 1.0?

#2
04-16-2012, 12:42 AM
jsarrett
Member
Join Date: Apr 2012
Location: Sunland, CA
Posts: 13
Re: HW 2.8: Seeking clarification on simulated noise

I'm pretty sure we only flip the sign on the ys. That's what I did and got reasonable results. That corresponds to noise in your sample of the target function.

-James

#3
04-16-2012, 01:03 AM
yaser
Caltech
Join Date: Aug 2009
Location: Pasadena, California, USA
Posts: 1,477
Re: HW 2.8: Seeking clarification on simulated noise

Quote:
Originally Posted by jsarrett View Post
I'm pretty sure we only flip the sign on the ys. That's what I did and got reasonable results. That corresponds to noise in your sample of the target function.

-James
You are correct.
__________________
Where everyone thinks alike, no one thinks very much
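
To make the confirmed reading concrete, here is a minimal sketch of flipping only the y values. It assumes 1000 points drawn uniformly from [-1, 1] x [-1, 1], and the `target` function is a hypothetical stand-in, since the thread does not restate the homework's target. The names `make_noisy_data`, `flip_fraction`, and `seed` are illustrative only.

```python
import random


def target(x1, x2):
    # Hypothetical stand-in for the homework's target function (an assumption).
    return 1 if x1 * x1 + x2 * x2 - 0.6 >= 0 else -1


def make_noisy_data(n=1000, flip_fraction=0.1, seed=0):
    rng = random.Random(seed)
    # Each input is (x0, x1, x2) with x0 fixed at 1.0; the inputs are never flipped.
    X = [(1.0, rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    y_clean = [target(x1, x2) for _, x1, x2 in X]
    # Pick distinct indices and flip the sign of y only at those indices.
    y_noisy = list(y_clean)
    for i in rng.sample(range(n), round(flip_fraction * n)):
        y_noisy[i] = -y_noisy[i]
    return X, y_clean, y_noisy


X, y_clean, y_noisy = make_noisy_data()
n_flipped = sum(1 for a, b in zip(y_clean, y_noisy) if a != b)
print("labels flipped:", n_flipped, "out of", len(y_noisy))
```

Sampling distinct indices flips exactly 10% of the labels. Flipping each label independently with probability 0.1 would be another reasonable reading of "randomly flip", and the thread does not say which one was intended.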

#4
04-16-2012, 08:18 AM
sakumar
Member
Join Date: Apr 2012
Posts: 40
Re: HW 2.8: Seeking clarification on simulated noise

Thank you both for that clarification. I believe I am inching closer to understanding noise.

I have some follow-up questions: How is E_in defined? Do you compare the linear regression results (i.e. sign(w'x), where w is obtained by linear regression using the "noisy" y) to the true value of y or to the noisy value from the training data?

In the real world, since the target function is unknown, the best one can do is E_in_estimated by comparing sign(w'x) to the "noisy" y. But in this instance we actually do have the target function. So if we are asked to compute E_in should we use the original y?

Edit: I tried both and the closest answer didn't change, but I'd still like to understand the correct definition of E_in.

#5
04-16-2012, 02:55 PM
htlin
NTU
Join Date: Aug 2009
Location: Taipei, Taiwan
Posts: 601
Re: HW 2.8: Seeking clarification on simulated noise

Quote:
Originally Posted by sakumar View Post
Thank you both for that clarification. I believe I am inching closer to understanding noise.

I have some follow-up questions: How is E_in defined? Do you compare the linear regression results (i.e. sign(w'x), where w is obtained by linear regression using the "noisy" y) to the true value of y or to the noisy value from the training data?

In the real world, since the target function is unknown, the best one can do is E_in_estimated by comparing sign(w'x) to the "noisy" y. But in this instance we actually do have the target function. So if we are asked to compute E_in should we use the original y?

Edit: I tried both and the closest answer didn't change, but I'd still like to understand the correct definition of E_in.
You should compare to the noisy y that you have on hand. Hope this helps.
__________________
When one teaches, two learn.
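
The answer above can be sketched as follows: fit w by linear regression on the noisy y, then count how often sign(w'x) disagrees with that same noisy y. This is only an illustration under assumptions. The `target` function, the sample size, and the helper names (`solve`, `linear_regression`, `classification_error`) are not from the thread, and the normal equations are solved by hand to keep the sketch free of dependencies.

```python
import random


def sign(v):
    return 1 if v >= 0 else -1


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - s) / M[i][i]
    return w


def linear_regression(X, y):
    """Least-squares weights from the normal equations (X'X) w = X'y."""
    d = len(X[0])
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(d)] for i in range(d)]
    Xty = [sum(x[i] * t for x, t in zip(X, y)) for i in range(d)]
    return solve(XtX, Xty)


def classification_error(w, X, y):
    """Fraction of points where sign(w'x) disagrees with the given labels."""
    wrong = sum(
        1 for x, t in zip(X, y)
        if sign(sum(wi * xi for wi, xi in zip(w, x))) != t
    )
    return wrong / len(X)


def target(x1, x2):
    # Hypothetical stand-in for the homework's target function (an assumption).
    return sign(x1 * x1 + x2 * x2 - 0.6)


rng = random.Random(0)
X = [(1.0, rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(1000)]
y_clean = [target(x1, x2) for _, x1, x2 in X]
y_noisy = list(y_clean)
for i in rng.sample(range(1000), 100):  # flip 10% of the labels
    y_noisy[i] = -y_noisy[i]

w = linear_regression(X, y_noisy)                   # fit on the noisy labels
e_in = classification_error(w, X, y_noisy)          # E_in: against the noisy y on hand
e_vs_clean = classification_error(w, X, y_clean)    # shown only for comparison
print("E_in (noisy y):", e_in)
print("error against noise-free y:", e_vs_clean)
```

Printing both numbers mirrors the experiment described in the quoted post, where both comparisons were tried; only the first is E_in in the sense given above.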

#6
04-16-2012, 07:07 PM
markweitzman
Invited Guest
Join Date: Apr 2012
Location: Las Vegas
Posts: 69
Re: HW 2.8: Seeking clarification on simulated noise

What about E_out? Do we also compare with the noisy y, or with y without noise?

#7
04-16-2012, 08:41 PM
jsarrett
Member
Join Date: Apr 2012
Location: Sunland, CA
Posts: 13
Re: HW 2.8: Seeking clarification on simulated noise

Here's how I think about E_{in} and E_{out}.

The in-sample performance E_{in} measures how well you have converged on a solution for your given data (X in our notation). The given data is a *sample* of the real-world domain on which f is defined. Therefore the in-sample performance of h is how well it works on the data set (how closely h(x) \approx y).

The out-of-sample performance E_{out} is how well h works on the rest of the world (not in our tiny sample). We have seen in several of the problems that we can estimate it by generating a whole new data set (often with many more sample points) and comparing the performance of our h with that of the made-up f.

Of course, in a real situation we won't have f, only the knowledge (belief!?) that it exists. That's why we need the Hoeffding inequality, so we can at least bound E_{out}.
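
As a rough illustration of the description above, the sketch below estimates E_{out} for a fixed hypothesis by drawing fresh points and comparing sign(w'x) with a made-up f. The `target` function, the example weights, and the names `estimate_e_out` and `hoeffding_bound` are assumptions for illustration. The fresh labels here are noise-free; the thread does not settle whether noise should also be applied to them. The bound shown is the single-hypothesis Hoeffding inequality, P[|E_in - E_out| > eps] <= 2 exp(-2 eps^2 N).

```python
import math
import random


def sign(v):
    return 1 if v >= 0 else -1


def target(x1, x2):
    # Hypothetical stand-in for the made-up f (an assumption).
    return sign(x1 * x1 + x2 * x2 - 0.6)


def estimate_e_out(w, n_fresh=1000, seed=1):
    """Estimate E_out of h(x) = sign(w'x) on freshly generated points."""
    rng = random.Random(seed)
    wrong = 0
    for _ in range(n_fresh):
        x = (1.0, rng.uniform(-1, 1), rng.uniform(-1, 1))
        h = sign(sum(wi * xi for wi, xi in zip(w, x)))
        if h != target(x[1], x[2]):
            wrong += 1
    return wrong / n_fresh


def hoeffding_bound(eps, n):
    """Upper bound on P[|E_in - E_out| > eps] for one fixed hypothesis."""
    return 2 * math.exp(-2 * eps ** 2 * n)


w = (-1.0, 0.0, 0.0)  # example hypothesis that always predicts -1
print("estimated E_out:", estimate_e_out(w))
print("Hoeffding bound for eps=0.1, N=1000:", hoeffding_bound(0.1, 1000))
```

The bound applies to a hypothesis fixed before the sample is drawn, which is the situation when a final h is evaluated on a fresh data set.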

#8
07-18-2012, 03:33 PM
itooam
Senior Member
Join Date: Jul 2012
Posts: 100
Re: HW 2.8: Seeking clarification on simulated noise

ttt

I think this thread will help others as it did me. The "flip" comment confused me as well.
All times are GMT -7. The time now is 01:48 AM.


The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.