LFD Book Forum  

#1 marek, 05-05-2013, 05:55 PM
Hw5 Q8 E_out

I am struggling to understand how to calculate E_{out} in this question. I have two competing theories, which I will describe below. Any help is greatly appreciated.

Once the algorithm terminates, I have w^{(t)}. I then generate a new set of data points \{X_i\}_{i=1}^M, using my original target function to generate the corresponding labels Y_i = f(X_i).

Case 1. Just use the same cross entropy error calculation but on this new data set.

E_{out} = \frac{1}{M} \sum_{i=1}^M \ln (1+e^{-Y_i w^\top X_i})

Case 2. Directly calculate the expected output of our hypothesis function and compare to Y_i.

g(X_i) = +1 with probability \theta (w^\top X_i) = \frac{1}{1+e^{-w^\top X_i}}

Ultimately this gives us the probability that our hypothesis aligns with Y:

P(Y_i | X_i) = \theta(Y_i w^\top X_i)

In the lectures/book, we would multiply these probabilities to get the "likelihood" that the data was generated by this hypothesis. However, it seems that averaging over these should give the expected error in this sample.

E_{out} = \frac{1}{M} \sum_{i=1}^{M} (1-P(Y_i | X_i))

It feels as though the first approach is the correct one, but I struggle because the second approach makes intuitive sense: it is how I would historically have calculated E_{out}. To make matters worse, each approach lands very close to a different answer choice in the question!
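
One possible way to see why the two cases land on different numbers (a sketch, not from the thread; p_i = \theta(Y_i w^\top X_i) is shorthand introduced here): since \theta(s) = 1/(1+e^{-s}), the Case 1 summand equals -\ln p_i, while the Case 2 summand is 1 - p_i.

```latex
\ln\left(1+e^{-Y_i w^\top X_i}\right) = -\ln \theta(Y_i w^\top X_i) = -\ln p_i
\qquad\text{and}\qquad
-\ln p_i \ge 1 - p_i \quad \text{for } 0 < p_i \le 1
```

So the Case 1 average is never smaller than the Case 2 average, with equality only when every p_i = 1. The two formulas measure different error quantities rather than approximating the same one.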

#2 yaser (Caltech), 05-05-2013, 09:07 PM
Re: Hw5 Q8 E_out

Quote (originally posted by marek):
I am struggling to understand how to calculate E_{out} in this question. I have two competing theories, which I will describe below. Any help is greatly appreciated.

Once the algorithm terminates, I have w^{(t)}. I now generate a new set of data points \{X_i\}_{i=1}^M. Using my original target function to generate the corresponding Y_i = f(X_i).

Case 1. Just use the same cross entropy error calculation but on this new data set.

E_{out} = \frac{1}{M} \sum_{i=1}^M \ln (1+e^{-Y_i w^\top X_i})

The above approach is correct. The problem specifies the cross entropy error measure, so E_{\rm out} = {\rm E} [ \ln (1+e^{-y{\bf w}^\top {\bf x}})], where the expectation is with respect to both {\bf x} and y. The above formula estimates that expectation from a random sample.
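
A minimal sketch of that sample estimate, assuming a 2-D input space sampled uniformly from [-1, 1]^2 and a hypothetical linear target. The names `w_frozen`, `w_target`, and `M` and their values are illustrative assumptions, not taken from the homework:

```python
import numpy as np

def cross_entropy_error(w, X, y):
    """Mean of ln(1 + exp(-y_i * w.x_i)) over the rows of X."""
    margins = y * (X @ w)
    # np.logaddexp(0, -m) evaluates ln(1 + e^{-m}) without overflow
    return float(np.mean(np.logaddexp(0.0, -margins)))

rng = np.random.default_rng(0)

# Hypothetical values for illustration only (bias weight first).
w_frozen = np.array([0.1, -2.0, 1.5])   # stands in for the learned w
w_target = np.array([0.0, -1.0, 1.0])   # stands in for the target f

M = 10_000
points = rng.uniform(-1.0, 1.0, size=(M, 2))   # fresh test inputs
X = np.column_stack([np.ones(M), points])      # prepend x_0 = 1
y = np.where(X @ w_target >= 0, 1.0, -1.0)     # Y_i = f(X_i)

e_out_estimate = cross_entropy_error(w_frozen, X, y)
print(f"estimated E_out: {e_out_estimate:.4f}")
```

The weights are only read here, never updated, so the number printed is an estimate for that fixed w; a larger `M` should tighten the estimate.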
__________________
Where everyone thinks alike, no one thinks very much

#3 marek, 05-05-2013, 09:12 PM
Re: Hw5 Q8 E_out

Quote (originally posted by yaser):
The above approach is correct. The problem specifies the cross entropy error measure, so E_{\rm out} = {\rm E} [ \ln (1+e^{-y{\bf w}^\top {\bf x}})], where the expectation is w.r.t. both {\bf x},y. The above formula estimates that through a random sample.

I suspected as much. I'll try to figure out why my other approach is wrong tomorrow. I think I've burned out on it today and am probably not seeing something obvious. Thanks for your help!

#4 arcticblue, 05-06-2013, 02:37 AM
Re: Hw5 Q8 E_out

I am also a little unsure about exactly how this equation works:
E_{out} = \frac{1}{M} \sum_{i=1}^M \ln (1+e^{-Y_i w^\top X_i})

Obviously, the more negative -Y_i w^\top X_i is, the closer E_{out} is to zero, which is good. So is w supposed to be normalized? I presume so, because otherwise I could just scale w up and E_{out} would become very small. But if w is normalized, the values I'm getting for E_{in} and E_{out} are both much greater than any of the options. (Maybe it's meant to be like that; if so, it's quite unnerving.)

#5 yaser (Caltech), 05-06-2013, 10:14 AM
Re: Hw5 Q8 E_out

Quote (originally posted by arcticblue):
I am also a little unsure about exactly how this equation works:
E_{out} = \frac{1}{M} \sum_{i=1}^M \ln (1+e^{-Y_i w^\top X_i})

Obviously the more negative {-Y_i w^\top X_i} is the closer E_out is to zero which is good. So is w supposed to be normalized? I presume so because otherwise I could just scale w and then E_out becomes very small. And if it is normalized then the values I'm getting for E_in and E_out are both much greater than any of the options. (Maybe it's meant to be like that, if so it's quite unnerving.)

No normalization. The value of {\bf w} is determined iteratively by the specific algorithm given in the lecture. If {\bf w} 'agrees' with all the training examples, then indeed the algorithm will try to scale it up to get the value of the logistic function closer to a hard threshold. When you evaluate the quoted formula on a test set, {\bf w} is frozen and no scaling or any other change in it is allowed.
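
A small sketch of the scaling effect, using three made-up points and a made-up w (both hypothetical) whose labels are chosen so that w classifies every point correctly:

```python
import numpy as np

def cross_entropy_error(w, X, y):
    """Mean of ln(1 + exp(-y_i * w.x_i)) over the rows of X."""
    return float(np.mean(np.logaddexp(0.0, -y * (X @ w))))

# Hypothetical data: first column is the constant coordinate x_0 = 1.
X = np.array([[1.0,  0.5,  0.2],
              [1.0, -0.3,  0.8],
              [1.0, -0.6, -0.4]])
w = np.array([0.1, 1.0, -0.5])

# Labels are defined from w itself, so w agrees with every example.
y = np.where(X @ w >= 0, 1.0, -1.0)

scales = (1, 10, 100)
errors = [cross_entropy_error(k * w, X, y) for k in scales]
for k, err in zip(scales, errors):
    print(f"scale {k:>3}: cross entropy error = {err:.3e}")
```

Each step up in scale lowers the error on points that w already gets right, which is the growth in the weights that training produces. For a point that the frozen w misclassifies, the margin is negative and the same scaling makes its term larger, so scaling w up is not a free way to shrink the error on a test set.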

#6 Michael Reach, 05-06-2013, 02:39 PM
Re: Hw5 Q8 E_out

Quote:
If {\bf w} 'agrees' with all the training examples, then indeed the algorithm will try to scale it up to get the value of the logistic function closer to a hard threshold.

Thank you for this comment. I finally have some idea what I'm seeing in the homework. This point is worth stressing: the scale of w determines the sharpness of the threshold.
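
To illustrate the sharpness point, here is a short sketch that evaluates the logistic function \theta on a few hypothetical signal values w^\top x while the scale of w grows (the numbers are made up for illustration):

```python
import numpy as np

def theta(s):
    """Logistic function 1 / (1 + e^{-s})."""
    return 1.0 / (1.0 + np.exp(-s))

# Hypothetical signals w.x on either side of the decision boundary.
signals = np.array([-0.5, -0.1, 0.1, 0.5])

for scale in (1, 10, 100):
    # Multiplying w by `scale` multiplies every signal by `scale`.
    print(scale, np.round(theta(scale * signals), 3))
```

As the scale increases, the outputs on each side of zero move toward 0 and 1, so the soft threshold behaves more and more like a hard one.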

#7 arcticblue, 05-06-2013, 04:22 PM
Re: Hw5 Q8 E_out

Thank you for explaining that normalization is not required. Your explanation also makes sense of why I see the weights continue to increase the more iterations I run.

#8 hsolo, 07-16-2013, 05:27 AM
Re: Hw5 Q8 E_out

One very minor point which may help error-prone folks such as myself:
In linear methods we always have a (d+1)-dimensional weight vector, obtained by adding an extra pseudo-coordinate of 1 to each training and test point. I completely overlooked this and was struggling with the error never getting as small as the expected answer. I spent a long time rechecking code, etc.

As soon as I fixed this, things fell into place. Likely I will never forget this :-)
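
A minimal sketch of that fix; the helper name `add_bias_coordinate` and the sample points are hypothetical:

```python
import numpy as np

def add_bias_coordinate(points):
    """Prepend the constant coordinate x_0 = 1 to every d-dimensional point."""
    points = np.asarray(points, dtype=float)
    ones = np.ones((points.shape[0], 1))
    return np.hstack([ones, points])

# Apply the same transformation to training and test points alike.
train_points = [[0.2, -0.7], [0.9, 0.4]]
X_train = add_bias_coordinate(train_points)
print(X_train.shape)  # d = 2 inputs become (d + 1) = 3 columns
```

The weight vector then has d+1 entries, with the first one acting as the bias term.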

All times are GMT -7.


The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.