LFD Book Forum  

  #1  
Old 08-09-2012, 04:45 AM
hashable
Junior Member
 
Join Date: Jul 2012
Posts: 8
Questions on Lecture 9 (Linear Models II)

1. In the example in the lecture, we were cautioned against data snooping, since looking at the data can mean we are implicitly doing some learning in our heads. My question is: is it legitimate to look at DataSet 1 to identify my predictors, and then train on DataSet 2, whose samples are entirely different from DataSet 1? The out-of-sample error would, of course, be evaluated on a DataSet 3 distinct from both 1 and 2.

2. At the end of the lecture, somebody asked a question about multiclass classifiers, and the answer was that they are commonly built using either one-vs-all or one-vs-one training. My questions:
  • 2-a) For one-versus-all we only need to build n classifiers for n classes, whereas for one-versus-one we have to build n-choose-2 = n(n-1)/2 classifiers, which can take much longer when there are many classes. Are there any inherent benefits to one-vs-one? If not, why do it at all, since one-vs-all is faster to train?
  • 2-b) Are there any reasons why one method is preferable over the other? E.g., is there an impact on accuracy/generalization from choosing either approach?

3. We used cross-entropy error for logistic regression and squared error for linear regression. It was explained that each error measure is chosen partly so that the math of the minimization stays tractable. In both cases, the practical interpretation was explained and appears intuitive. My questions:
  • 3-a) Does the choice of error measure affect the final choice of approximation? In other words, will we get a different g depending on whether we use cross-entropy, squared error, or any other error function? (Ignore the complexity of the minimization math for now.)
  • 3-b) If we optimize to find g using one error function, but evaluate using a different error function, will the evaluation be meaningful? E.g., using squared error to evaluate the out-of-sample performance of a logistic model built by minimizing cross-entropy error.
  #2  
Old 08-09-2012, 05:19 AM
yaser
Caltech
 
Join Date: Aug 2009
Location: Pasadena, California, USA
Posts: 1,477
Re: Questions on Lecture 9 (Linear Models II)

Quote:
Originally Posted by hashable View Post
1. (snip) Is it legitimate to look at DataSet 1 to identify my predictors, and then train on DataSet 2, whose samples are entirely different from DataSet 1? The out-of-sample error would, of course, be evaluated on a DataSet 3 distinct from both 1 and 2.
Yes, this is legitimate.
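
For concreteness, here is a minimal sketch of that three-way protocol in Python. The data, the feature-selection rule, and the linear fit are all illustrative assumptions, not anything the lecture prescribes.

[CODE]
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y = rng.integers(0, 2, size=300).astype(float)

# Split once, up front, before looking at anything.
X1, y1 = X[:100], y[:100]        # DataSet 1: look here to pick predictors
X2, y2 = X[100:200], y[100:200]  # DataSet 2: train here
X3, y3 = X[200:], y[200:]        # DataSet 3: touch only once, to estimate E_out

# "Snoop" on DataSet 1 only: keep the 3 features most correlated with y.
corr = np.abs([np.corrcoef(X1[:, j], y1)[0, 1] for j in range(X1.shape[1])])
keep = np.argsort(corr)[-3:]

# Train on DataSet 2 with the selected features (linear regression via pseudo-inverse).
Z2 = np.c_[np.ones(len(X2)), X2[:, keep]]
w = np.linalg.pinv(Z2) @ y2

# Evaluate on DataSet 3, which influenced neither the selection nor the training.
Z3 = np.c_[np.ones(len(X3)), X3[:, keep]]
e_out = np.mean((Z3 @ w > 0.5) != (y3 == 1))
print("estimated out-of-sample classification error:", e_out)
[/CODE]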

Quote:
2. At the end of the lecture, somebody asked a question about multiclass classifiers... (snip)
  • 2-a) (snip) Are there any inherent benefits to one-vs-one? If not, why do it at all, since one-vs-all is faster to train?
  • 2-b) Are there any reasons why one method is preferable over the other? E.g., is there an impact on accuracy/generalization by choosing either approach?
There is a significant body of work on multiclass classification in machine learning that you can explore in the open literature, and considerations of generalization and computation are key issues, as you mentioned. The answer in the lecture addressed one-versus-one and one-versus-all because of their conceptual simplicity.
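
For what it is worth, both schemes are available off the shelf. A small sketch using scikit-learn's wrappers (my choice of library, not something from the lecture):

[CODE]
# OvA fits n binary models; OvO fits n(n-1)/2 of them on smaller pairwise subsets.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier

X, y = load_iris(return_X_y=True)          # n = 3 classes

ova = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

print(len(ova.estimators_))   # 3 binary classifiers, one per class
print(len(ovo.estimators_))   # 3 = C(3,2) pairwise classifiers
[/CODE]

One practical point the classifier counts hide: each one-vs-one model trains on only the two classes involved, so the pairwise problems are smaller and often easier to separate, which is one reason one-vs-one is not simply dominated by one-vs-all.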

Quote:
3. We used cross-entropy error for logistic regression and squared error for linear regression. (snip)
  • 3-a) Does the choice of error measure affect the final choice of approximation? (snip)
  • 3-b) If we optimize to find g using one error function, but evaluate using a different error function, will the evaluation be meaningful? (snip)
The choice of error measure does affect the final hypothesis, and you can certainly evaluate different error measures on the same hypothesis. It is meaningful in the sense that it does measure the error in a particular way, but it may be hard to interpret the errors when they come from different measures.
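
As a small illustration of the last point, the sketch below trains a single logistic hypothesis by minimizing cross-entropy and then scores that same hypothesis under both error measures. It uses the 0/1-label form of cross-entropy rather than the lecture's ±1 form, and the data and step size are made up.

[CODE]
import numpy as np

rng = np.random.default_rng(1)
X = np.c_[np.ones(200), rng.normal(size=(200, 2))]   # inputs with a bias term
w_true = np.array([0.5, 2.0, -1.0])
y = (rng.random(200) < 1 / (1 + np.exp(-X @ w_true))).astype(float)

def theta(s):                                  # logistic function
    return 1 / (1 + np.exp(-s))

w = np.zeros(3)
for _ in range(2000):                          # batch gradient descent on cross-entropy
    w -= 0.1 * X.T @ (theta(X @ w) - y) / len(y)

p = theta(X @ w)                               # the single hypothesis g
cross_entropy = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
squared_error = np.mean((p - y) ** 2)          # same g, different yardstick
print(cross_entropy, squared_error)
[/CODE]

The two numbers live on different scales and need not rank hypotheses the same way, which is exactly why comparing errors across measures is hard to interpret.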
__________________
Where everyone thinks alike, no one thinks very much
  #3  
Old 08-09-2012, 10:26 PM
gah44
Invited Guest
 
Join Date: Jul 2012
Location: Seattle, WA
Posts: 153
Re: Questions on Lecture 9 (Linear Models II)

Quote:
Originally Posted by hashable View Post
(snip)

3. We used cross-entropy error for logistic regression and squared error for linear regression. (snip)
  • 3-a) Does the choice of error measure affect the final choice of approximation? (snip)
  • 3-b) If we optimize to find g using one error function, but evaluate using a different error function, will the evaluation be meaningful? (snip)

Statisticians don't like squared error much. Minimizing the sum of absolute values of the differences, instead of their squares, often gives better results, but the math is harder. Least squares is too sensitive to a single outlier, for example.
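
A quick numerical illustration of that sensitivity (the data are invented, and the least-absolute-deviations fit is computed by iteratively reweighted least squares, which is just one of several ways to do it):

[CODE]
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 50)
y = 2 * x + 0.05 * rng.normal(size=50)
y[-1] += 10                                   # one gross outlier

A = np.c_[np.ones_like(x), x]                 # design matrix: intercept + slope
w_ls = np.linalg.lstsq(A, y, rcond=None)[0]   # minimizes the sum of squares

# Least absolute deviations via iteratively reweighted least squares:
# weight each point by 1/|residual| so large residuals count linearly.
w_lad = w_ls.copy()
for _ in range(50):
    r = np.maximum(np.abs(y - A @ w_lad), 1e-8)
    Aw = A / r[:, None]
    w_lad = np.linalg.solve(Aw.T @ A, Aw.T @ y)

print("least-squares slope:", w_ls[1])        # dragged toward the outlier
print("least-abs-dev slope:", w_lad[1])       # stays near the true slope of 2
[/CODE]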

Tags
data snooping, error, generalization error, multi-class classifiers
