LFD Book Forum (http://book.caltech.edu/bookforum/index.php)
-   Homework 5 (http://book.caltech.edu/bookforum/forumdisplay.php?f=134)
-   -   Questions on Lecture 9 (Linear Models II) (http://book.caltech.edu/bookforum/showthread.php?t=970)

 hashable 08-09-2012 04:45 AM

Questions on Lecture 9 (Linear Models II)

1. In the example in the lecture, we were cautioned against data snooping, since looking at the data can mean that we are implicitly doing some learning in our heads. My question is: Is it legitimate to look at DataSet 1 to identify my predictors, and then train on DataSet 2 with samples entirely different from DataSet 1? Of course, the out-of-sample error will be evaluated on DataSet 3, different from 1 and 2.

2. At the end of the lecture, somebody asked a question about multiclass classifiers and it was answered that it is commonly done using either one-vs-all training or one-vs-one training. My questions:
• 2-a) For one-versus-all, we only need to build n classifiers for n classes, whereas for one-versus-one we have to build n-choose-2 classifiers, which can take much longer if we have many classes. Are there any inherent benefits to one-vs-one? If not, why do it at all, since one-vs-all is faster to train?
• 2-b) Are there any reasons why one method is preferable over the other? E.g., is there an impact on accuracy/generalization from choosing either approach?

3. We used cross-entropy error for logistic and squared error for linear. It was explained that the error measure is chosen so that the math becomes easy with respect to implementation of the minimization. In both cases, the practical interpretation was explained and it appears intuitive. My questions:
• 3-a) Does the choice of error measure affect the final approximation? In other words, will we get a different g depending on whether we use cross-entropy or squared or any other error function? (Ignore the complexity of the math with respect to minimization for now.)
• 3-b) If we optimize to find g using one error function, but evaluate using a different error function, will the evaluation be meaningful? E.g., use squared error to evaluate out-of-sample performance for a logistic model built by minimizing cross-entropy error.

 yaser 08-09-2012 05:19 AM

Re: Questions on Lecture 9 (Linear Models II)

Quote:
 Originally Posted by hashable (Post 3920) 1. In the example in the lecture, we were cautioned against data snooping, since looking at the data can mean that we are implicitly doing some learning in our heads. My question is: Is it legitimate to look at DataSet 1 to identify my predictors, and then train on DataSet 2 with samples entirely different from DataSet 1? Of course, the out-of-sample error will be evaluated on DataSet 3, different from 1 and 2.
Yes, this is legitimate.

Quote:
 2. At the end of the lecture, somebody asked a question about multiclass classifiers and it was answered that it is commonly done using either one-vs-all training or one-vs-one training. My questions: 2-a) For one-versus-all, we only need to build n classifiers for n classes, whereas for one-versus-one we have to build n-choose-2 classifiers, which can take much longer if we have many classes. Are there any inherent benefits to one-vs-one? If not, why do it at all, since one-vs-all is faster to train? 2-b) Are there any reasons why one method is preferable over the other? E.g., is there an impact on accuracy/generalization from choosing either approach?
There is a significant body of work on multiclass in machine learning that you can explore in the open literature, and considerations of generalization and computation are key issues as you mentioned. The answer in the lecture addressed one-versus-one and one-versus-all because of their conceptual simplicity.
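To make the bookkeeping in 2-a concrete, here is a minimal pure-Python sketch (not from the lecture) of one-vs-one with majority voting. The toy 1-D data and the midpoint-threshold `train_binary` rule are invented for illustration; any real binary learner would take its place.

```python
from itertools import combinations

# Toy 1-D training data for three classes (invented for illustration).
data = {0: [0.8, 1.0, 1.2], 1: [4.8, 5.0, 5.2], 2: [8.9, 9.0, 9.1]}

def train_binary(a, b):
    """A stand-in binary learner: separate classes a and b by the
    midpoint of their means (any real binary classifier would do)."""
    mean = lambda c: sum(data[c]) / len(data[c])
    mid = (mean(a) + mean(b)) / 2
    lo, hi = (a, b) if mean(a) < mean(b) else (b, a)
    return lambda x: lo if x < mid else hi

classes = sorted(data)

# One-vs-one: n-choose-2 binary classifiers, combined by majority vote.
ovo = [train_binary(a, b) for a, b in combinations(classes, 2)]

def predict_ovo(x):
    votes = [clf(x) for clf in ovo]
    return max(set(votes), key=votes.count)

# One-vs-all would instead need only n classifiers (one per class),
# each trained on "this class vs. everything else".
print(len(ovo))          # 3 binary classifiers for n = 3 classes
print(predict_ovo(5.1))  # 1
```

One practical point in one-vs-one's favor: each of its binary problems involves only the data of two classes, so the individual training sets are smaller and the two-class problems are often easier to separate than "one class vs. everything else".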

Quote:
 3. We used cross-entropy error for logistic and squared error for linear. It was explained that the error measure is chosen so that the math becomes easy with respect to implementation of the minimization. In both cases, the practical interpretation was explained and it appears intuitive. My questions: 3-a) Does the choice of error measure affect the final approximation? In other words, will we get a different g depending on whether we use cross-entropy or squared or any other error function? (Ignore the complexity of the math with respect to minimization for now.) 3-b) If we optimize to find g using one error function, but evaluate using a different error function, will the evaluation be meaningful? E.g., use squared error to evaluate out-of-sample performance for a logistic model built by minimizing cross-entropy error.
The choice of error measure does affect the final hypothesis, and you can certainly evaluate different error measures on the same hypothesis. It is meaningful in the sense that it does measure the error in a particular way, but it may be hard to interpret the errors when they come from different measures.
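Both points can be seen in a small sketch: fit a one-parameter logistic model twice, once minimizing cross-entropy and once minimizing squared error on the sigmoid output, then evaluate each fit under both measures. The dataset, learning rate, and iteration count below are invented for illustration, not from the lecture.

```python
import math

# Invented 1-D dataset with labels in {0, 1} (two deliberately noisy points).
X = [-2.0, -1.0, -0.5, 0.5, 1.0, 3.0]
Y = [0, 0, 1, 0, 1, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def xent(w):
    """Mean cross-entropy error of h(x) = sigmoid(w*x)."""
    eps = 1e-12
    return -sum(y * math.log(sigmoid(w * x) + eps)
                + (1 - y) * math.log(1 - sigmoid(w * x) + eps)
                for x, y in zip(X, Y)) / len(X)

def sqerr(w):
    """Mean squared error of h(x) = sigmoid(w*x)."""
    return sum((sigmoid(w * x) - y) ** 2 for x, y in zip(X, Y)) / len(X)

def fit(grad, w=0.0, lr=0.5, steps=5000):
    """Plain batch gradient descent on the single weight w."""
    for _ in range(steps):
        w -= lr * sum(grad(w, x, y) for x, y in zip(X, Y)) / len(X)
    return w

# Per-example gradients of the two error measures.
g_xent = lambda w, x, y: (sigmoid(w * x) - y) * x
def g_sq(w, x, y):
    h = sigmoid(w * x)
    return 2 * (h - y) * h * (1 - h) * x

w_xent, w_sq = fit(g_xent), fit(g_sq)

# 3-a: different error measures give different final hypotheses.
print(round(w_xent, 3), round(w_sq, 3))
# 3-b: each hypothesis can be scored under either measure,
# and each fit wins under the measure it was trained on.
print(round(xent(w_xent), 3), round(xent(w_sq), 3))
print(round(sqerr(w_xent), 3), round(sqerr(w_sq), 3))
```

The cross-evaluation in the last two lines is exactly the situation in 3-b: the numbers are meaningful as error values, but comparing a cross-entropy figure against a squared-error figure directly is not, since they are on different scales.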

 gah44 08-09-2012 10:26 PM

Re: Questions on Lecture 9 (Linear Models II)

Quote:
 Originally Posted by hashable (Post 3920) (snip) 3. We used cross-entropy error for logistic and squared error for linear. It was explained that the error measure is chosen so that the math becomes easy with respect to implementation of the minimization. In both cases, the practical interpretation was explained and it appears intuitive. My questions: 3-a) Does the choice of error measure affect the final approximation? In other words, will we get a different g depending on whether we use cross-entropy or squared or any other error function? (Ignore the complexity of the math with respect to minimization for now.) 3-b) If we optimize to find g using one error function, but evaluate using a different error function, will the evaluation be meaningful? E.g., use squared error to evaluate out-of-sample performance for a logistic model built by minimizing cross-entropy error.

Statisticians don't like squared error much. Minimizing the sum of absolute values of the differences, instead of their squares, often gives better results, but the math is harder. Least squares is too sensitive to a single outlier, for example.
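A one-line illustration of that sensitivity (with made-up numbers): for a constant prediction, the squared-error minimizer is the mean and the absolute-error minimizer is the median, and only the former chases the outlier.

```python
# Made-up sample with one gross outlier.
data = [0.9, 1.0, 1.0, 1.1, 100.0]

# The constant that minimizes squared error is the mean;
# the constant that minimizes absolute error is the median.
mean = sum(data) / len(data)
median = sorted(data)[len(data) // 2]

print(round(mean, 3))    # 20.8 -- dragged far off by the single outlier
print(round(median, 3))  # 1.0 -- essentially unaffected
```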
