LFD Book Forum  

LFD Book Forum > Course Discussions > Online LFD course > Homework 8

#1  02-27-2013, 04:58 PM
hemphill
Member, Join Date: Jan 2013, Posts: 18
Cross validation parameter selection

This question is inspired by Problem 7 on the homework, and the phrase "and base our answer on the number of runs that lead to a particular choice."

Wouldn't the correct procedure be to try to minimize {\bf E}[E_{\rm cv}]? Suppose I get the following results:

C     votes   {\bf E}[E_{\rm cv}]
C_1   15      0.17
C_2   45      0.12
C_3   30      0.10
C_4   10      0.13

Wouldn't C_3 be the correct parameter choice?
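To make the contrast concrete, here is a minimal sketch of the two selection rules, picking by votes versus picking by average Ecv, using the hypothetical numbers from the table above:

```python
# Compare two ways of picking C from repeated cross-validation runs:
# (a) the C that wins the most runs, (b) the C with the lowest average Ecv.
# The vote counts and averages are the hypothetical numbers from the table.
results = {
    "C_1": {"votes": 15, "mean_ecv": 0.17},
    "C_2": {"votes": 45, "mean_ecv": 0.12},
    "C_3": {"votes": 30, "mean_ecv": 0.10},
    "C_4": {"votes": 10, "mean_ecv": 0.13},
}

by_votes = max(results, key=lambda c: results[c]["votes"])
by_mean = min(results, key=lambda c: results[c]["mean_ecv"])

print(by_votes)  # C_2 wins the most runs
print(by_mean)   # C_3 has the lowest average Ecv
```

The two rules disagree here, which is exactly the situation the question is about.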
#2  02-27-2013, 05:37 PM
yaser
Caltech, Join Date: Aug 2009, Location: Pasadena, California, USA, Posts: 1,477
Re: Cross validation parameter selection

Quote:
Originally Posted by hemphill
This question is inspired by Problem 7 on the homework, and the phrase "and base our answer on the number of runs that lead to a particular choice."

Wouldn't the correct procedure be to try to minimize {\bf E}[E_{\rm cv}]? Suppose I get the following results:

C     votes   {\bf E}[E_{\rm cv}]
C_1   15      0.17
C_2   45      0.12
C_3   30      0.10
C_4   10      0.13

Wouldn't C_3 be the correct parameter choice?
Your procedure makes sense. In both procedures, we are trying to even out the variations that result from random partitions to get a standard answer. In a practical situation, you just use one partition, so the other procedure is trying to find the most likely answer in that practical situation.
__________________
Where everyone thinks alike, no one thinks very much
#3  03-01-2013, 11:42 PM
gah44
Invited Guest, Join Date: Jul 2012, Location: Seattle, WA, Posts: 153
Re: Cross validation parameter selection

So far I have been using the libsvm svm_train program.

For cross validation I use -v 10.

The question suggests that we can reuse the same cross-validation partition for different C, but, as far as I know, svm_train seeds the RNG differently on each run. (If it didn't, repeated runs would all give the same partition, and multiple runs wouldn't help.)

More obvious to me is to average Ecv over the 100 runs for each C.

Or maybe I am missing how svm_train -v 10 works.
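One way around the reseeding issue is to construct the partition yourself and reuse it for every C. Here is a minimal sketch in plain Python; feeding the resulting fold indices to separate svm_train calls is an assumption about the workflow, not a libsvm feature:

```python
import random

def make_folds(n, k, seed):
    """Shuffle indices 0..n-1 once with a fixed seed and split into k folds.
    Reusing the same seed yields the same partition for every value of C."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

# The same 10 folds are then used for each C, so differences in Ecv
# come from C alone, not from the random partition.
folds = make_folds(100, 10, seed=0)
assert make_folds(100, 10, seed=0) == folds  # reproducible partition
```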
#4  03-02-2013, 07:58 AM
hemphill
Member, Join Date: Jan 2013, Posts: 18
Re: Cross validation parameter selection

Quote:
Originally Posted by gah44
So far I have been using the libsvm svm_train program.

For cross validation I use -v 10.

The question suggests that we can reuse the same cross-validation partition for different C, but, as far as I know, svm_train seeds the RNG differently on each run. (If it didn't, repeated runs would all give the same partition, and multiple runs wouldn't help.)

More obvious to me is to average Ecv over the 100 runs for each C.

Or maybe I am missing how svm_train -v 10 works.
I don't believe that libsvm svm_train uses a random number generator at all. After trying all the C values, you then shuffle the data and repeat. As I suggested, you can get a different answer if you take the average E_{cv}, as opposed to the one which wins most often.
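That shuffle-and-repeat procedure can be sketched as a driver loop; cv_error and make_partition are hypothetical stand-ins for the svm_train run and the shuffling step:

```python
from collections import Counter

def run_selection(c_values, n_runs, cv_error, make_partition):
    """For each run: build one partition, reuse it across every C,
    record which C wins that run, and accumulate Ecv per C."""
    votes = Counter()
    totals = {c: 0.0 for c in c_values}
    for run in range(n_runs):
        partition = make_partition(run)              # one shuffle per run
        ecv = {c: cv_error(c, partition) for c in c_values}
        votes[min(ecv, key=ecv.get)] += 1            # this run's winner
        for c in c_values:
            totals[c] += ecv[c]
    means = {c: totals[c] / n_runs for c in c_values}
    return votes, means
```

From votes you pick the C that wins most often; from means you pick the C with the lowest average Ecv. These are the two rules being compared.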
#5  03-02-2013, 11:20 AM
gah44
Invited Guest, Join Date: Jul 2012, Location: Seattle, WA, Posts: 153
Re: Cross validation parameter selection

Quote:
Originally Posted by hemphill
I don't believe that libsvm svm_train uses a random number generator at all. After trying all the C values, you then shuffle the data and repeat. As I suggested, you can get a different answer if you take the average E_{cv}, as opposed to the one which wins most often.
Well, I wrote that one while it was still running, so I didn't yet know what it would do. It does generate a different Ecv when run again on the same input, so there is some randomness after all.

But the numbers I get don't have an obvious minimum, so that doesn't seem to be a good way to find the answer to the problem.
#6  03-02-2013, 03:54 PM
gah44
Invited Guest, Join Date: Jul 2012, Location: Seattle, WA, Posts: 153
Re: Cross validation parameter selection

OK, I now generated the results the right way, and got a surprise.

The C that most often has the lowest Ecv, or ties for the lowest, is the one with the highest average Ecv.

I thought I knew from the previous tests what the answer to problem 8 would be, but no.

It seems that there is often a tie, so a certain C wins, but in other runs that same C has a large enough Ecv to throw the average (mean) way off.

Reminds me why statisticians like the median instead of the mean.
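The robustness point can be illustrated with toy numbers (illustrative only, not from the homework):

```python
import statistics

# Five Ecv values from repeated runs; the last run is an outlier.
ecv_runs = [0.10, 0.11, 0.10, 0.12, 0.45]
print(statistics.mean(ecv_runs))    # about 0.176: the outlier drags the mean up
print(statistics.median(ecv_runs))  # 0.11: the median barely moves
```

A single bad run can move the mean past other C values while leaving the median untouched, which matches the behavior described above.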
The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.