LFD Book Forum Cross validation parameter selection

#1
02-27-2013, 05:58 PM
 hemphill Member Join Date: Jan 2013 Posts: 18
Cross validation parameter selection

This question is inspired by Problem 7 on the homework, and the phrase "and base our answer on the number of runs that lead to a particular choice."

Wouldn't the correct procedure be to try to minimize the average Ecv? Suppose I get the following results:

  C    votes   avg Ecv
 ---   -----   -------
  ?     15      0.17
  ?     45      0.12
  ?     30      0.10
  ?     10      0.13

Wouldn't the C with the lowest average Ecv (0.10 here) be the correct parameter choice?
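The two selection rules can be sketched in a few lines of Python (the C labels are placeholders, since the actual values aren't shown; the votes and averages mirror the table above):

```python
# Two ways to pick C from repeated cross-validation runs:
# (1) the C chosen by the most runs ("votes"), and
# (2) the C with the lowest average E_cv.
# "C1".."C4" are hypothetical labels standing in for the real C values.

candidates = ["C1", "C2", "C3", "C4"]
votes      = [15, 45, 30, 10]     # runs that selected each C
avg_ecv    = [0.17, 0.12, 0.10, 0.13]

by_votes = candidates[votes.index(max(votes))]      # "C2" (45 votes)
by_ecv   = candidates[avg_ecv.index(min(avg_ecv))]  # "C3" (avg E_cv 0.10)

print(by_votes, by_ecv)  # the two rules can disagree
```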
#2
02-27-2013, 06:37 PM
 yaser Caltech Join Date: Aug 2009 Location: Pasadena, California, USA Posts: 1,477
Re: Cross validation parameter selection

Quote:
 Originally Posted by hemphill This question is inspired by Problem 7 on the homework, and the phrase "and base our answer on the number of runs that lead to a particular choice." Wouldn't the correct procedure be to try to minimize the average Ecv? Suppose I get the following results: [table of votes and average Ecv per C, as above] Wouldn't the C with the lowest average Ecv be the correct parameter choice?
Your procedure makes sense. In both procedures, we are trying to even out the variations that result from random partitions to get a standard answer. In a practical situation, you just use one partition, so the other procedure is trying to find the most likely answer in that practical situation.
__________________
Where everyone thinks alike, no one thinks very much
#3
03-02-2013, 12:42 AM
 gah44 Invited Guest Join Date: Jul 2012 Location: Seattle, WA Posts: 153
Re: Cross validation parameter selection

So far I have been using the libsvm svm_train program.

For cross validation I use -v 10.

The question suggests that we can use the same cross-validation partition for different C, but, as far as I know, svm_train seeds the RNG differently each time. (If it doesn't, multiple runs won't help.)

More obvious to me is to average Ecv over the 100 runs for each C.

Or maybe I am missing how svm_train -v 10 works.
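One way around the seeding question is to build the partition yourself and hand the same folds to every C. A minimal pure-Python sketch of that idea (this is an assumption about how one might replace the internal shuffle of svm_train -v, not its actual behavior; kfold_indices is a made-up helper):

```python
import random

def kfold_indices(n, k, seed):
    """Return k disjoint validation folds from one seeded shuffle,
    so the exact same partition can be reused for every value of C."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

# One partition per run; reuse it across all C values in that run,
# then change the seed for the next run.
folds = kfold_indices(n=100, k=10, seed=0)
```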
#4
03-02-2013, 08:58 AM
 hemphill Member Join Date: Jan 2013 Posts: 18
Re: Cross validation parameter selection

Quote:
 Originally Posted by gah44 So far I have been using the libsvm svm_train program. For cross validation I use -v 10. The question suggests that we can use the same cross-validation partition for different C, but, as far as I know, svm_train seeds the RNG differently each time. (If it doesn't, multiple runs won't help.) More obvious to me is to average Ecv over the 100 runs for each C. Or maybe I am missing how svm_train -v 10 works.
I don't believe that libsvm svm_train uses a random number generator at all. After trying all the C values, you then shuffle the data and repeat. As I suggested, you can get a different answer if you take the average Ecv, as opposed to the C which wins most often.
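That procedure (one shuffle per run, all C values evaluated on that same partition, then tally both statistics) could be sketched like this; ecv_for is a hypothetical stand-in for the actual 10-fold SVM evaluation, and the demo error function is made up:

```python
import random
from collections import Counter, defaultdict

def run_selection(C_values, n_runs, ecv_for):
    """Tally both statistics over repeated runs: how often each C wins
    (lowest E_cv on that run's partition) and each C's average E_cv.
    `ecv_for(C, run)` is a hypothetical stand-in for 10-fold CV error."""
    votes, totals = Counter(), defaultdict(float)
    for run in range(n_runs):
        errors = {C: ecv_for(C, run) for C in C_values}
        votes[min(errors, key=errors.get)] += 1      # this run's winner
        for C, e in errors.items():
            totals[C] += e
    return votes, {C: totals[C] / n_runs for C in C_values}

# Toy demo with a made-up noisy error function (not real SVM results):
votes, averages = run_selection(
    [0.01, 0.1, 1.0], 100,
    lambda C, run: random.Random(run * 1000 + int(C * 100)).random())
```

The point of the structure is that the vote winner and the argmin of the averages come from the same set of runs, so any disagreement between them is real and not partition noise.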
#5
03-02-2013, 12:20 PM
 gah44 Invited Guest Join Date: Jul 2012 Location: Seattle, WA Posts: 153
Re: Cross validation parameter selection

Quote:
 Originally Posted by hemphill I don't believe that libsvm svm_train uses a random number generator at all. After trying all the C values, you then shuffle the data and repeat. As I suggested, you can get a different answer if you take the average Ecv, as opposed to the C which wins most often.
Well, I wrote that one while it was still running, so I didn't yet know what it would do. It does seem to generate a different Ecv when run again on the same input, so there seems to be some randomness.

But the numbers I get don't have an obvious minimum, so it seems that isn't a good way to find the answer to the problem.
#6
03-02-2013, 04:54 PM
 gah44 Invited Guest Join Date: Jul 2012 Location: Seattle, WA Posts: 153
Re: Cross validation parameter selection

OK, I now generated the results the right way, with a surprise.

The C that most often has the lowest Ecv, or ties for lowest, is the one with the highest average Ecv.

I thought I knew from the previous tests what the answer to problem 8 would be, but no.

It seems that there is often a tie, and so a certain C wins, but in other cases the winning C has a large enough Ecv to throw the average (mean) way off.

Reminds me why statisticians like median instead of mean.
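That skew is easy to reproduce: a single outlier run drags the mean while the median barely moves (a generic illustration with made-up numbers, not the actual homework results):

```python
from statistics import mean, median

# Nine typical E_cv values plus one bad outlier run
ecv_runs = [0.10] * 9 + [0.90]

print(mean(ecv_runs))    # ~0.18, dragged up by the single outlier
print(median(ecv_runs))  # 0.10, unaffected
```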
