LFD Book Forum Cross validation parameter selection

#1
02-27-2013, 05:58 PM
 hemphill
Cross validation parameter selection

This question is inspired by Problem 7 on the homework, and the phrase "and base our answer on the number of runs that lead to a particular choice."

Wouldn't the correct procedure be to try to minimize the average Ecv? Suppose I get the following results:

C   votes  avg Ecv
--  -----  --------
    15     0.17
    45     0.12
    30     0.10
    10     0.13

Wouldn't the C with the lowest average Ecv (0.10 here, with 30 votes) be the correct parameter choice?
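
To make the disagreement concrete, here is a minimal sketch that applies both selection rules to the tallies above. The C values themselves do not appear in the thread, so the keys `C_a` to `C_d` are hypothetical placeholders, one per table row.

```python
# (votes, average Ecv) for each candidate, copied from the table in the post.
# The keys are hypothetical labels: the actual C values are not shown.
results = {
    "C_a": (15, 0.17),
    "C_b": (45, 0.12),
    "C_c": (30, 0.10),
    "C_d": (10, 0.13),
}

# Rule 1: pick the candidate that won the most runs.
by_votes = max(results, key=lambda c: results[c][0])

# Rule 2: pick the candidate with the lowest average Ecv.
by_mean = min(results, key=lambda c: results[c][1])

print("most votes:", by_votes, results[by_votes])
print("lowest average Ecv:", by_mean, results[by_mean])
```

With these numbers the two rules pick different rows, which is the situation the question is about.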
#2
02-27-2013, 06:37 PM
 yaser
Re: Cross validation parameter selection

Quote:
 Originally Posted by hemphill This question is inspired by Problem 7 on the homework, and the phrase "and base our answer on the number of runs that lead to a particular choice." Wouldn't the correct procedure be to try to minimize the average Ecv? Suppose I get the following results (votes, average Ecv): 15, 0.17; 45, 0.12; 30, 0.10; 10, 0.13. Wouldn't the C with the lowest average Ecv be the correct parameter choice?
Your procedure makes sense. In both procedures, we are trying to even out the variations that result from random partitions to get a standard answer. In a practical situation, you just use one partition, so the other procedure is trying to find the most likely answer in that practical situation.
#3
03-02-2013, 12:42 AM
 gah44
Re: Cross validation parameter selection

So far I have been using the libsvm svm_train program.

For cross validation I use -v 10.

The question suggests that we can use the same cross-validation partition for different C, but, as far as I know, it seeds the RNG differently each time. (If it doesn't, multiple runs won't help.)

More obvious to me is to average Ecv over the 100 runs for each C.

Or maybe I am missing how svm_train -v 10 works.
#4
03-02-2013, 08:58 AM
 hemphill
Re: Cross validation parameter selection

Quote:
 Originally Posted by gah44 So far I have been using the libsvm svm_train program. For cross validation I use -v 10. The question suggests that we can use the same cross-validation partition for different C, but, as far as I know, it seeds the RNG differently each time. (If it doesn't, multiple runs won't help.) More obvious to me is to average Ecv over the 100 runs for each C. Or maybe I am missing how svm_train -v 10 works.
I don't believe that libsvm svm_train uses a random number generator at all. After trying all the C values, you then shuffle the data and repeat. As I suggested, you can get a different answer if you take the C with the lowest average Ecv, as opposed to the one which wins most often.
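
The shuffle-and-repeat procedure described here can be sketched roughly as follows. This is only an outline under stated assumptions, not how svm_train works internally: `fold_error` is a hypothetical callable that trains on the given indices with a given C and returns the validation error, and breaking ties in favor of the first C listed is an assumption as well.

```python
import random
from collections import Counter

def kfold_indices(n, k, rng):
    """Shuffle 0..n-1 and deal the indices into k folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def repeated_cv(n, c_values, fold_error, runs=100, k=10, seed=0):
    """Return (votes, mean Ecv) per C over `runs` random partitions.

    fold_error(train_idx, val_idx, c) is a hypothetical user-supplied
    function returning the validation error for one fold.
    """
    rng = random.Random(seed)
    votes = Counter()
    totals = {c: 0.0 for c in c_values}
    for _ in range(runs):
        # One partition per run, shared by every C, so the candidates
        # are compared on exactly the same folds.
        folds = kfold_indices(n, k, rng)
        ecv = {}
        for c in c_values:
            errs = []
            for i, val in enumerate(folds):
                train = [j for m, f in enumerate(folds) if m != i for j in f]
                errs.append(fold_error(train, val, c))
            ecv[c] = sum(errs) / k
        # Assumption: ties go to the first C listed in c_values.
        best = min(c_values, key=lambda c: ecv[c])
        votes[best] += 1
        for c in c_values:
            totals[c] += ecv[c]
    means = {c: totals[c] / runs for c in c_values}
    return votes, means
```

The function returns both tallies, so the C with the most votes and the C with the lowest mean Ecv can be compared directly.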
#5
03-02-2013, 12:20 PM
 gah44
Re: Cross validation parameter selection

Quote:
 Originally Posted by hemphill I don't believe that libsvm svm_train uses a random number generator at all. After trying all the C values, you then shuffle the data and repeat. As I suggested, you can get a different answer if you take the C with the lowest average Ecv, as opposed to the one which wins most often.
Well, I wrote that one while it was still running, so I didn't yet know what it would do. It does seem to generate different Ecv running it again with the same input, so there seems to be some randomness.

But the numbers I get don't have an obvious minimum, so it seems that isn't a good way to find the answer to the problem.
#6
03-02-2013, 04:54 PM
 gah44
Re: Cross validation parameter selection

OK, I now generated the results the right way, with a surprise.

The C that most often has the lowest Ecv, or ties for lowest,
is the one with the highest average Ecv.

I thought I knew from the previous tests what the answer to problem 8 would be, but no.

Seems that often there is a tie and so a certain C wins, but in other cases the winning C has a large enough Ein to get the average (mean) way off.

Reminds me why statisticians like median instead of mean.
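
A small sketch of that effect, using made-up per-run numbers chosen only to illustrate it (they are not results from the homework). The labels `small` and `large` are hypothetical stand-ins for two C values, and counting a tie as a win for every tied C is an assumption.

```python
from statistics import mean, median

def summarize(ecv_by_run):
    """ecv_by_run: list of dicts, one per run, mapping C label -> Ecv."""
    c_values = list(ecv_by_run[0])
    wins = {c: 0 for c in c_values}
    for run in ecv_by_run:
        lowest = min(run.values())
        for c in c_values:
            if run[c] == lowest:  # assumption: a tie counts for every tied C
                wins[c] += 1
    return {
        c: {
            "wins": wins[c],
            "mean": mean(r[c] for r in ecv_by_run),
            "median": median(r[c] for r in ecv_by_run),
        }
        for c in c_values
    }

# Made-up Ecv values: "small" usually wins, but one bad run drags its mean up.
runs = [
    {"small": 0.10, "large": 0.11},
    {"small": 0.10, "large": 0.11},
    {"small": 0.10, "large": 0.11},
    {"small": 0.40, "large": 0.11},
]
summary = summarize(runs)
for c, stats in summary.items():
    print(c, stats)
```

In this toy data the candidate with the most wins also has the highest mean Ecv, while its median stays the lowest, because the median ignores the single outlying run.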

