LFD Book Forum (http://book.caltech.edu/bookforum/index.php)
-   Homework 8 (http://book.caltech.edu/bookforum/forumdisplay.php?f=137)
-   -   libsvm random seeding (Q7) (http://book.caltech.edu/bookforum/showthread.php?t=4300)

 nparslow 05-20-2013 03:08 AM

libsvm random seeding (Q7)

I noticed that the "random" division of the data that libsvm uses for cross-validation is not actually random between runs:

http://www.csie.ntu.edu.tw/~cjlin/libsvm/faq.html#f421

so for me doing 100 runs just gives the identical answer 100 times.

I was thinking that a way around this might be to randomly permute the order of the data sample for each new run. The same seed would then be used within each run (as advised in the link above), but the cross-validation splits would differ between runs.

Does this seem reasonable? Or should I be setting a new random seed for each call of svmtrain, even within a single run? Or am I going in completely the wrong direction?
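The permutation idea above can be sketched in a few lines. This is a minimal, self-contained NumPy sketch (not libsvm itself): the run index seeds a generator, which reorders the sample, so a deterministic CV split applied afterwards sees different fold contents in each run while staying fixed within a run. The function name `permute_for_run` is my own, not anything from libsvm.

```python
import numpy as np

def permute_for_run(X, y, run):
    # Seed a generator from the run index: deterministic per run,
    # but different runs produce different sample orderings.
    rng = np.random.default_rng(run)
    idx = rng.permutation(len(y))
    # Apply the same permutation to inputs and labels so pairs stay aligned.
    return X[idx], y[idx]

# Toy usage: 5 examples, 2 features each.
X = np.arange(10).reshape(5, 2)
y = np.arange(5)
Xp, yp = permute_for_run(X, y, run=0)
```

Passing the permuted `Xp`, `yp` to svmtrain would then give a different (but within-run reproducible) fold assignment for each value of `run`.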

 Katie C. 05-22-2013 11:39 AM

Re: libsvm random seeding (Q7)

Quote:
 Originally Posted by nparslow (Post 10889)
Permuting the order between runs sounds reasonable to me, assuming that by "run" you mean one full pass over all of the C values.

 Elroch 05-22-2013 11:57 AM

Re: libsvm random seeding (Q7)

Quote:
 Originally Posted by nparslow (Post 10889)
If you are doing precisely the same run more than once with the same data, with the aim of averaging out noise due to the procedure, then reseeding makes sense to me. The recommendation in the link you gave was not about identical runs: it was to fix the seed in order to isolate the differences caused by changing the parameters. (If you reseed while also changing the parameters, you add noise on top of the signal from the parameter change, which is why it is advised against.)
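The structure Elroch describes can be made concrete: one permutation per run (so runs differ), but the same data order for every C within a run (so comparisons between C values are not contaminated by split noise). A minimal sketch, where `cv_error(X, y, C)` is a hypothetical stand-in for a deterministic libsvm-style cross-validation call (e.g. svmtrain with the -v option), not a real libsvm function:

```python
import numpy as np

def select_C_over_runs(X, y, C_values, n_runs, cv_error):
    # cv_error(X, y, C) is assumed deterministic for a fixed data order,
    # mimicking libsvm's fixed internal seed.
    counts = {C: 0 for C in C_values}
    for run in range(n_runs):
        # One permutation per run; reused unchanged for every C below.
        rng = np.random.default_rng(run)
        idx = rng.permutation(len(y))
        Xr, yr = X[idx], y[idx]
        # Compare all C values on the identical split, then tally the winner.
        best = min(C_values, key=lambda C: cv_error(Xr, yr, C))
        counts[best] += 1
    return counts
```

Tallying which C wins most often across runs is one way to pick a value that is robust to the particular fold assignment.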
