LFD Book Forum (http://book.caltech.edu/bookforum/index.php)
-   Homework 8 (http://book.caltech.edu/bookforum/forumdisplay.php?f=137)
-   -   libsvm random seeding (Q7) (http://book.caltech.edu/bookforum/showthread.php?t=4300)

nparslow 05-20-2013 04:08 AM

libsvm random seeding (Q7)
 
I noticed that libsvm's random division of the data for cross-validation is not actually random:

http://www.csie.ntu.edu.tw/~cjlin/libsvm/faq.html#f421

so for me, doing 100 runs just gives the identical answer 100 times.

I was thinking that a way around this might be to randomly permute the order of the data sample before each new run. The same seed would then still be used within each run (as advised in the link above), but the division into folds would change between runs.

Does this seem reasonable? Should I instead be setting a random seed for each call of svmtrain, even within a single run? Or am I going in completely the wrong direction?
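Concretely, I mean something like this (a minimal NumPy sketch of the permutation idea; the `permuted_copy` helper and per-run seeds are my own invention, and the actual svmtrain call is omitted):

```python
import numpy as np

# Toy data: 10 examples with 2 features each, alternating labels.
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.array([1, -1] * 5)

def permuted_copy(X, y, run):
    """Shuffle the sample order with a fresh seed for each run, so libsvm's
    internally fixed seed still sees a different fold assignment every time."""
    rng = np.random.default_rng(run)   # new seed per run
    idx = rng.permutation(len(y))      # random reordering of the examples
    return X[idx], y[idx]              # pass these to svmtrain instead

for run in range(3):
    Xp, yp = permuted_copy(X, y, run)
    # svmtrain with cross-validation would be called here on (Xp, yp).
```

The key point is that each (x, y) pair stays together; only the order in which libsvm sees the examples changes between runs.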

Katie C. 05-22-2013 12:39 PM

Re: libsvm random seeding (Q7)
 
Quote:

Originally Posted by nparslow (Post 10889)

Permuting the order between each run sounds reasonable to me, assuming what you mean by "run" is to try all of the C values.

Elroch 05-22-2013 12:57 PM

Re: libsvm random seeding (Q7)
 
Quote:

Originally Posted by nparslow (Post 10889)

If you are doing precisely the same run more than once on the same data, with the aim of averaging out noise due to the procedure, then reseeding makes sense to me. The recommendation in the link you gave was not about identical runs: it was to fix the seed in order to isolate the differences due to changing the parameters. (If you reseed as well as changing the parameters, you add noise to the signal coming from the parameter change, which is why it is advised against.)
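In other words, something like this (a NumPy sketch of the bookkeeping only; `folds_for_run` is a hypothetical helper, not libsvm's own code, and the svmtrain calls are omitted):

```python
import numpy as np

def folds_for_run(n, run_seed, k=10):
    """Draw the k-fold assignment once from a fixed seed. Within a run this
    assignment is reused for every C, so score differences are due to C alone."""
    rng = np.random.default_rng(run_seed)
    return rng.permutation(n) % k        # fold index for each of n examples

n, Cs = 25, [0.01, 1.0, 100.0]
folds = folds_for_run(n, run_seed=0)     # fixed for the whole run
for C in Cs:
    # Cross-validate with this C on the SAME folds (svmtrain calls omitted).
    pass
folds_next = folds_for_run(n, run_seed=1)  # reseed only between whole runs
```

Reseeding happens only between whole runs, to average out the noise of the split itself, never between the C values being compared.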



Powered by vBulletin® Version 3.8.3
Copyright ©2000 - 2020, Jelsoft Enterprises Ltd.
The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.