LFD Book Forum


munchkin 09-13-2012 11:24 PM

P14-17 Out-Of-Sample Data Set Size?
 
Should the out-of-sample data set also be 100 randomly-generated points or should it be larger like in several of the earlier homeworks? Thanks for your attention.

yaser 09-14-2012 12:13 AM

Re: P15-18 Out-Of-Sample Data Set Size?
 
Quote:

Originally Posted by munchkin (Post 5244)
Should the out-of-sample data set also be 100 randomly-generated points or should it be larger like in several of the earlier homeworks? Thanks for your attention.

Larger will give you a more reliable estimate of E_{\rm out}, and that may be necessary to make sure that the chosen answer is correct.

JohnH 09-14-2012 12:24 AM

Re: P15-18 Out-Of-Sample Data Set Size?
 
The size of the out-of-sample set determines the accuracy and precision of the estimate of E_{out}. I've generally used sets of at least 1000 points.

Andrs 09-14-2012 01:53 AM

Re: P15-18 Out-Of-Sample Data Set Size?
 
In general, the answer alternatives have good margins. If you run at least 1000 experiments, you should get "reasonable results" (E_out) on average with at least 100 test points. As always in machine learning, the more test data, the better.
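A quick simulation can show why averaging over many runs helps even when each run uses only 100 test points. This is only a sketch under a simplifying assumption: every run is given the same hypothetical true error rate (5%), whereas in the actual experiments each run produces a different hypothesis with its own E_out. The function name and parameters are invented for the example.

```python
import random
import statistics

def simulate_average_error(true_error, n_test, n_runs, seed=0):
    """Return (mean, standard deviation) of per-run test-error estimates.

    Each run draws `n_test` independent test points, each misclassified
    with probability `true_error` (an assumed, fixed value).
    """
    rng = random.Random(seed)
    per_run = []
    for _ in range(n_runs):
        mistakes = sum(rng.random() < true_error for _ in range(n_test))
        per_run.append(mistakes / n_test)
    return statistics.mean(per_run), statistics.stdev(per_run)

mean_err, spread = simulate_average_error(true_error=0.05, n_test=100, n_runs=1000)
print(f"average over runs: {mean_err:.4f}, spread of a single run: {spread:.4f}")
```

A single run with 100 test points is noisy, but the average over 1000 runs lands much closer to the assumed rate, since the noise of the average shrinks with the square root of the number of runs.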

TonySuarez 09-14-2012 04:45 AM

Re: P15-18 Out-Of-Sample Data Set Size?
 
Quote:

Originally Posted by munchkin (Post 5244)
Should the out-of-sample data set also be 100 randomly-generated points or should it be larger like in several of the earlier homeworks? Thanks for your attention.

I settled on 200 points for testing, using the same set for the whole batch of 1000 runs, and E_out seemed very stable.

samirbajaj 09-14-2012 11:01 AM

Re: P15-18 Out-Of-Sample Data Set Size?
 
Quote:

Originally Posted by Andrs (Post 5254)
In general, the answer alternatives have good margins. ...


For questions 17 and 18, my answers are different from one set of experiments to the next.

Can't say any more, but I'm wondering if anyone else had a similar experience.

Thanks.

-Samir

