LFD Book Forum  

  #1  
08-27-2012, 07:34 AM
itooam
Senior Member
 
Join Date: Jul 2012
Posts: 100
Training v Testing set size rules of thumb...

...

I have read comments elsewhere such as "for learning it is best to use, say, 40% of your whole dataset for training, 30% for validation and 30% for testing". Given that cross-validation uses a "leave one/many out" technique, is there a rule of thumb for the proportions of the training and test sets?
Would I be correct to answer as follows: the test set should be larger than the minimum indicated by VC... effectively 10 * degrees of freedom (using the other rule of thumb)? ... Maybe beyond that it is just trial and error as to how much of the remaining data to apportion to the test set?
  #2  
08-27-2012, 08:53 AM
htlin
NTU
 
Join Date: Aug 2009
Location: Taipei, Taiwan
Posts: 601
Re: Training v Testing set size rules of thumb...

Cross validation most commonly uses 3 to 10 folds; more than 20 is really rare. For single-shot validation, I've seen 5% up to 40% reserved for validation. Hope this helps.
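
To make the two options above concrete, here is a minimal sketch in plain Python of how the index sets might be built. The function names `k_fold_indices` and `holdout_split`, the shuffling seed, and the example values (100 points, 10 folds, a 20% hold-out) are illustrative assumptions, not something taken from the course or the book.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train, val) index lists for k-fold cross validation.

    Hypothetical helper: shuffles 0..n-1 once, deals the indices into
    k folds, and uses each fold exactly once as the validation set.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, val


def holdout_split(n, val_fraction, seed=0):
    """Return (train, val) index lists for a single-shot validation split."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_val = max(1, round(n * val_fraction))
    return idx[n_val:], idx[:n_val]


# Example values are assumptions chosen from within the ranges quoted above.
splits = list(k_fold_indices(n=100, k=10))
train_idx, val_idx = holdout_split(n=100, val_fraction=0.2)
```

With k folds, every point is used for validation exactly once and for training k - 1 times, whereas the single-shot split never trains on the reserved points.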
__________________
When one teaches, two learn.
  #3  
08-28-2012, 02:42 AM
itooam
Senior Member
 
Join Date: Jul 2012
Posts: 100
Re: Training v Testing set size rules of thumb...

If I have understood correctly, once you use cross validation you only need to partition your data into 2 sets (as opposed to the training/validation/testing approach, where you partition it into 3). One set is used for both training and cross validation, the other for testing. The "test" set is the one you lock away and don't look at until you have decided on the best hypothesis, i.e., it is used to see how well the model generalises to independent data. I was wondering what % you should allocate to each of these two sets?
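
The two-set workflow described above might be sketched as follows in plain Python. Everything specific here is an assumption made for illustration: the 20% test fraction, the 5 folds, the synthetic data, and the two toy "models" (predict the mean or the median of the training targets). The only point is the order of operations: the test indices are set aside first, cross validation sees only the remaining data, and the test set is scored once at the end.

```python
import random
import statistics


def split_dev_test(n, test_fraction, seed=0):
    """Return (dev, test) index lists; the test part is locked away."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(n * test_fraction))
    return idx[n_test:], idx[:n_test]


def cv_error(y, dev, k, fit):
    """Mean squared k-fold cross-validation error, using dev indices only."""
    folds = [dev[i::k] for i in range(k)]
    total = 0.0
    for i, val in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        pred = fit([y[j] for j in train])  # toy model: a single constant
        total += sum((y[j] - pred) ** 2 for j in val)
    return total / len(dev)


# Synthetic targets, purely for illustration.
rng = random.Random(1)
y = [rng.gauss(5.0, 1.0) for _ in range(200)]

# 1. Lock away the test set (20% is an assumed figure, not a recommendation).
dev, test = split_dev_test(len(y), test_fraction=0.2)

# 2. Choose among candidate models with cross validation on dev only.
candidates = {"mean": statistics.mean, "median": statistics.median}
scores = {name: cv_error(y, dev, k=5, fit=fit) for name, fit in candidates.items()}
best = min(scores, key=scores.get)

# 3. Refit the chosen model on all of dev and score the test set once.
final_pred = candidates[best]([y[j] for j in dev])
test_error = sum((y[j] - final_pred) ** 2 for j in test) / len(test)
```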

When you wrote:
"For single-shot validation, I've seen 5% up to 40% reserved for validation"

I assume that by "validation" set you mean the "test" set, since cross validation is already in place?
The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.