LFD Book Forum Multiple validation sets
#1
07-01-2016, 02:01 PM
 kartikeya_t@yahoo.com Member Join Date: Jul 2012 Posts: 17
Multiple validation sets

Hello everybody,

I was listening to Prof. Abu-Mostafa’s lecture on validation, and at one point he mentioned that we can have more than one validation set (if our data permits) to make multiple choices among different models and/or parameters. That got me thinking about how one might go about doing that.

Say we have just enough data to carve out a training set and two validation sets. We have two decisions to make: which order of polynomial to fit the data with (2nd, 3rd, or 5th order), and which value of the regularization parameter to use (lambda equal to 0.01, 0.1, 1, or 2). What is the best way of doing this? I can think of the following approaches, and would appreciate any feedback on which, if any, might be the best.

1. We have a total of 12 combinations of choices. So we work out the 12 candidate hypotheses on the training set, and combine the 2 validation sets into one set to choose the best hypothesis with. This may “contaminate” the validation set a bit more than we want, as the number of hypotheses is not small. We then combine the training set and the validation sets to produce the final hypothesis with the best polynomial order and the best lambda.

2. We use the training data to first produce 3 hypotheses, one per polynomial order, with lambda fixed at one of its four possible values. We decide the best polynomial order using the first validation set. Then we use the training set again with the just-decided polynomial order to produce 4 hypotheses, one per value of lambda. We use the second validation set to decide the best value of lambda. Finally, we combine all the data to come up with the final hypothesis using the chosen polynomial order and the chosen lambda value.

3. We do the same as option 2, but reverse the order of deciding - find lambda first, then the polynomial order.

4. We do the same as options 2 or 3, but after the first decision, we combine the training set and the first validation set to produce a second, bigger training set for the second decision.
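To make options 1 and 2 concrete, here is a minimal sketch of how they might be coded. Everything in it is an assumption for illustration: the synthetic sine target, the set sizes (40/20/20), the helper names (`fit_ridge`, `val_error`, `option1`, `option2`), and the default `fixed_lam=0.1` used in option 2. It uses weight decay on all polynomial coefficients (including the constant term) purely for simplicity.

```python
import numpy as np

ORDERS = (2, 3, 5)                 # candidate polynomial orders from the post
LAMBDAS = (0.01, 0.1, 1.0, 2.0)    # candidate regularization values from the post

def poly_features(x, order):
    # Columns are x^0, x^1, ..., x^order.
    return np.vander(x, order + 1, increasing=True)

def fit_ridge(x, y, order, lam):
    # Regularized least squares: w = (Z^T Z + lam * I)^{-1} Z^T y.
    Z = poly_features(x, order)
    A = Z.T @ Z + lam * np.eye(order + 1)
    return np.linalg.solve(A, Z.T @ y)

def val_error(w, x, y):
    # Mean squared error of the fitted polynomial on (x, y).
    order = len(w) - 1
    return float(np.mean((poly_features(x, order) @ w - y) ** 2))

def option1(train, val1, val2):
    # Option 1: pool both validation sets and pick among all 12 combinations.
    xv = np.concatenate([val1[0], val2[0]])
    yv = np.concatenate([val1[1], val2[1]])
    scores = {(d, lam): val_error(fit_ridge(*train, d, lam), xv, yv)
              for d in ORDERS for lam in LAMBDAS}
    return min(scores, key=scores.get)

def option2(train, val1, val2, fixed_lam=0.1):
    # Option 2: choose the order on val1 (lambda held fixed), then lambda on val2.
    best_d = min(ORDERS,
                 key=lambda d: val_error(fit_ridge(*train, d, fixed_lam), *val1))
    best_lam = min(LAMBDAS,
                   key=lambda lam: val_error(fit_ridge(*train, best_d, lam), *val2))
    return best_d, best_lam

# Hypothetical data: noisy sine target, sizes chosen arbitrarily.
rng = np.random.default_rng(0)

def make_data(n):
    x = rng.uniform(-1.0, 1.0, n)
    y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(n)
    return x, y

train, val1, val2 = make_data(40), make_data(20), make_data(20)
choice1 = option1(train, val1, val2)
choice2 = option2(train, val1, val2)

# Final step shared by both options: retrain on all the data with the chosen pair.
x_all = np.concatenate([train[0], val1[0], val2[0]])
y_all = np.concatenate([train[1], val1[1], val2[1]])
w_final = fit_ridge(x_all, y_all, *choice1)

print("option 1 picks (order, lambda):", choice1)
print("option 2 picks (order, lambda):", choice2)
```

Option 3 would swap the two stages in `option2`, and option 4 would concatenate `train` and `val1` before the second stage; neither is shown here.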

These are the four options I can think of. In option 2, I am not sure whether arbitrarily fixing a value of lambda while we decide the best polynomial order is wise. I am also not sure whether using the same training set again and again is OK (wouldn't it get horribly contaminated?).
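One rough way to put numbers on the “contamination” of a validation set is the Hoeffding bound with a union bound over the M hypotheses being compared: with probability at least 1 - delta, the validation error of every candidate is within sqrt(ln(2M/delta) / (2K)) of its out-of-sample error, where K is the number of validation points. The sketch below just evaluates that margin. It assumes an error measure bounded in [0, 1] (which squared error is not, so this is only indicative), and the sizes `K1 = K2 = 50` and `delta = 0.05` are hypothetical.

```python
import math

def hoeffding_margin(num_hypotheses, num_val_points, delta=0.05):
    # Margin eps such that P[any |E_val - E_out| > eps] <= delta,
    # from 2 * M * exp(-2 * eps^2 * K) = delta (errors assumed in [0, 1]).
    return math.sqrt(math.log(2 * num_hypotheses / delta) / (2 * num_val_points))

K1 = K2 = 50  # hypothetical validation set sizes

# Option 1: 12 hypotheses judged on the pooled validation set.
margin_option1 = hoeffding_margin(12, K1 + K2)

# Option 2: 3 hypotheses on the first set, then 4 on the second.
margin_order = hoeffding_margin(3, K1)
margin_lambda = hoeffding_margin(4, K2)

print(f"option 1, M=12, K={K1 + K2}: {margin_option1:.4f}")
print(f"option 2, M=3,  K={K1}: {margin_order:.4f}")
print(f"option 2, M=4,  K={K2}: {margin_lambda:.4f}")
```

Under these assumptions the pooled set in option 1 gives the smallest margin, because doubling K outweighs the logarithmic cost of going from 3 or 4 hypotheses to 12. This says nothing about whether fixing lambda arbitrarily in the first stage of option 2 picks a good order; it only compares how far each validation estimate might be from the out-of-sample error.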

Any thoughts or comments are much appreciated.

 Tags generalization, validation

The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.