LFD Book Forum Is the Hoeffding Inequality really valid for each bin despite non-random sampling?
#10
04-13-2013, 02:40 AM
 grozhd Junior Member Join Date: Apr 2013 Posts: 4
Re: Is the Hoeffding Inequality really valid for each bin despite non-random sampling

Quote:
 Originally Posted by yaser It is a subtle point, so let me try to explain it in the terms you outlined. Let us take the sample (call it D, to follow the book notation). Now evaluate E_in(h) for all hypotheses h in your model H. We didn't start at one hypothesis and move to another; we just evaluated E_in for every h in H. The question is: does Hoeffding's inequality apply to each of these h's by itself? The answer is clearly yes, since each of them could in principle have been the hypothesis you started with. Hoeffding states what the probabilities are before the sample is drawn. When you choose one of these hypotheses because of its small E_in, as in the scenario you point out, the probability that applies now is conditioned on the sample yielding a small E_in. We can try to get a conditional version of Hoeffding to deal with the situation, or we can try to get a version of Hoeffding that applies regardless of which h we choose and how we choose it. The latter is what we did using the union bound. Finally, taking the example you illustrated, any hypothesis you use has to be in H (which is decided before the sample is drawn). The one you constructed is not guaranteed to be in H. Of course you can guarantee that it is in H by taking H to be the set of all possible hypotheses, but in that case H is thoroughly infinite and the multiple-bin Hoeffding does not guarantee anything at all.
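In symbols (a sketch using the book's notation, with M = |H| hypotheses and sample size N), the union-bound version of Hoeffding referred to above bounds the deviation for the final hypothesis g, no matter how g is chosen from H:

P[ |E_in(g) - E_out(g)| > eps ] <= sum_{m=1}^{M} P[ |E_in(h_m) - E_out(h_m)| > eps ] <= 2M exp(-2 eps^2 N).

The first inequality is the union bound over all M "bins"; the second applies plain Hoeffding to each fixed h_m separately.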
Thank you for your reply. So we can think about learning as follows: we draw a random sample D from the bin and then evaluate every h in H on it. In this interpretation it is absolutely clear that the sample is truly random with respect to each hypothesis. Then we choose the best one, and the logic that justifies this choice is Hoeffding's inequality (via the union bound). A process like the PLA simply lets us search for this "best hypothesis," because the hypothesis set is infinite and we cannot literally carry out the exhaustive evaluation described above.
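The distinction in the quote can be checked numerically. Below is a small simulation (my sketch, not from the thread): M hypotheses evaluated on the same sample are modeled as M fair coins each flipped N times, so the true "bin" frequency is mu = 0.5 for every hypothesis and nu is the sample frequency. Plain Hoeffding holds for a coin fixed before the sample is drawn, but not for the coin selected afterwards for its extreme nu; only the union bound (an extra factor of M) covers that case.

```python
import math
import random

def experiment(M=1000, N=10, eps=0.3, trials=500, seed=0):
    """Estimate deviation frequencies for a pre-chosen coin vs. the selected one."""
    rng = random.Random(seed)
    bad_fixed = 0     # trials where a pre-chosen coin (the first) has |nu - 0.5| > eps
    bad_selected = 0  # trials where the minimum-nu coin has |nu - 0.5| > eps
    for _ in range(trials):
        # Sample frequency nu for each of the M coins (N tosses each, mu = 0.5).
        nus = [sum(rng.random() < 0.5 for _ in range(N)) / N for _ in range(M)]
        if abs(nus[0] - 0.5) > eps:
            bad_fixed += 1
        if abs(min(nus) - 0.5) > eps:
            bad_selected += 1
    return bad_fixed / trials, bad_selected / trials

p_fixed, p_selected = experiment()
# Single-hypothesis Hoeffding bound: 2 * exp(-2 * eps^2 * N)
bound = 2 * math.exp(-2 * 0.3**2 * 10)
print(f"fixed coin deviates:    {p_fixed:.3f}  (Hoeffding bound {bound:.3f})")
print(f"selected coin deviates: {p_selected:.3f}  (exceeds the single bound; needs the factor M)")
```

With these parameters the fixed coin deviates in only a few percent of trials, comfortably under the bound of about 0.33, while the minimum-nu coin out of 1000 deviates in essentially every trial, showing why Hoeffding cannot be applied to a hypothesis chosen because of its small E_in.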


The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.