LFD Book Forum Ch. 1.3.1 vs. Ch. 1.3.2
#1
04-04-2013, 11:30 AM
 Michael Reach Senior Member Join Date: Apr 2013 Location: Baltimore, Maryland, USA Posts: 71
Ch. 1.3.1 vs. Ch. 1.3.2

The book shows, for the case given in the truth table in Section 1.3.1 (not in the lectures), that any hypothesis you choose based on the first five points is no better than a random guess on the last three data points.

Whereas Section 1.3.2 says that, using the Hoeffding inequality, we can have a good probability that a hypothesis's in-sample error is close to its out-of-sample error.

The book doesn't work that out for the earlier example given, so I thought I'd try to understand it. But let me make the case a little more extreme:

Let's say that our space of results comes from flipping a coin 100 times. Our "sample" data is the first 97 flips, N=97. We would like to develop a hypothesis that will do a good job on the last three flips.

Now, that isn't going to work; as the book points out for its example, no hypothesis is better than any other on the last three random flips. The average probability of error will be 0.5?

So here we are with our (somehow) carefully chosen hypothesis, which does a perfect job on the first 97 flips.

Pick ε = 0.4: not so stringent. Doesn't the Hoeffding inequality tell us that the probability of our in-sample error (= 0) being different from the out-of-sample error (≈ 0.5 on average) by more than 0.4 is less than 2e^{-2ε²N} = 2e^{-2(0.4)²(97)}? Which is real small, 10^{-13} or so.
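Just to check my arithmetic, here is a quick numeric sketch of that single-hypothesis bound, plugging in ε = 0.4 and the N = 97 sample flips from above:

```python
import math

# Two-sided Hoeffding bound for ONE fixed hypothesis:
#   P[|E_in - E_out| > eps] <= 2 * exp(-2 * eps^2 * N)
eps = 0.4   # tolerance ("by more than 0.4")
N = 97      # number of sample flips

bound = 2 * math.exp(-2 * eps**2 * N)
print(bound)  # on the order of 1e-13 -- "real small"
```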

---------------------------------------

Where did I go wrong? My guess (I'd appreciate someone with the correct answer):

I said "(somehow) carefully chosen hypothesis". What I was supposed to do was choose it randomly. If I had, I would have almost always gotten a hypothesis that does a terrible job both on the sample and on the remaining three. Probably it would get about 50% on each, and Hoeffding would be right.

At the other extreme, I could have gone through some complex procedure for fitting a 97th-degree polynomial through my sample, and thereby "memorized" the data, getting a hypothesis that does a great job on the sample. When I did that, I was effectively pawing through all the possible such polynomials (capital H) looking for the ones that do best. The professor called that M - the number of possible hypotheses. I guess for this job I'd need something like 97 parameters, each of which should have (at least) two possible values, so M is something like 2^97. Multiply that into Hoeffding's inequality and the right-hand side isn't small any more.

Anyhow, I'd appreciate it if someone has a clearer response.





The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.