 LFD Book Forum Exercises and Problems
 User Name Remember Me? Password
 FAQ Calendar Mark Forums Read Thread Tools Display Modes
#1 yaser Caltech Join Date: Aug 2009 Location: Pasadena, California, USA Posts: 1,476 Exercises and Problems

Please comment on the chapter problems in terms of difficulty, clarity, and time demands. This information will help us and other instructors in choosing problems to assign in our classes.

Also, please comment on the exercises in terms of how useful they are in understanding the material.
__________________
Where everyone thinks alike, no one thinks very much
#2
 pablo Invited Guest Join Date: Apr 2012 Posts: 2 Exercise 1.10

Since this is a theory week, thought this might be a good time to explore Hoeffding a bit.

I understand c_1 and c_rand satisfy Hoeffding experimentally as described, but conceptually does c_rand satisfy Hoeffding? For example, suppose it is unknown whether each coin is fair (or that they are known to have varying fairness - e.g. c_1 is 50/50, c_2 is 40/60, etc.). Would each coin represent a separate 'bin' or would the random selection of a coin plus the ten flips represent the randomized selection condition for c_rand?

Trying to understand if it's necessary for the coins to be identical.
#3 yaser Caltech Join Date: Aug 2009 Location: Pasadena, California, USA Posts: 1,476 Re: Exercise 1.10

Quote:
 Originally Posted by pablo Since this is a theory week, thought this might be a good time to explore Hoeffding a bit. I understand c_1 and c_rand satisfy Hoeffding experimentally as described, but conceptually does c_rand satisfy Hoeffding? For example, suppose it is unknown whether each coin is fair (or that they are known to have varying fairness - e.g. c_1 is 50/50, c_2 is 40/60, etc.). Would each coin represent a separate 'bin' or would the random selection of a coin plus the ten flips represent the randomized selection condition for c_rand? Trying to understand if it's necessary for the coins to be identical.
Interesting question indeed. Hoeffding does apply to each randomly selected coin individually as you point out. If the coins have different values of , then the added randomization due to the selection of a coin affects the relationship between and . This is exactly the premise of the complete bin model analysis.
__________________
Where everyone thinks alike, no one thinks very much
#4
 tadworthington Member Join Date: Jun 2012 Location: Chicago, IL Posts: 32 Re: Exercises and Problems

Problem 1.4

WHAT I LIKED:
This is an excellent problem, and a model one in my opinion for helping a student to understand material. It starts out easy, with a small data set - that, importantly, is user-generated. Having to generate a data set - even one as simple as a linearly separable data set in 2 dimensions - goes a long way to helping understand how the Perceptron works and why it wouldn't work if the data were not linearly separable. We gradually progress to bigger data sets, all along the way plotting the data so that we can see what is happening. It is not only instructive but also quite exhilarating to see the results of the PLA (plotting both the target function and the hypothesis generated by the PLA) actually graphed out in front of you on a computer screen!

I also thought that the progression to 10 dimensions only after working through the more easily visualized 2 dimensional examples is perfect. Finally, the histogram approach to understanding the computational complexity of the PLA was, I thought, genius.

TIME SPENT:
Overall I spent about 90 minutes on the problem, although a lot of that was spent documenting my code for future reference, and just general "prettying" of my plots and results; so I would guess that this problem can be completed comfortably in an hour, assuming knowledge of programming in a language where plotting is easy (I used Python, but Matlab/Octave would also be excellent examples for quick-and-dirty graphics programming of this type.) For someone with no programming experience, the problem would probably take much more time.

CLARITY:
I think the problem is exceptionally clear.

DIFFICULTY:
I thought the problem was relatively easy, but the Perceptron is a relatively easy concept so I think making it harder is neither necessary nor appropriate. For metrics purposes, on a 1-10 scale of difficulty I would give this a 3.
#5 htlin NTU Join Date: Aug 2009 Location: Taipei, Taiwan Posts: 601 Re: Exercises and Problems

Quote:
 Originally Posted by tadworthington Problem 1.4
Thank you so much for the valuable feedback!
__________________
When one teaches, two learn.
#6
 vsthakur Member Join Date: Jun 2012 Posts: 14 Re: Exercise 1.10

While comparing the Coin-Flip experiment with the Bin-Model
a) Sample size of the individual Bin is equal to the number of coin flips (N=10)
b) Number of hypothesis is equal to number of coins (M=1000)

but, the aspect of repeating the entire coin-flip experiment large number of times (100,000) does not have any counterpart with the (Single or Multiple) Bin-Model.

Is the purpose of repeating the coin-flip experiment only to find the estimates of P[|v-mu|>epsilon] without using the hoeffding inequality and then comparing it with the bound given by hoeffding ? Kindly clarify this point.

Thanks.
#7 yaser Caltech Join Date: Aug 2009 Location: Pasadena, California, USA Posts: 1,476 Re: Exercise 1.10

Quote:
 Originally Posted by vsthakur While comparing the Coin-Flip experiment with the Bin-Model a) Sample size of the individual Bin is equal to the number of coin flips (N=10) b) Number of hypothesis is equal to number of coins (M=1000) but, the aspect of repeating the entire coin-flip experiment large number of times (100,000) does not have any counterpart with the (Single or Multiple) Bin-Model. Is the purpose of repeating the coin-flip experiment only to find the estimates of P[|v-mu|>epsilon] without using the hoeffding inequality and then comparing it with the bound given by hoeffding ? Kindly clarify this point. Thanks.
The purpose is to average out statistical fluctuations. In any given run, we may get unusual 's by coincidence, but by taking the average we can be confident that we captured the typical values.
__________________
Where everyone thinks alike, no one thinks very much
#8
 vsthakur Member Join Date: Jun 2012 Posts: 14 Problem 1.10 : Expected Off Training Error

Hi,
If I got it right, in a noiseless setting, for a fixed D, if all f are equally likely, the expected off-training-error of any hypothesis h is 0.5 (part d of problem 1.10, page 37) and hence any two algorithms are the same in terms of expected off training error (part e of the same problem).

My question is, does this not contradict the generalization by Hoeffding. Specifically, the following point is bothering me

By Hoeffding : Ein approaches Eout for larger number of hypothesis (i.e for small epsilon) as N grows sufficiently large. Which would imply that expected(Eout) should be approximately the same as expected(Ein) and not a constant (0.5).

Can you please provide some insight on this, perhaps my comparison is erroneous.

Thanks.
#9 yaser Caltech Join Date: Aug 2009 Location: Pasadena, California, USA Posts: 1,476 Re: Problem 1.10 : Expected Off Training Error

Quote:
 Originally Posted by vsthakur Hi, If I got it right, in a noiseless setting, for a fixed D, if all f are equally likely, the expected off-training-error of any hypothesis h is 0.5 (part d of problem 1.10, page 37) and hence any two algorithms are the same in terms of expected off training error (part e of the same problem). My question is, does this not contradict the generalization by Hoeffding. Specifically, the following point is bothering me By Hoeffding : Ein approaches Eout for larger number of hypothesis (i.e for small epsilon) as N grows sufficiently large. Which would imply that expected(Eout) should be approximately the same as expected(Ein) and not a constant (0.5). Can you please provide some insight on this, perhaps my comparison is erroneous.
This is an important question, and I thank you for asking it. There is a subtle point that creates this impression of contradiction.

On face value, the statement "all are equally likely" sounds reasonable. Mathematically, it corresponds to trying to learn a randomly generated target function, and getting the average performance of this learning process. It should not be a surprise that we would get 0.5 under these circumstances, since almost all random functions are impossible to learn.

In terms of and , Hoeffding inequality certainly holds for each of these random target functions, but itself will be close to 0.5 for almost all of these functions since they have no pattern to fit, so Hoeffding would indeed predict to be close to 0.5 on average.

This is why learning was decomposed into two separate questions in this chapter. In terms of these two questions, the one that "fails" in the random function approach is " ?"

Let me finally comment that treating "all are equally likely" as a plausible statement is a common trap. This issue is addressed in detail in the context of Bayesian learning in the following video segment:

http://work.caltech.edu/library/182.html
__________________
Where everyone thinks alike, no one thinks very much
#10
 vsthakur Member Join Date: Jun 2012 Posts: 14 Re: Problem 1.10 : Expected Off Training Error

My takeaway point : All f are equally likely corresponds to trying to learn a randomly generated target function

Thanks for the detailed explanation. The Bayesian Learning example highlights the ramifications of this assumption, very useful point indeed. Tags hoeffding's inequality, hoeffding-inequality Thread Tools Show Printable Version Email this Page Display Modes Linear Mode Switch to Hybrid Mode Switch to Threaded Mode Posting Rules You may not post new threads You may not post replies You may not post attachments You may not edit your posts BB code is On Smilies are On [IMG] code is On HTML code is Off Forum Rules
 Forum Jump User Control Panel Private Messages Subscriptions Who's Online Search Forums Forums Home General     General Discussion of Machine Learning     Free Additional Material         Dynamic e-Chapters         Dynamic e-Appendices Course Discussions     Online LFD course         General comments on the course         Homework 1         Homework 2         Homework 3         Homework 4         Homework 5         Homework 6         Homework 7         Homework 8         The Final         Create New Homework Problems Book Feedback - Learning From Data     General comments on the book     Chapter 1 - The Learning Problem     Chapter 2 - Training versus Testing     Chapter 3 - The Linear Model     Chapter 4 - Overfitting     Chapter 5 - Three Learning Principles     e-Chapter 6 - Similarity Based Methods     e-Chapter 7 - Neural Networks     e-Chapter 8 - Support Vector Machines     e-Chapter 9 - Learning Aides     Appendix and Notation     e-Appendices

All times are GMT -7. The time now is 12:40 PM.

 Contact Us - LFD Book - Top The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.