Exercises and Problems
Please comment on the chapter problems in terms of difficulty, clarity, and time demands. This information will help us and other instructors in choosing problems to assign in our classes.
Also, please comment on the exercises in terms of how useful they are in understanding the material. 
Exercise 1.10
Since this is a theory week, thought this might be a good time to explore Hoeffding a bit.
I understand c_1 and c_rand satisfy Hoeffding experimentally as described, but conceptually does c_rand satisfy Hoeffding? For example, suppose it is unknown whether each coin is fair (or that they are known to have varying fairness  e.g. c_1 is 50/50, c_2 is 40/60, etc.). Would each coin represent a separate 'bin' or would the random selection of a coin plus the ten flips represent the randomized selection condition for c_rand? Trying to understand if it's necessary for the coins to be identical. 
Re: Exercise 1.10
Quote:

Re: Exercises and Problems
Problem 1.4
WHAT I LIKED: This is an excellent problem, and a model one in my opinion for helping a student to understand material. It starts out easy, with a small data set  that, importantly, is usergenerated. Having to generate a data set  even one as simple as a linearly separable data set in 2 dimensions  goes a long way to helping understand how the Perceptron works and why it wouldn't work if the data were not linearly separable. We gradually progress to bigger data sets, all along the way plotting the data so that we can see what is happening. It is not only instructive but also quite exhilarating to see the results of the PLA (plotting both the target function and the hypothesis generated by the PLA) actually graphed out in front of you on a computer screen! I also thought that the progression to 10 dimensions only after working through the more easily visualized 2 dimensional examples is perfect. Finally, the histogram approach to understanding the computational complexity of the PLA was, I thought, genius. TIME SPENT: Overall I spent about 90 minutes on the problem, although a lot of that was spent documenting my code for future reference, and just general "prettying" of my plots and results; so I would guess that this problem can be completed comfortably in an hour, assuming knowledge of programming in a language where plotting is easy (I used Python, but Matlab/Octave would also be excellent examples for quickanddirty graphics programming of this type.) For someone with no programming experience, the problem would probably take much more time. CLARITY: I think the problem is exceptionally clear. DIFFICULTY: I thought the problem was relatively easy, but the Perceptron is a relatively easy concept so I think making it harder is neither necessary nor appropriate. For metrics purposes, on a 110 scale of difficulty I would give this a 3. 
Re: Exercises and Problems
Quote:

Re: Exercise 1.10
While comparing the CoinFlip experiment with the BinModel
a) Sample size of the individual Bin is equal to the number of coin flips (N=10) b) Number of hypothesis is equal to number of coins (M=1000) but, the aspect of repeating the entire coinflip experiment large number of times (100,000) does not have any counterpart with the (Single or Multiple) BinModel. Is the purpose of repeating the coinflip experiment only to find the estimates of P[vmu>epsilon] without using the hoeffding inequality and then comparing it with the bound given by hoeffding ? Kindly clarify this point. Thanks. 
Problem 1.10 : Expected Off Training Error
Hi,
If I got it right, in a noiseless setting, for a fixed D, if all f are equally likely, the expected offtrainingerror of any hypothesis h is 0.5 (part d of problem 1.10, page 37) and hence any two algorithms are the same in terms of expected off training error (part e of the same problem). My question is, does this not contradict the generalization by Hoeffding. Specifically, the following point is bothering me By Hoeffding : Ein approaches Eout for larger number of hypothesis (i.e for small epsilon) as N grows sufficiently large. Which would imply that expected(Eout) should be approximately the same as expected(Ein) and not a constant (0.5). Can you please provide some insight on this, perhaps my comparison is erroneous. Thanks. 
Re: Problem 1.10 : Expected Off Training Error
Quote:
On face value, the statement "all are equally likely" sounds reasonable. Mathematically, it corresponds to trying to learn a randomly generated target function, and getting the average performance of this learning process. It should not be a surprise that we would get 0.5 under these circumstances, since almost all random functions are impossible to learn. In terms of and , Hoeffding inequality certainly holds for each of these random target functions, but itself will be close to 0.5 for almost all of these functions since they have no pattern to fit, so Hoeffding would indeed predict to be close to 0.5 on average. This is why learning was decomposed into two separate questions in this chapter. In terms of these two questions, the one that "fails" in the random function approach is "?" Let me finally comment that treating "all are equally likely" as a plausible statement is a common trap. This issue is addressed in detail in the context of Bayesian learning in the following video segment: http://work.caltech.edu/library/182.html 
Re: Problem 1.10 : Expected Off Training Error
My takeaway point : All f are equally likely corresponds to trying to learn a randomly generated target function
Thanks for the detailed explanation. The Bayesian Learning example highlights the ramifications of this assumption, very useful point indeed. 
All times are GMT 7. The time now is 11:11 PM. 
Powered by vBulletin® Version 3.8.3
Copyright ©2000  2020, Jelsoft Enterprises Ltd.
The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. AbuMostafa, Malik MagdonIsmail, and HsuanTien Lin, and participants in the Learning From Data MOOC by Yaser S. AbuMostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.