LFD Book Forum

LFD Book Forum (http://book.caltech.edu/bookforum/index.php)
-   Chapter 1 - The Learning Problem (http://book.caltech.edu/bookforum/forumdisplay.php?f=108)
-   -   Exercises and Problems (http://book.caltech.edu/bookforum/showthread.php?t=257)

yaser 03-24-2012 11:24 PM

Exercises and Problems
 
Please comment on the chapter problems in terms of difficulty, clarity, and time demands. This information will help us and other instructors in choosing problems to assign in our classes.

Also, please comment on the exercises in terms of how useful they are in understanding the material.

pablo 04-21-2012 09:20 PM

Exercise 1.10
 
Since this is a theory week, I thought this might be a good time to explore Hoeffding a bit.

I understand that c_1 and c_rand satisfy Hoeffding experimentally as described, but does c_rand satisfy Hoeffding conceptually? For example, suppose it is unknown whether each coin is fair, or the coins are known to have varying fairness (e.g. c_1 is 50/50, c_2 is 40/60, etc.). Would each coin represent a separate 'bin', or would the random selection of a coin plus the ten flips together represent the randomized selection condition for c_rand?

I am trying to understand whether it is necessary for the coins to be identical.

yaser 04-21-2012 10:32 PM

Re: Exercise 1.10
 
Interesting question indeed. Hoeffding does apply to each randomly selected coin individually as you point out. If the coins have different values of E_{\rm out}, then the added randomization due to the selection of a coin affects the relationship between E_{\rm in} and E_{\rm out}. This is exactly the premise of the complete bin model analysis.
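
One way to spell out the first part of this reply, as a sketch rather than the book's own derivation: assume the coin index c is drawn independently of the flip outcomes, write \mu_c for that coin's probability of heads (a symbol introduced here for illustration), and let \nu be the fraction of heads in its N flips. Conditioning on the selected coin then gives

```latex
% c     : index of the selected coin, assumed independent of the flips
% \mu_c : probability of heads for coin c (notation assumed for this sketch)
% \nu   : fraction of heads observed in the N flips of the selected coin
\begin{align*}
\mathbb{P}\big[\,|\nu-\mu_c|>\epsilon\,\big]
  &= \sum_{c}\mathbb{P}[c]\;\mathbb{P}\big[\,|\nu-\mu_c|>\epsilon \,\big|\, c\,\big] \\
  &\le \sum_{c}\mathbb{P}[c]\;2e^{-2\epsilon^{2}N}
   \;=\; 2e^{-2\epsilon^{2}N}.
\end{align*}
```

The comparison here is against the selected coin's own \mu_c, not a single value shared by all coins. The first line also relies on the selection not depending on the flips, which is why the same argument does not carry over to a coin chosen after looking at the outcomes, such as c_min.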

tadworthington 07-02-2012 02:13 PM

Re: Exercises and Problems
 
Problem 1.4

WHAT I LIKED:
This is an excellent problem, and a model one in my opinion for helping a student to understand material. It starts out easy, with a small data set - that, importantly, is user-generated. Having to generate a data set - even one as simple as a linearly separable data set in 2 dimensions - goes a long way to helping understand how the Perceptron works and why it wouldn't work if the data were not linearly separable. We gradually progress to bigger data sets, all along the way plotting the data so that we can see what is happening. It is not only instructive but also quite exhilarating to see the results of the PLA (plotting both the target function and the hypothesis generated by the PLA) actually graphed out in front of you on a computer screen!

I also thought that the progression to 10 dimensions only after working through the more easily visualized 2 dimensional examples is perfect. Finally, the histogram approach to understanding the computational complexity of the PLA was, I thought, genius.

TIME SPENT:
Overall I spent about 90 minutes on the problem, although a lot of that went into documenting my code for future reference and general "prettying" of my plots and results. I would guess that this problem can be completed comfortably in an hour, assuming knowledge of programming in a language where plotting is easy (I used Python, but Matlab/Octave would also be excellent choices for quick-and-dirty graphics programming of this type). For someone with no programming experience, the problem would probably take much more time.

CLARITY:
I think the problem is exceptionally clear.

DIFFICULTY:
I thought the problem was relatively easy, but the Perceptron is a relatively easy concept so I think making it harder is neither necessary nor appropriate. For metrics purposes, on a 1-10 scale of difficulty I would give this a 3.
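
For readers who have not attempted the problem yet, the kind of experiment described above can be sketched roughly as follows. This is an illustrative sketch only, not the poster's code: the helper names (`make_data`, `pla`), the uniform sampling on [-1, 1], the random choice of misclassified point, and the update cap are all assumptions, and plotting is left out.

```python
import random


def sign(v):
    # Convention assumed here: sign(0) is treated as -1.
    return 1 if v > 0 else -1


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def make_data(n, rng, dim=2):
    """Generate n points labeled by a random linear target, so the data
    set is linearly separable by construction. Each x has a leading 1."""
    w_target = [rng.uniform(-1, 1) for _ in range(dim + 1)]
    xs = [tuple([1.0] + [rng.uniform(-1, 1) for _ in range(dim)])
          for _ in range(n)]
    ys = [sign(dot(w_target, x)) for x in xs]
    return xs, ys, w_target


def pla(xs, ys, rng, max_updates=1_000_000):
    """Perceptron learning algorithm: pick a misclassified point at random
    and update w <- w + y * x until no point is misclassified."""
    w = [0.0] * len(xs[0])
    for updates in range(max_updates + 1):
        wrong = [(x, y) for x, y in zip(xs, ys) if sign(dot(w, x)) != y]
        if not wrong:
            return w, updates
        x, y = rng.choice(wrong)
        w = [wi + y * xi for wi, xi in zip(w, x)]
    raise RuntimeError("PLA did not converge within max_updates")


if __name__ == "__main__":
    rng = random.Random(0)
    xs, ys, _ = make_data(20, rng)
    w, updates = pla(xs, ys, rng)
    print("final weights:", w, "updates:", updates)
```

Collecting `updates` over many random data sets gives the raw numbers for a histogram of how long the PLA takes to converge; passing a larger `dim` to `make_data` covers the higher-dimensional case.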

htlin 07-04-2012 02:16 PM

Re: Exercises and Problems
 
Quote:

Originally Posted by tadworthington (Post 3335)
Problem 1.4

Thank you so much for the valuable feedback!

vsthakur 07-27-2012 07:38 PM

Re: Exercise 1.10
 
Comparing the coin-flip experiment with the bin model:
a) The sample size of each individual bin is equal to the number of coin flips (N=10).
b) The number of hypotheses is equal to the number of coins (M=1000).

However, repeating the entire coin-flip experiment a large number of times (100,000) does not have any counterpart in the (single or multiple) bin model.

Is the purpose of repeating the coin-flip experiment only to estimate P[|\nu-\mu|>\epsilon] without using the Hoeffding inequality, and then to compare that estimate with the bound given by Hoeffding? Kindly clarify this point.

Thanks.

yaser 07-27-2012 08:30 PM

Re: Exercise 1.10
 
The purpose is to average out statistical fluctuations. In any given run, we may get unusual \nu's by coincidence, but by taking the average we can be confident that we captured the typical \nu values.
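
As a rough illustration of this averaging, here is a minimal Python sketch of the coin experiment with fair coins. The function names, the seed, and the reduced default of 2,000 runs (instead of the 100,000 mentioned above) are assumptions made to keep the example quick. The tolerance is given as an integer head-count deviation `dev_heads`, with \epsilon = dev_heads / N, so that the comparison avoids floating-point ties.

```python
import math
import random


def run_once(rng, n_coins=1000, n_flips=10):
    """Flip n_coins fair coins n_flips times each; return the head counts
    of c_1 (first coin), c_rand (random coin) and c_min (fewest heads)."""
    # The number of set bits in n_flips random bits is one coin's head count.
    heads = [bin(rng.getrandbits(n_flips)).count("1") for _ in range(n_coins)]
    return heads[0], heads[rng.randrange(n_coins)], min(heads)


def estimate_tail(n_runs=2000, dev_heads=3, n_flips=10, seed=0):
    """Estimate P[|nu - mu| > eps] for c_1, c_rand, c_min, with mu = 0.5
    and eps = dev_heads / n_flips, by averaging over n_runs runs."""
    rng = random.Random(seed)
    exceed = [0, 0, 0]
    for _ in range(n_runs):
        for i, h in enumerate(run_once(rng, n_flips=n_flips)):
            # |h/N - 1/2| > dev_heads/N  <=>  |2h - N| > 2 * dev_heads
            if abs(2 * h - n_flips) > 2 * dev_heads:
                exceed[i] += 1
    return [count / n_runs for count in exceed]


def hoeffding_bound(eps, n_flips=10):
    """Right-hand side of the Hoeffding inequality: 2 * exp(-2 * eps^2 * N)."""
    return 2 * math.exp(-2 * eps ** 2 * n_flips)


if __name__ == "__main__":
    p_1, p_rand, p_min = estimate_tail()
    print("c_1:", p_1, "c_rand:", p_rand, "c_min:", p_min)
    print("Hoeffding bound for eps = 0.3:", hoeffding_bound(0.3))
```

With \epsilon = 0.3 and N = 10 the bound is about 0.33. The estimates for c_1 and c_rand should fall well below it, while the estimate for c_min should not, since c_min is selected after looking at the flips.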

vsthakur 08-10-2012 06:56 PM

Problem 1.10 : Expected Off Training Error
 
Hi,
If I got it right, in a noiseless setting, for a fixed D, if all f are equally likely, the expected off-training-error of any hypothesis h is 0.5 (part d of problem 1.10, page 37) and hence any two algorithms are the same in terms of expected off training error (part e of the same problem).

My question is: does this not contradict the generalization result given by Hoeffding? Specifically, the following point is bothering me.

By Hoeffding: E_{\rm in} approaches E_{\rm out} for a larger number of hypotheses (i.e., for small \epsilon) as N grows sufficiently large. This would imply that the expected E_{\rm out} should be approximately the same as the expected E_{\rm in}, and not a constant (0.5).

Can you please provide some insight on this? Perhaps my comparison is erroneous.

Thanks.

yaser 08-10-2012 08:25 PM

Re: Problem 1.10 : Expected Off Training Error
 
This is an important question, and I thank you for asking it. There is a subtle point that creates this impression of contradiction.

At face value, the statement "all f are equally likely" sounds reasonable. Mathematically, it corresponds to trying to learn a randomly generated target function, and getting the average performance of this learning process. It should not be a surprise that we would get 0.5 under these circumstances, since almost all random functions are impossible to learn.

In terms of E_{\rm in} and E_{\rm out}, the Hoeffding inequality certainly holds for each of these random target functions, but E_{\rm in} itself will be close to 0.5 for almost all of these functions since they have no pattern to fit, so Hoeffding would indeed predict E_{\rm out} to be close to 0.5 on average.

This is why learning was decomposed into two separate questions in this chapter. In terms of these two questions, the one that "fails" in the random function approach is "E_{\rm in}\approx 0?"
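
The 0.5 figure for the "all f are equally likely" setting can be checked by brute force on a small example. The sketch below is an illustration under assumed conventions, not the book's solution: labels are ±1, the hypothesis is represented only by its predictions on the off-training points, and the function names are hypothetical. It enumerates every possible labeling of those points and averages the off-training error exactly.

```python
from fractions import Fraction
from itertools import product


def off_training_error(h_vals, f_vals):
    """Fraction of off-training points where hypothesis and target disagree."""
    mismatches = sum(h != f for h, f in zip(h_vals, f_vals))
    return Fraction(mismatches, len(h_vals))


def expected_off_training_error(h_vals):
    """Average the off-training error over all targets, each equally likely.
    The targets agree on the training set, so only their values on the
    off-training points vary: 2**k labelings for k points."""
    targets = list(product((-1, +1), repeat=len(h_vals)))
    total = sum(off_training_error(h_vals, f) for f in targets)
    return total / len(targets)


if __name__ == "__main__":
    # Two arbitrary hypotheses on 5 off-training points.
    print(expected_off_training_error((+1, +1, -1, +1, -1)))
    print(expected_off_training_error((+1, +1, +1, +1, +1)))
```

Whatever predictions are passed in, each off-training point is mislabeled by exactly half of the enumerated targets, so the average comes out the same for every hypothesis.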

Let me finally comment that treating "all f are equally likely" as a plausible statement is a common trap. This issue is addressed in detail in the context of Bayesian learning in the following video segment:

http://work.caltech.edu/library/182.html

vsthakur 08-10-2012 10:56 PM

Re: Problem 1.10 : Expected Off Training Error
 
My takeaway point: "all f are equally likely" corresponds to trying to learn a randomly generated target function.

Thanks for the detailed explanation. The Bayesian learning example highlights the ramifications of this assumption, which is a very useful point indeed.


The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.