LFD Book Forum  

Go Back   LFD Book Forum > Book Feedback - Learning From Data > Chapter 1 - The Learning Problem

Reply
 
Thread Tools Display Modes
  #1  
Old 03-25-2012, 12:24 AM
yaser's Avatar
yaser yaser is offline
Caltech
 
Join Date: Aug 2009
Location: Pasadena, California, USA
Posts: 1,474
Post Exercises and Problems

Please comment on the chapter problems in terms of difficulty, clarity, and time demands. This information will help us and other instructors in choosing problems to assign in our classes.

Also, please comment on the exercises in terms of how useful they are in understanding the material.
__________________
Where everyone thinks alike, no one thinks very much
Reply With Quote
  #2  
Old 04-21-2012, 10:20 PM
pablo pablo is offline
Invited Guest
 
Join Date: Apr 2012
Posts: 2
Default Exercise 1.10

Since this is a theory week, thought this might be a good time to explore Hoeffding a bit.

I understand c_1 and c_rand satisfy Hoeffding experimentally as described, but conceptually does c_rand satisfy Hoeffding? For example, suppose it is unknown whether each coin is fair (or that they are known to have varying fairness - e.g. c_1 is 50/50, c_2 is 40/60, etc.). Would each coin represent a separate 'bin' or would the random selection of a coin plus the ten flips represent the randomized selection condition for c_rand?

Trying to understand if it's necessary for the coins to be identical.
Reply With Quote
  #3  
Old 04-21-2012, 11:32 PM
yaser's Avatar
yaser yaser is offline
Caltech
 
Join Date: Aug 2009
Location: Pasadena, California, USA
Posts: 1,474
Default Re: Exercise 1.10

Quote:
Originally Posted by pablo View Post
Since this is a theory week, thought this might be a good time to explore Hoeffding a bit.

I understand c_1 and c_rand satisfy Hoeffding experimentally as described, but conceptually does c_rand satisfy Hoeffding? For example, suppose it is unknown whether each coin is fair (or that they are known to have varying fairness - e.g. c_1 is 50/50, c_2 is 40/60, etc.). Would each coin represent a separate 'bin' or would the random selection of a coin plus the ten flips represent the randomized selection condition for c_rand?

Trying to understand if it's necessary for the coins to be identical.
Interesting question indeed. Hoeffding does apply to each randomly selected coin individually as you point out. If the coins have different values of E_{\rm out}, then the added randomization due to the selection of a coin affects the relationship between E_{\rm in} and E_{\rm out}. This is exactly the premise of the complete bin model analysis.
__________________
Where everyone thinks alike, no one thinks very much
Reply With Quote
  #4  
Old 07-02-2012, 03:13 PM
tadworthington tadworthington is offline
Member
 
Join Date: Jun 2012
Location: Chicago, IL
Posts: 32
Default Re: Exercises and Problems

Problem 1.4

WHAT I LIKED:
This is an excellent problem, and a model one in my opinion for helping a student to understand material. It starts out easy, with a small data set - that, importantly, is user-generated. Having to generate a data set - even one as simple as a linearly separable data set in 2 dimensions - goes a long way to helping understand how the Perceptron works and why it wouldn't work if the data were not linearly separable. We gradually progress to bigger data sets, all along the way plotting the data so that we can see what is happening. It is not only instructive but also quite exhilarating to see the results of the PLA (plotting both the target function and the hypothesis generated by the PLA) actually graphed out in front of you on a computer screen!

I also thought that the progression to 10 dimensions only after working through the more easily visualized 2 dimensional examples is perfect. Finally, the histogram approach to understanding the computational complexity of the PLA was, I thought, genius.

TIME SPENT:
Overall I spent about 90 minutes on the problem, although a lot of that was spent documenting my code for future reference, and just general "prettying" of my plots and results; so I would guess that this problem can be completed comfortably in an hour, assuming knowledge of programming in a language where plotting is easy (I used Python, but Matlab/Octave would also be excellent examples for quick-and-dirty graphics programming of this type.) For someone with no programming experience, the problem would probably take much more time.

CLARITY:
I think the problem is exceptionally clear.

DIFFICULTY:
I thought the problem was relatively easy, but the Perceptron is a relatively easy concept so I think making it harder is neither necessary nor appropriate. For metrics purposes, on a 1-10 scale of difficulty I would give this a 3.
Reply With Quote
  #5  
Old 07-04-2012, 03:16 PM
htlin's Avatar
htlin htlin is offline
NTU
 
Join Date: Aug 2009
Location: Taipei, Taiwan
Posts: 562
Default Re: Exercises and Problems

Quote:
Originally Posted by tadworthington View Post
Problem 1.4
Thank you so much for the valuable feedback!
__________________
When one teaches, two learn.
Reply With Quote
  #6  
Old 07-27-2012, 08:38 PM
vsthakur vsthakur is offline
Member
 
Join Date: Jun 2012
Posts: 14
Default Re: Exercise 1.10

While comparing the Coin-Flip experiment with the Bin-Model
a) Sample size of the individual Bin is equal to the number of coin flips (N=10)
b) Number of hypothesis is equal to number of coins (M=1000)

but, the aspect of repeating the entire coin-flip experiment large number of times (100,000) does not have any counterpart with the (Single or Multiple) Bin-Model.

Is the purpose of repeating the coin-flip experiment only to find the estimates of P[|v-mu|>epsilon] without using the hoeffding inequality and then comparing it with the bound given by hoeffding ? Kindly clarify this point.

Thanks.
Reply With Quote
  #7  
Old 07-27-2012, 09:30 PM
yaser's Avatar
yaser yaser is offline
Caltech
 
Join Date: Aug 2009
Location: Pasadena, California, USA
Posts: 1,474
Default Re: Exercise 1.10

Quote:
Originally Posted by vsthakur View Post
While comparing the Coin-Flip experiment with the Bin-Model
a) Sample size of the individual Bin is equal to the number of coin flips (N=10)
b) Number of hypothesis is equal to number of coins (M=1000)

but, the aspect of repeating the entire coin-flip experiment large number of times (100,000) does not have any counterpart with the (Single or Multiple) Bin-Model.

Is the purpose of repeating the coin-flip experiment only to find the estimates of P[|v-mu|>epsilon] without using the hoeffding inequality and then comparing it with the bound given by hoeffding ? Kindly clarify this point.

Thanks.
The purpose is to average out statistical fluctuations. In any given run, we may get unusual \nu's by coincidence, but by taking the average we can be confident that we captured the typical \nu values.
__________________
Where everyone thinks alike, no one thinks very much
Reply With Quote
  #8  
Old 08-10-2012, 07:56 PM
vsthakur vsthakur is offline
Member
 
Join Date: Jun 2012
Posts: 14
Default Problem 1.10 : Expected Off Training Error

Hi,
If I got it right, in a noiseless setting, for a fixed D, if all f are equally likely, the expected off-training-error of any hypothesis h is 0.5 (part d of problem 1.10, page 37) and hence any two algorithms are the same in terms of expected off training error (part e of the same problem).

My question is, does this not contradict the generalization by Hoeffding. Specifically, the following point is bothering me

By Hoeffding : Ein approaches Eout for larger number of hypothesis (i.e for small epsilon) as N grows sufficiently large. Which would imply that expected(Eout) should be approximately the same as expected(Ein) and not a constant (0.5).

Can you please provide some insight on this, perhaps my comparison is erroneous.

Thanks.
Reply With Quote
  #9  
Old 08-10-2012, 09:25 PM
yaser's Avatar
yaser yaser is offline
Caltech
 
Join Date: Aug 2009
Location: Pasadena, California, USA
Posts: 1,474
Default Re: Problem 1.10 : Expected Off Training Error

Quote:
Originally Posted by vsthakur View Post
Hi,
If I got it right, in a noiseless setting, for a fixed D, if all f are equally likely, the expected off-training-error of any hypothesis h is 0.5 (part d of problem 1.10, page 37) and hence any two algorithms are the same in terms of expected off training error (part e of the same problem).

My question is, does this not contradict the generalization by Hoeffding. Specifically, the following point is bothering me

By Hoeffding : Ein approaches Eout for larger number of hypothesis (i.e for small epsilon) as N grows sufficiently large. Which would imply that expected(Eout) should be approximately the same as expected(Ein) and not a constant (0.5).

Can you please provide some insight on this, perhaps my comparison is erroneous.
This is an important question, and I thank you for asking it. There is a subtle point that creates this impression of contradiction.

On face value, the statement "all f are equally likely" sounds reasonable. Mathematically, it corresponds to trying to learn a randomly generated target function, and getting the average performance of this learning process. It should not be a surprise that we would get 0.5 under these circumstances, since almost all random functions are impossible to learn.

In terms of E_{\rm in} and E_{\rm out}, Hoeffding inequality certainly holds for each of these random target functions, but E_{\rm in} itself will be close to 0.5 for almost all of these functions since they have no pattern to fit, so Hoeffding would indeed predict E_{\rm out} to be close to 0.5 on average.

This is why learning was decomposed into two separate questions in this chapter. In terms of these two questions, the one that "fails" in the random function approach is "E_{\rm in}\approx 0?"

Let me finally comment that treating "all f are equally likely" as a plausible statement is a common trap. This issue is addressed in detail in the context of Bayesian learning in the following video segment:

http://work.caltech.edu/library/182.html
__________________
Where everyone thinks alike, no one thinks very much
Reply With Quote
  #10  
Old 08-10-2012, 11:56 PM
vsthakur vsthakur is offline
Member
 
Join Date: Jun 2012
Posts: 14
Default Re: Problem 1.10 : Expected Off Training Error

My takeaway point : All f are equally likely corresponds to trying to learn a randomly generated target function

Thanks for the detailed explanation. The Bayesian Learning example highlights the ramifications of this assumption, very useful point indeed.
Reply With Quote
Reply

Tags
hoeffding's inequality, hoeffding-inequality

Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump


All times are GMT -7. The time now is 08:46 PM.


Powered by vBulletin® Version 3.8.3
Copyright ©2000 - 2017, Jelsoft Enterprises Ltd.
The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.