 LFD Book Forum Question 2

#1
 Ziad Hatahet Member Join Date: Apr 2013 Location: San Francisco, CA Posts: 23 Question 2

Hi all,

Is it correct to assume that each of the thousand coins is its own "bin" in the multiple-bin scenario of the Hoeffding inequality explained in Lecture 2, and that each coin therefore corresponds to a separate hypothesis?

Prof. Yaser mentioned in another post (see here) that:

Quote:
 If you fix any hypothesis then run the experiment, the probability will always be bounded by that term.
I presume the converse does not hold (i.e., if the hypothesis is not fixed, the probability is not necessarily bounded by the Hoeffding inequality), regardless of whether it happens to satisfy the bound by chance. Is that a valid assumption?
#2
 jlaurentum Member Join Date: Apr 2013 Location: Venezuela Posts: 41 Re: Question 2

I am hung up on this question too.

As I interpret it, a "bin" in this question corresponds to a particular way of defining a random experiment. In any case, we flip 1000 coins 10 times each and:

1) For coin 1, the random experiment consists of taking that coin's 10 flips and computing the fraction of heads, reporting that fraction as the sample frequency.
2) For the random coin, the random experiment consists of picking a coin at random (1-1000) and computing the fraction of heads in that coin's 10 flips.
3) For the "minimum number of heads" coin, we flip 1000 coins 10 times each as before, but report the minimum frequency of heads obtained among the 1000 coins as the sample frequency.

Each of these 3 situations constitutes a random experiment because it can be replicated any number of times, and each has an expected frequency. The first two have an expected frequency (mu) of 0.5; the third has a different expected frequency, but it can be calculated analytically (the probability distribution over the possible sample frequencies 0, 0.1, ..., 0.9, 1 can be worked out for the "minimum heads" coin). By the law of large numbers, the sample frequencies converge to their respective expected frequencies in all three scenarios. (No surprise)...
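The three scenarios above can be sketched in a short simulation (this is my own illustration, not part of the original post; the names nu_1, nu_rand, nu_min and the number of repetitions are mine):

```python
import random

def head_fraction(n_flips=10):
    # Fraction of heads in n_flips fair-coin tosses (each bit of a
    # random n_flips-bit integer models one fair flip).
    return bin(random.getrandbits(n_flips)).count("1") / n_flips

def one_run(n_coins=1000, n_flips=10):
    # One run of the experiment: returns (nu_1, nu_rand, nu_min).
    freqs = [head_fraction(n_flips) for _ in range(n_coins)]
    return freqs[0], random.choice(freqs), min(freqs)

runs = 1000
totals = [0.0, 0.0, 0.0]
for _ in range(runs):
    for i, nu in enumerate(one_run()):
        totals[i] += nu

avg_1, avg_rand, avg_min = (t / runs for t in totals)
print(f"nu_1 ~ {avg_1:.3f}, nu_rand ~ {avg_rand:.3f}, nu_min ~ {avg_min:.3f}")
```

The first two averages come out near 0.5, while the average of nu_min is far below 0.5, matching the analytical observation above.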

So back to this:

Quote:
 If you fix any hypothesis then run the experiment, the probability will always be bounded by that term.
Does fixing the hypothesis mean fixing the coins, or does it mean being able to fix a random experiment (procedure) by which we obtain the sample frequencies?
#3 yaser Caltech Join Date: Aug 2009 Location: Pasadena, California, USA Posts: 1,478 Re: Question 2

Quote:
 Originally Posted by jlaurentum Does fixing the hypothesis mean fixing the coins, or does it mean being able to fix a random experiment (procedure) by which we obtain the sample frequencies?
Let me focus on the correspondence between the bin world and the hypothesis world since this seems to be a source of confusion.

The bin corresponds to the input space X, with each marble corresponding to an input point x.

The colors in a bin correspond to how a hypothesis agrees with the target; the color of a marble x is red if h(x) ≠ f(x).

A fixed hypothesis means the colors inside the bin are fixed prior to drawing the sample. What makes a hypothesis 'not fixed' is that we have a bunch of bins and we select a bin according to the sample it produced (e.g., an all-green sample). We call the hypothesis not fixed in this case because which bin we pick (hence which colors are inside, hence which hypothesis we are talking about) depends on the samples that have already been produced.
__________________
Where everyone thinks alike, no one thinks very much
#4
 Ziad Hatahet Member Join Date: Apr 2013 Location: San Francisco, CA Posts: 23 Re: Question 2

Quote:
 Originally Posted by yaser What makes a hypothesis 'not fixed' is that we have a bunch of bins and we select a bin according to the sample it produced (e.g., an all green sample). The reason we call the hypothesis not fixed in this case is because which bin we pick (hence which colors are inside, hence which hypothesis we are talking about) depends on the samples that have already been produced.
That is what we do when we sift through the bins (hypotheses) looking for an h whose sample minimizes E_in(h). But given the quote in my first post, doesn't that mean that the bin chosen by this procedure is not bounded by 2e^(-2ε²N)?
#5 yaser Caltech Join Date: Aug 2009 Location: Pasadena, California, USA Posts: 1,478 Re: Question 2

Quote:
 Originally Posted by Ziad Hatahet But given the quote in my first post, doesn't that mean that the bin chosen by this procedure is not bounded by 2e^(-2ε²N)?
This is discussed here:

http://book.caltech.edu/bookforum/sh...0360#post10360

The main point is that once you consider the sample, the probability becomes conditional on how that sample came out, and the conditional probability can violate the bound, whereas the probability before a sample is drawn always obeys the bound.
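This point can be checked numerically (my own sketch, not from the original thread; ε = 0.4 and the run count are illustrative choices): for a coin fixed before the experiment, the frequency of large deviations obeys the single-bin bound 2e^(-2ε²N), while for the coin selected after seeing all the samples (the minimum-heads coin) it does not:

```python
import math
import random

def head_fraction(n_flips):
    # Fraction of heads in n_flips fair-coin tosses.
    return bin(random.getrandbits(n_flips)).count("1") / n_flips

N, eps, runs = 10, 0.4, 300
bound = 2 * math.exp(-2 * eps**2 * N)  # single-bin Hoeffding bound

fixed_bad = 0  # deviations for a coin fixed before the experiment
min_bad = 0    # deviations for the coin picked *after* seeing the samples
for _ in range(runs):
    freqs = [head_fraction(N) for _ in range(1000)]
    fixed_bad += abs(freqs[0] - 0.5) > eps
    min_bad += abs(min(freqs) - 0.5) > eps

print(f"bound {bound:.3f}  fixed {fixed_bad/runs:.3f}  min {min_bad/runs:.3f}")
```

The fixed coin's deviation frequency stays well under the bound; the selected coin's deviation frequency exceeds it by a wide margin, which is exactly the "conditional on the sample" effect described above.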
__________________
Where everyone thinks alike, no one thinks very much
#6
 sptripathi Junior Member Join Date: Apr 2013 Posts: 8 Re: Question 2

Quote:
 Originally Posted by yaser This is discussed here: http://book.caltech.edu/bookforum/sh...0360#post10360 The main point is that once you consider the sample, the probability becomes conditional on how this sample came out, and that could violate the bound, whereas the probability before a sample was drawn always obeys the bound.
Is the c_min coin of Q2 (HW2) not analogous to the pocket PLA's final hypothesis g? In the pocket PLA, we always choose the hypothesis that gave the minimum disagreement across experiments (~ bins). The probability of a bad event in the pocket case is still bounded by Hoeffding's inequality, right?
The possible difference I can see is that in the c_min case the sample kept changing, while in the pocket case the sample does not change; only the colors of the marbles inside and outside the bin change as the hypothesis changes. Is that the key point?
#7 yaser Caltech Join Date: Aug 2009 Location: Pasadena, California, USA Posts: 1,478 Re: Question 2

Quote:
 Originally Posted by sptripathi In the pocket PLA, we always choose the hypothesis that gave the minimum disagreement across experiments (~ bins). The probability of a bad event in the pocket case is still bounded by Hoeffding's inequality, right?
It is bounded by the multiple-bin version of the Hoeffding inequality. However, if we apply that version of the inequality directly to this case, the result will be meaningless, since there is an infinite number of bins (hypotheses) here. This is the reason we will develop the theory further in the coming two weeks: to deal with this dilemma.

Quote:
 The possible difference I can see is that in the c_min case the sample kept changing, while in the pocket case the sample does not change; only the colors of the marbles inside and outside the bin change as the hypothesis changes.
Your observation is correct, and this is why we used the union bound, which is valid in all cases regardless of the dependence or independence of the samples.
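As a rough illustration of the multiple-bin (union) bound (my own sketch, not from the thread; I use N = 100, ε = 0.25, and M = 1000 because with the homework's N = 10 the bound 2M·e^(-2ε²N) exceeds 1 and is vacuous, which is the dilemma mentioned above):

```python
import math
import random

M, N, eps, runs = 1000, 100, 0.25, 1000
# Multiple-bin Hoeffding bound via the union bound, capped at 1.
union_bound = min(1.0, 2 * M * math.exp(-2 * eps**2 * N))

def head_fraction(n_flips):
    # Fraction of heads in n_flips fair-coin tosses.
    return bin(random.getrandbits(n_flips)).count("1") / n_flips

bad_runs = 0
for _ in range(runs):
    # Bad event: at least one of the M coins deviates from mu = 0.5
    # by more than eps.
    if any(abs(head_fraction(N) - 0.5) > eps for _ in range(M)):
        bad_runs += 1

print(f"union bound {union_bound:.4f}  observed {bad_runs/runs:.4f}")
```

The observed frequency of the bad event stays below the union bound, and the bound holds no matter how the M samples depend on one another, since the union bound never assumes independence.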
__________________
Where everyone thinks alike, no one thinks very much


The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.