03-07-2016, 02:28 AM
ntvy95
Join Date: Jan 2016
Posts: 37
Re: Hoeffding Inequality

I think you can take a look at MaciekLeks' post for the experimental results of Exercise 1.10 (in the book).
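For anyone who wants to try the experiment themselves, here is a rough sketch of how I understand Exercise 1.10: flip 1000 fair coins 10 times each and track the fraction of heads for the first coin (c_1), a randomly chosen coin (c_rand), and the coin with the fewest heads (c_min). The function names, the default number of runs, and the seed are my own choices and not from the book. This is not MaciekLeks' code, just a minimal simulation.

```python
import math
import random


def run_experiment(num_runs=2000, num_coins=1000, num_flips=10, seed=0):
    """Return the head fractions (nu) of c_1, c_rand and c_min for every run."""
    rng = random.Random(seed)
    nu_first, nu_rand, nu_min = [], [], []
    for _ in range(num_runs):
        # Heads per coin: the number of 1-bits among num_flips fair random bits.
        heads = [bin(rng.getrandbits(num_flips)).count("1")
                 for _ in range(num_coins)]
        # c_1: the coin is fixed before any flips are seen.
        nu_first.append(heads[0] / num_flips)
        # c_rand: the coin is picked at random, without looking at the flips.
        nu_rand.append(heads[rng.randrange(num_coins)] / num_flips)
        # c_min: the coin is picked after looking at the flips.
        nu_min.append(min(heads) / num_flips)
    return nu_first, nu_rand, nu_min


def violation_rate(nus, eps, mu=0.5):
    """Empirical estimate of P[|nu - mu| > eps]."""
    return sum(abs(nu - mu) > eps for nu in nus) / len(nus)


def hoeffding_bound(eps, n):
    """Single-hypothesis Hoeffding bound 2 * exp(-2 * eps^2 * n)."""
    return 2 * math.exp(-2 * eps ** 2 * n)


if __name__ == "__main__":
    eps = 0.25  # chosen off the 0.1 grid to avoid floating-point ties
    first, rand, lowest = run_experiment()
    print("bound :", hoeffding_bound(eps, 10))
    print("c_1   :", violation_rate(first, eps))
    print("c_rand:", violation_rate(rand, eps))
    print("c_min :", violation_rate(lowest, eps))
```

In this sketch c_1 and c_rand are chosen without looking at the flips, so their violation rates should stay under the single-coin bound, while c_min is chosen after looking at the flips and should exceed it.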

As I understand it: g is the final hypothesis, which is known only after the data set is generated, because the choice of final hypothesis depends on the specific data set. Before the data set is generated, all we know about g is that it is one of the hypotheses in H (hence the factor M in the bound). h is a specific, fixed hypothesis in H. I don't think we are selecting h; I think we are selecting g instead.
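Written out, the difference looks like this. This is a sketch in the book's notation as I understand it (E_in, E_out, N samples, tolerance \epsilon); the labeling h_1, ..., h_M for the elements of H is my own.

```latex
% A single hypothesis h, fixed before the data set is generated:
\mathbb{P}\big[\,|E_{\text{in}}(h) - E_{\text{out}}(h)| > \epsilon\,\big]
  \le 2e^{-2\epsilon^{2}N}

% g is chosen from H = \{h_1, \dots, h_M\} after seeing the data, so all we
% can say in advance is that g is one of the h_m. If g violates the bound,
% then at least one h_m does:
\mathbb{P}\big[\,|E_{\text{in}}(g) - E_{\text{out}}(g)| > \epsilon\,\big]
  \le \sum_{m=1}^{M}
      \mathbb{P}\big[\,|E_{\text{in}}(h_m) - E_{\text{out}}(h_m)| > \epsilon\,\big]
  \le 2Me^{-2\epsilon^{2}N}
```

The second line is just the union bound applied to the M fixed hypotheses, which is where the M comes from.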

Originally Posted by pouramini
I have the same questions, and I have read your replies.

Please check whether my conclusions are correct:

1- We cannot plug "g" in for "h" in the inequality, because g depends on the sample we have already selected. In other words, we choose it deliberately (as the h with the lowest error inside D), like selecting the bin that has the minimum frequency of heads.

So what if we select "g" randomly (from a uniform distribution over the hs), or select a bin randomly? Can we then use the Hoeffding inequality for "g", or should we still consider M, the size of H?

2- Which of the following interpretations of equation 1.6 is correct:
  • The only function that has zero error inside and outside D is f. So if the number of hypotheses increases, the chance of selecting "f" (the correct function, or a better approximation) becomes lower. (However, I feel that's not what you are saying.)
  • Or maybe, when we increase the number of hypotheses, we increase the chance that the selected hypothesis behaves differently inside and outside D. For example, if we limit the hypotheses to one, we may have a high error, but we lower the difference between E(in) and E(out). Likewise, if we use only one feature, we have limited the number of hypotheses; then when we evaluate h outside D, it is not flexible enough to show minor errors, so E(out) stays closer to E(in)?

Second question:

In "h is fixed before you generate the data set",
I also can't understand your emphasis on "before".

Do you mean that h shouldn't change?
I ask because I feel h is independent of D, so "before" or "after" doesn't mean much. We don't need to have an h in mind to be able to generate D. We can select D, then decide which h to use, then evaluate h over D, but we should use the same h for the test set, right? Or maybe h is used somehow in generating D? Anyway, I think you may mean that h should be selected independently of D.