I have the same concern as scottedwards2000 and I still don't understand how it is resolved.

As I understand it, the bin symbolizes the probability space of all possible inputs $\mathcal{X}$. A sample of balls drawn randomly from the bin symbolizes our training set $x_1, \dots, x_N$.

Now we pick a hypothesis $h_1$ (suppose we are running PLA). We look at our sample $x_1, \dots, x_N$, compute $\nu_1$ and use Hoeffding's Inequality. We do one step of PLA and come up with a new hypothesis $h_2$, which automatically gives us $\nu_2$, and the professor is saying that we can write down Hoeffding's Inequality for $\nu_2$ and $\mu_2$?

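I guess we can. But that inequality tells us something about the random variable $\nu_2$, i.e. about $\nu_2(x_1, \dots, x_N)$:

$$P\left[\,|\nu_2(x_1, \dots, x_N) - \mu_2| > \epsilon\,\right] \le 2e^{-2\epsilon^2 N}$$

where $x_1, \dots, x_N$ is a random sample. But it seems like we are using $\nu_2(x_1, \dots, x_N)$ where $x_1, \dots, x_N$ is hardly random with regard to $h_2$, since we built $h_2$ using that sample.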
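
To make the "random sample" reading concrete, here is a minimal simulation sketch of the case where the hypothesis is fixed before any data is drawn. Everything in it is my own toy assumption, not from the lecture: inputs uniform on [0, 1), a threshold target `f`, a threshold hypothesis `h`, and $\nu$ taken as the fraction of sample points where `h` agrees with `f`. It redraws the sample many times and compares how often $|\nu - \mu| > \epsilon$ with the Hoeffding bound.

```python
import math
import random

def f(x):
    # Hypothetical target function (assumption for this sketch).
    return 1 if x >= 0.5 else -1

def h(x):
    # Hypothesis fixed BEFORE any sample is drawn (assumption for this sketch).
    return 1 if x >= 0.3 else -1

# For x uniform on [0, 1), h and f disagree only on [0.3, 0.5),
# so the probability of agreement is 0.8.
MU = 0.8

def nu(sample):
    # Fraction of sample points on which h agrees with f.
    return sum(h(x) == f(x) for x in sample) / len(sample)

def violation_rate(n, eps, trials, rng):
    # Fraction of independent random samples with |nu - mu| > eps.
    bad = 0
    for _ in range(trials):
        sample = [rng.random() for _ in range(n)]
        if abs(nu(sample) - MU) > eps:
            bad += 1
    return bad / trials

rng = random.Random(0)
N, EPS = 100, 0.1
rate = violation_rate(N, EPS, 2000, rng)
bound = 2 * math.exp(-2 * EPS**2 * N)
print("empirical violation rate:", rate)
print("Hoeffding bound:         ", bound)
```

Here the randomness is entirely in the sample while `h` never changes, which is the situation the inequality is stated for.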

Here is an example that illustrates my point: say we tried some random $h_1$, compared it with the target function $f$ on our training sample $x_1, \dots, x_N$, and wrote down Hoeffding's inequality. Now let's construct $h_2$ as follows: $h_2(x) = f(x)$ for every $x$ in the training sample and $h_2(x) = h_1(x)$ everywhere else. Let's write down Hoeffding's inequality for this hypothesis. If we are indeed using $\nu_2(x_1, \dots, x_N)$, then here it would be equal to 1, since $h_2 = f$ on $x_1, \dots, x_N$, and we would have that $P\left[\,|1 - \mu_2| > \epsilon\,\right]$ is small. So somehow we are saying with high probability that $h_2$ does an excellent job out of sample, though we didn't change it much from $h_1$. This example shouldn't be correct, right? If it isn't, how is the one with PLA correct?
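
The construction in this example can be tried numerically. The sketch below is only a toy setup of my own (uniform inputs on [0, 1), threshold functions for `f` and `h1`, and a fresh Monte Carlo sample to estimate $\mu_2$); none of these choices come from the lecture. It builds a hypothesis `h2` that copies `f` on the training sample and `h1` elsewhere, then checks how often the in-sample agreement differs from the out-of-sample estimate by more than $\epsilon$.

```python
import math
import random

def f(x):
    # Hypothetical target function (assumption for this sketch).
    return 1 if x >= 0.5 else -1

def h1(x):
    # Initial hypothesis, chosen without looking at the data.
    return 1 if x >= 0.3 else -1

def make_h2(sample):
    # h2 equals f on the training points and h1 everywhere else,
    # so it is built FROM the sample.
    memorized = set(sample)
    def h2(x):
        return f(x) if x in memorized else h1(x)
    return h2

def agreement(h, points):
    # Fraction of points on which h agrees with f.
    return sum(h(x) == f(x) for x in points) / len(points)

rng = random.Random(1)
N, EPS, TRIALS = 100, 0.1, 200
violations = 0
for _ in range(TRIALS):
    train = [rng.random() for _ in range(N)]
    h2 = make_h2(train)
    nu2 = agreement(h2, train)        # 1.0 by construction
    fresh = [rng.random() for _ in range(5000)]
    mu2_est = agreement(h2, fresh)    # Monte Carlo estimate of mu_2
    if abs(nu2 - mu2_est) > EPS:
        violations += 1

violation_rate = violations / TRIALS
bound = 2 * math.exp(-2 * EPS**2 * N)
print("empirical violation rate:", violation_rate)
print("single-hypothesis Hoeffding bound:", bound)
```

In this toy setup the in-sample agreement of `h2` is always 1 while its out-of-sample agreement stays close to that of `h1`, so the deviation exceeds $\epsilon$ far more often than the bound for a single fixed hypothesis would allow.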