Some questions about Lecture 2
I have recently been working through the video lectures of Learning from Data. Two questions about Lecture 2, which focuses on the feasibility of learning, keep lingering in my mind. The first is how feasibility of learning is actually defined. Is Hoeffding's Inequality the right tool to gauge feasibility for a single hypothesis h? Take hypotheses h1 and h2, for example: if both satisfy Hoeffding's Inequality, what should we do next?
My second confusion concerns the case of multiple h's, where the simple modification of Hoeffding's Inequality becomes useless as M approaches infinity. In my opinion, Hoeffding's Inequality still seems to hold in this situation, for the following reason:
P[ |Ein(g) − Eout(g)| > ε ] ≤ P[ |Ein(h1) − Eout(h1)| > ε
or |Ein(h2) − Eout(h2)| > ε ... ]
As I understand it, this probability is no more than the minimum of them, which would mean Hoeffding's Inequality is satisfied. Is there a logical or mathematical error in this reasoning? As for the exceptional case, I think even the best learning algorithm could run into the situation where, purely by chance, only a few of the hypotheses agree with the target function. Alternatively, we could consider the following criterion: for each hypothesis hi, i = 1, 2, ..., M, toss the coin N times, then calculate the probability that the number of coins that come up all heads is less than some particular value t, such as N/4. Is this criterion viable? All responses are appreciated. Thank you.
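P.S. To make the multiple-h question more concrete, here is a minimal Python sketch of what I have in mind. It is my own illustration, not something from the lecture: the function names (hoeffding_bound, union_bound, worst_coin_deviation) and the values eps = 0.1 and N = 100 are assumptions I picked for the example. It compares the single-hypothesis bound 2·exp(−2ε²N) with the bound multiplied by M, and simulates M fair coins to see how far the worst coin strays from 0.5.

```python
import math
import random


def hoeffding_bound(eps, n):
    """Hoeffding upper bound on P[|Ein(h) - Eout(h)| > eps] for one fixed h."""
    return 2.0 * math.exp(-2.0 * eps ** 2 * n)


def union_bound(eps, n, m):
    """Bound for m hypotheses: m times the single bound, capped at 1."""
    return min(1.0, m * hoeffding_bound(eps, n))


def worst_coin_deviation(n, m, rng):
    """Toss m fair coins n times each; return the largest |heads fraction - 0.5|."""
    worst = 0.0
    for _ in range(m):
        heads = sum(rng.random() < 0.5 for _ in range(n))
        worst = max(worst, abs(heads / n - 0.5))
    return worst


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    eps, n = 0.1, 100       # illustrative values, chosen arbitrarily
    for m in (1, 10, 1000):
        print(m, union_bound(eps, n, m), worst_coin_deviation(n, m, rng))
```

With these values the bound for a single h is 2·exp(−2), roughly 0.27, so the bound multiplied by M already exceeds 1 (and says nothing) once M is 4 or more. That is the situation I am asking about.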
