Why multiply the right side of the Hoeffding inequality by the number of hypotheses?
I don't quite understand the reasoning here.
I understand that the probability of an event happening at least once grows with the number of attempts (the more coins we toss, the higher the probability of getting at least one tails). And we can loosely bound that probability as P(A or B) ≤ P(A) + P(B), dropping the P(A and B) term for simplicity.
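To make sure I have that estimate right, here is a small Python sketch comparing the exact probability of "at least one" with the sum-of-probabilities bound. It assumes the events are independent and equally likely (like fair coin tosses); the function names and the parameters `p` and `n` are just mine for illustration.

```python
def at_least_one_exact(p: float, n: int) -> float:
    """Exact P(at least one of n independent events occurs), each with probability p."""
    return 1.0 - (1.0 - p) ** n


def at_least_one_union_bound(p: float, n: int) -> float:
    """Union bound: sum of the n individual probabilities, capped at 1."""
    return min(1.0, n * p)


# Compare the two for a growing number of fair coins (p = 0.5 for tails).
for n in (1, 2, 5, 10):
    exact = at_least_one_exact(0.5, n)
    bound = at_least_one_union_bound(0.5, n)
    print(f"n={n:2d}  exact={exact:.4f}  union bound={bound:.4f}")
```

The bound never falls below the exact value, but it gets loose quickly because it ignores all the overlap terms.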
But why should we apply this estimate to our goal function g(x), if we chose it as the only result of our learning? We aren't asking ourselves "What's the probability of the bad event (exceeding the generalization tolerance) among ALL our tested hypotheses?" We are interested in "What's the probability of the bad event for the one particular hypothesis we somehow chose?" I mean, yes, the probability of getting at least one tails with 10 coins is very close to 1, but nonetheless the probability of tails for each single coin out of those ten is still 0.5, right?
So why raise the bound on the probability of the bad event for our final hypothesis by multiplying the right-hand side of the inequality by M?
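For concreteness, these are the two bounds I am comparing, written as a small Python sketch. I'm assuming the usual form of the inequality, where the right-hand side for a single fixed hypothesis is 2·exp(−2ε²N) and the version for M hypotheses is M times that; the function and parameter names are my own.

```python
import math


def hoeffding_single(eps: float, n_samples: int) -> float:
    """Right-hand side for one fixed hypothesis: 2 * exp(-2 * eps^2 * N)."""
    return 2.0 * math.exp(-2.0 * eps ** 2 * n_samples)


def hoeffding_union(eps: float, n_samples: int, m_hypotheses: int) -> float:
    """Right-hand side after multiplying by the number of hypotheses M."""
    return m_hypotheses * hoeffding_single(eps, n_samples)


# Illustrative values only: tolerance 0.1, 1000 samples, 100 hypotheses.
eps, n_samples, m = 0.1, 1000, 100
print("single hypothesis:", hoeffding_single(eps, n_samples))
print("M hypotheses:     ", hoeffding_union(eps, n_samples, m))
```

My question is why the second number, and not the first, is the one that applies to the hypothesis we end up with.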
