#1
I don't quite understand the reasoning here.
I understand that the probability of an event occurring over many attempts increases (the more coins we toss, the higher the probability of getting at least one tails), and we can loosely bound such a probability as P(A or B) <= P(A) + P(B), ignoring the term -P(A and B) for simplicity. But why should we apply this bound to our goal function g(x) if we chose it as the only result of our learning? We don't ask ourselves "What's the probability of a bad event (the generalization error exceeding the tolerance) among ALL of our tested hypotheses?" We are interested in "What's the probability of the bad event for the one hypothesis we somehow chose?" I mean, yes, the probability of tossing at least one tails with 10 coins is very close to 1, but the probability of tossing tails for each single coin out of those ten is still 0.5, right? So why lift the threshold on the probability of a bad event for our final hypothesis by multiplying the right-hand side of the inequality by M?
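For reference, this is the union-bound step the question is about, written in the usual E_in/E_out notation of the course (N the sample size, epsilon the tolerance, h_1, ..., h_M the hypotheses, g the final choice); a sketch of the standard argument under those notational assumptions:

```latex
\[
\mathbb{P}\big[\,|E_{\text{in}}(g) - E_{\text{out}}(g)| > \epsilon\,\big]
\;\le\;
\mathbb{P}\Big[\bigcup_{m=1}^{M}\big\{\,|E_{\text{in}}(h_m) - E_{\text{out}}(h_m)| > \epsilon\,\big\}\Big]
\;\le\;
\sum_{m=1}^{M}\mathbb{P}\big[\,|E_{\text{in}}(h_m) - E_{\text{out}}(h_m)| > \epsilon\,\big]
\;\le\; 2M\,e^{-2\epsilon^{2}N}.
\]
```

The first inequality holds because g is one of the h_m, so a bad event for g implies a bad event for at least one hypothesis in the set; the factor M comes from summing M identical Hoeffding bounds.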
#2
Oh, I think I got it.
The link in the reasoning that I missed is that a bad event in any of the hypotheses MAY AFFECT our final choice. But we don't know how, so we consider the range between the best and the worst possible cases. The best case: bad events happening in the tested hypotheses don't affect our choice of the final one at all, so for the probability of a bad event in g(x) we could still use Hoeffding without the factor M. The worst case: the sample, the hypothesis set, and the learning algorithm are such that a bad event in some hypothesis makes us select that very hypothesis as the final one. In this unlucky situation the probability of a bad event in the final hypothesis is the probability that "at least one hypothesis in the set has a bad event". (See the simulation sketch below.) Am I right?
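To see the two extremes numerically, here is a minimal coin-flip simulation (a sketch; the parameters M, N, eps, trials and the deliberately adversarial "worst-case learner" that picks the most extreme coin are my own illustration, not from the thread). A pre-chosen coin obeys plain Hoeffding, while the data-dependent pick is bad almost whenever any coin is bad:

```python
import random

# M "hypotheses" are fair coins, each flipped N times.
# "Bad event" = empirical frequency deviates from 0.5 by more than eps.
random.seed(0)
M, N, eps, trials = 100, 50, 0.15, 20000

fixed_bad = 0      # bad event for one pre-chosen coin (plain Hoeffding case)
selected_bad = 0   # bad event for the coin chosen *after* seeing the data

for _ in range(trials):
    freqs = [sum(random.random() < 0.5 for _ in range(N)) / N for _ in range(M)]
    # best-case analogue: the choice ignores the data, e.g. always coin 0
    if abs(freqs[0] - 0.5) > eps:
        fixed_bad += 1
    # worst-case analogue: a "learner" that picks the most extreme coin,
    # so its pick is bad exactly when at least one coin is bad
    chosen = max(freqs, key=lambda f: abs(f - 0.5))
    if abs(chosen - 0.5) > eps:
        selected_bad += 1

print(f"P(bad | fixed coin)    ~ {fixed_bad / trials:.3f}")
print(f"P(bad | selected coin) ~ {selected_bad / trials:.3f}")
```

The first estimate stays near the single-hypothesis Hoeffding level, while the second approaches the "at least one bad hypothesis" level, which is exactly what the extra factor M in the bound has to cover.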
#3
You are right!
__________________
Where everyone thinks alike, no one thinks very much