Quote:
Originally Posted by Andrew87
Hi,
Going by the first post, I can't understand why the answer to question (d) is p < 0.5.
Intuitively, my answer is that there are no values of p that make C probabilistically better than S. That is because S tries to minimize the error on the training data, which should reflect the true distribution. In this case, C does better than S only if
(the majority of the examples are +1 GIVEN p < 0.5) OR (the majority of the examples are -1 GIVEN p > 0.5). However, both cases are less probable than the ones in which S works better. As a result, there is no value of p that reverses the situation.
Am I right?

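Regarding point (d): the crucial part is the assumption that every y_n = +1 (see point (b)). Under that assumption, S always chooses h_1 and C always chooses h_2, so which of the two is better depends only on p and no longer on how the sample happens to come out.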
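
To make the comparison concrete, here is a minimal Python sketch. It assumes, as I read the exercise, that h_1 is the constant +1 hypothesis, h_2 is the constant -1 hypothesis, and P[f(x) = +1] = p. The function names are my own and only illustrative.

```python
def out_of_sample_error(hypothesis_label: int, p: float) -> float:
    """Error of a constant hypothesis when P[f(x) = +1] = p (assumed setup)."""
    # A constant +1 hypothesis errs whenever f(x) = -1, i.e. with probability 1 - p.
    # A constant -1 hypothesis errs whenever f(x) = +1, i.e. with probability p.
    return 1.0 - p if hypothesis_label == +1 else p


def c_beats_s(p: float) -> bool:
    """True when C's pick (h_2) has lower out-of-sample error than S's pick (h_1)."""
    error_s = out_of_sample_error(+1, p)  # S picks h_1 because every y_n = +1
    error_c = out_of_sample_error(-1, p)  # C deliberately picks the other one, h_2
    return error_c < error_s


if __name__ == "__main__":
    for p in (0.1, 0.4, 0.5, 0.6, 0.9):
        print(p, c_beats_s(p))
```

Because the choices of S and C are already fixed by the assumption in (b), nothing about the sample remains random here: the check reduces to p < 1 - p, which holds exactly when p < 0.5.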