SVMs and the input distribution
If the input distribution has high density near the target boundary, the sample will likely contain points close to the boundary, so large-margin and small-margin classifiers will end up similar. If the input distribution has low density near the boundary, the sample will contain few near-boundary points, which gives the large-margin classifier an advantage. But in that case the probability of drawing a near-boundary point during out-of-sample use is also low, so E_out for small-margin classifiers is not much affected either.
Why does this not limit the advantage of large-margin classifiers in practice?
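To make the argument concrete, here is a minimal sketch of a 1-D toy experiment. Everything in it is an assumption for illustration: the target is sign(x) on [-1, 1], the density near the boundary is controlled by a hypothetical exponent k through P(|x| <= t) = t^k, and the two classifiers are plain thresholds (one at the midpoint between the nearest samples of each class, one sitting right on the nearest positive sample). It is not an actual SVM, only the 1-D analogue of large versus small margin.

```python
import numpy as np

def e_out(threshold, k):
    """Exact E_out of sign(x - threshold) when the target is sign(x),
    both signs are equally likely, and P(|x| <= t) = t**k on [0, 1]."""
    return 0.5 * abs(threshold) ** k

def compare_margins(k, n_per_class=10, trials=2000, seed=0):
    """Average E_out of a large-margin and a small-margin threshold.

    k < 1 concentrates inputs near the boundary at 0,
    k > 1 pushes them away from it (assumed toy density).
    """
    rng = np.random.default_rng(seed)
    # Inverse-CDF sampling: |x| = u**(1/k) gives P(|x| <= t) = t**k.
    pos = rng.random((trials, n_per_class)) ** (1.0 / k)
    neg = -(rng.random((trials, n_per_class)) ** (1.0 / k))
    nearest_pos = pos.min(axis=1)   # closest positive sample to the boundary
    nearest_neg = neg.max(axis=1)   # closest negative sample to the boundary
    t_large = 0.5 * (nearest_pos + nearest_neg)  # midpoint: maximum margin
    t_small = nearest_pos                        # zero margin on the positive side
    return float(np.mean(e_out(t_large, k))), float(np.mean(e_out(t_small, k)))

for k in (0.25, 1.0, 4.0):
    large, small = compare_margins(k)
    print(f"k={k}: E_out large-margin={large:.5f}, small-margin={small:.5f}")
```

Comparing the printed values across k shows how the gap between the two thresholds changes as the density near the boundary goes from high to low, which is the effect the question is about.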