Quote:
Originally Posted by yaser
Indeed. One way to look at this is that margins are basically regularizers. The more training points you have the less regularization that is needed, and the closer the regularized and unregularized solutions are to each other. Is this the main issue in your first post?

The issue was more that (1) I was expecting more of an improvement for SVM vs PLA, and was trying to understand why there wasn't; and (2) in a real problem, points near the true boundary would be rarer than points away from it, both because the space around the boundary is a small fraction of the total space, and because (hopefully) the + and - examples come from real-world distributions centered somewhat away from the boundary. So I was trying to understand what happens in such a case.
If few training points fall near the true boundary, this could be because (1) the dataset is too small, or (2) the underlying data distribution has low density near the boundary. In case (1), SVM has an advantage because its maximum-margin separator is more likely to track the true boundary than a random linear separator like the one PLA produces.
In case (2), SVM still does better near the boundary, but the density of points there is so low that E_out is not much improved by classifying them correctly.
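The density argument in case (2) can be checked with a quick toy simulation. This is just an illustrative sketch (the boundary x1 = 0, the class centers at +/-0.8, and the "near" threshold eps are my own made-up numbers, not anything from the thread): it compares how often points land within eps of the boundary when x1 is uniform versus when each class is a Gaussian centered away from the boundary.

```python
import random

random.seed(0)
N = 10_000
eps = 0.1  # "near the true boundary x1 = 0" means |x1| < eps (assumed threshold)

# Case (1)-style sampling: x1 uniform on [-1, 1], so the boundary
# cuts through the bulk of the data.
near_uniform = sum(abs(random.uniform(-1, 1)) < eps for _ in range(N)) / N

# Case (2)-style sampling: each class's x1 is Gaussian around +/-0.8
# (hypothetical centers), i.e. distributions centered away from the boundary.
near_gauss = sum(
    abs(random.choice([-0.8, 0.8]) + random.gauss(0, 0.2)) < eps
    for _ in range(N)
) / N

print(near_uniform, near_gauss)
```

With these numbers the uniform case puts roughly 10% of points near the boundary, while the Gaussian case puts almost none there, so getting the near-boundary region exactly right moves E_out very little in case (2).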
I guess in practice, (1) is more common?