SVMs and the input distribution
If the input distribution has high density near the target boundary, the sample will likely contain points near the boundary, so large-margin and small-margin classifiers will be similar. If the input distribution has low density near the boundary, the sample will have few near-boundary points, which gives an advantage to a large-margin classifier. But then the probability of drawing a near-boundary point during out-of-sample use is also low, so E_out for small-margin classifiers is not much affected.
Why does this not limit the advantage of large-margin classifiers in practice? |
Re: SVMs and the input distribution
Quote:
This observation does not affect the answers to Problems 8 and 9 one way or the other, since these problems only ask which of the two methods is better, not whether it is slightly or significantly better. |
Re: SVMs and the input distribution
Quote:
If few training points fall near the true boundary, this could be because (1) the dataset is too small, or (2) the underlying data distribution has low density near the boundary. If (1), then SVM has an advantage because it is more likely to track the true boundary than a random linear separator such as the one PLA finds. If (2), then SVM still does better near the boundary, but the density of points there is so small that E_out is not much improved by getting them right. I guess in practice, (1) is more common? |
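The two cases above can be compared in a toy simulation. What follows is only a rough sketch under strong assumptions that are not from the homework: the inputs are 1D, the target boundary is fixed at 0, the low-density region is a symmetric band around the boundary, the max-margin classifier is the midpoint of the gap, and a threshold drawn uniformly from the gap stands in for a PLA-like separator (it is not PLA itself). The names `band`, `keep`, and all function names are made up for illustration.

```python
import random

def sample_point(rng, band, keep):
    """Draw x from [-1, 1]; points with |x| < band survive only with probability keep."""
    while True:
        x = rng.uniform(-1.0, 1.0)
        if abs(x) >= band or rng.random() < keep:
            return x

def mass_between_zero_and(h, band, keep):
    """E_out of threshold h: probability that a fresh point lands between 0 and h."""
    c = 1.0 / (2.0 * (1.0 - band) + 2.0 * band * keep)  # density outside the band
    d = abs(h)
    if d <= band:
        return keep * c * d
    return c * (keep * band + (d - band))

def average_errors(n, band, keep, runs=2000, seed=0):
    """Average E_out of the gap midpoint and of a random in-gap threshold."""
    rng = random.Random(seed)
    mid_sum = any_sum = 0.0
    done = 0
    while done < runs:
        xs = [sample_point(rng, band, keep) for _ in range(n)]
        neg = [x for x in xs if x < 0.0]
        pos = [x for x in xs if x >= 0.0]
        if not neg or not pos:
            continue  # need both classes to define a gap
        lo, hi = max(neg), min(pos)
        mid_sum += mass_between_zero_and(0.5 * (lo + hi), band, keep)    # max-margin
        any_sum += mass_between_zero_and(rng.uniform(lo, hi), band, keep)  # PLA stand-in
        done += 1
    return mid_sum / runs, any_sum / runs

if __name__ == "__main__":
    # Case (1): small dataset, uniform density (band=0 disables the thinning).
    print("small N, uniform:   ", average_errors(10, 0.0, 1.0))
    # Case (2): larger dataset, density thinned to 2% inside |x| < 0.2.
    print("low-density band:   ", average_errors(200, 0.2, 0.02))
```

The parameters are chosen so that the gap has a comparable width in both cases. In this toy setup the midpoint wins in both, but the improvement in E_out in case (2) is much smaller than in case (1), which is the point made above. How far this carries over to 2D and to real PLA is not something the sketch can settle.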
Re: SVMs and the input distribution
In the problem, the points are distributed uniformly at random. With fewer points, the gap between the two classes is, statistically, larger. Given N points, the line that generated the classification could be anywhere in the gap. The SVM solution should be close to the center of the gap. My guess is that the PLA solution can also be anywhere in the gap.
Given that, you can see that the SVM solution should be closer to the true line more often, though it is not easy to guess how often. |
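To get a feel for "how often", here is a minimal sketch of the gap argument in 1D. It is a simplification, not the homework setup: inputs are uniform on [-1, 1], the target is a threshold drawn uniformly from the same interval, the max-margin solution is the midpoint of the gap, and the PLA-like solution is assumed (following the guess above) to land uniformly anywhere in the gap. The function names `one_run` and `summarize` are made up for this illustration.

```python
import random

def one_run(n, rng):
    """Draw a target and n uniform points; return the gap width and both errors."""
    while True:
        target = rng.uniform(-1.0, 1.0)
        xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        neg = [x for x in xs if x < target]
        pos = [x for x in xs if x >= target]
        if neg and pos:  # need both classes to define a gap
            break
    lo, hi = max(neg), min(pos)      # edges of the gap that contains the target
    midpoint = 0.5 * (lo + hi)       # max-margin threshold
    anywhere = rng.uniform(lo, hi)   # assumed stand-in for a PLA-like separator
    # With uniform density on [-1, 1], E_out is the misclassified length divided by 2.
    return hi - lo, abs(midpoint - target) / 2.0, abs(anywhere - target) / 2.0

def summarize(n, runs=20000, seed=0):
    """Return (mean gap, mean E_out midpoint, mean E_out random, fraction midpoint wins)."""
    rng = random.Random(seed)
    gap_sum = mid_sum = any_sum = 0.0
    mid_wins = 0
    for _ in range(runs):
        gap, e_mid, e_any = one_run(n, rng)
        gap_sum += gap
        mid_sum += e_mid
        any_sum += e_any
        mid_wins += e_mid < e_any    # True counts as 1
    return gap_sum / runs, mid_sum / runs, any_sum / runs, mid_wins / runs

if __name__ == "__main__":
    for n in (10, 100):
        gap, e_mid, e_any, wins = summarize(n)
        print(f"N={n}: gap={gap:.4f} E_mid={e_mid:.4f} E_any={e_any:.4f} mid wins={wins:.3f}")
```

In this toy version the midpoint should win in clearly more than half of the runs, and the average gap should shrink as N grows, so the absolute difference in E_out shrinks too. The 2D case with real PLA will give different numbers, since PLA's final line depends on its update order and starting point.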
The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.