#1




*ANSWER* Q13 about linearly separable by SVM
I would like to take this chance to express my sincere thanks to Prof. Yaser S. AbuMostafa. This is an extremely good class. I had watched at least twice for each of the video. There are still a lot to learn after the course.
Still, I have some doubts. E.g., isn't it the RBF can result in an hypothesis for an infinite dimension space ? If that is the case, then with 100 points in Q14, the worse case is the SVM + SBF ends up with 100 support vector. Am I right ? Then, the hard margin SVM actually can guarantee Ein=0 => guarantee linearly separable ? Why Q14 can end up with Ein != 0 ? And why it said Ein=0 is not linearly separable ? I read another thread about this but still cannot solve the above questions. Can anyone give me some hints ? Thanks in advance. 
#2




Re: Q14 about linearly separable by SVM
You are right, but in the lectures we did not prove that for the RBF kernel, so it was worth exploring the question at least empirically.
In general, it is conceivable that a transformed infinitedimensional space may still fail to separate a finite data set. For instance, take every dimension in the infinitedimensional space to be a (possibly different) linear transformation of the original space. In this case, you would still be implementing just a linear classifier in the original space when you use a linear classifier in the transformed space, so you will fail to separate a set of points that is not linearly separable in the original space.
__________________
Where everyone thinks alike, no one thinks very much 
#3




Re: *ANSWER* Q14 about linearly separable by SVM
Sorry to annoy you again. Let me summarize my understanding point by point:
(1) In one sense, hard margin SVM is no different from simpler algorithm like PLA for linearly separable data (albeit the result may be different, they are the same in terms of generalization, Ein = 0, ...). (2) Point (1) still apply for nonlinear transformed data. (3) In ML, in an attempt to find a separation plane (or line) is somehow similar to find out the coefficient of an polynomial (in case a polynomial is used as the hypothesis set), i.e., the . (3a) Although the coefficient of the polynomial will not be found in explicit form, one can either view it as the data being transformed to a different space (higher or lower dimensional (normally not necessary to use lower dimension)) and separated linearly; alternatively, it can be mapped back to the original space and interpret as an higher order polynomial. (4) This hold true for hard margin SVM, and for data explicity transformed nonlinearly. (5) From what I have done in Q14, with hard margin SVM + RBF kernel on 100 data points, it can always separate the data linearly (Ein = 0). And it matches with my understanding. Then, my question is: is RBF regular form not normally used for supervised training ? We learn a lot from the final exam paper about the RBF regular form. As the performance is normally not as good as SVM, also we have no cue about what is the best K. Does it mean, in supervised learning, we normally will not consider to use RBF regular form ? 
#4




Re: *ANSWER* Q14 about linearly separable by SVM
Quote:
Quote:
Quote:
__________________
Where everyone thinks alike, no one thinks very much 
#5




Re: *ANSWER* Q14 about linearly separable by SVM
My thoughts on skwong's post:
(1) There are reasons why SVM is a major workhorse of machine learning, while PLA is mainly found early in machine learning courses and books (and the RBF regular form is another method that is not popular. EDIT: thanks, Yaser for the information that it used to be used more). And it's not mere fashion! In realistic linearly separable situations, SVM gives better generalisation than PLA. It also usually gives better generalisation than RBF regular form. It's that really matters, not ! The advantage over PLA would extend to PLA with nonlinear transforms. As well as that, for problems with not too many data points, SVM is computationally efficient. Moreover, soft margin SVM is a major tool for classification where there is either not enough data or noise (it's a struggle to get useful results from PLA for these: the pocket algorithm is a bit like taking shots in the dark here, whereas SVM heads straight to the global optimum solution). (2) see (1) (3) The w is simply a natural way of representing a hyperplane. The relationship to polynomials is that polynomial models become linear models when viewed in the transformed space (with dimensions for each power of x). This is worth studying. (4) Yes (5) I think you are right. RBF regular form tends not to generalise as well as SVM in realistic scenarios, hence people use SVM (spot the recurring theme? ) 
#6




Re: *ANSWER* Q14 about linearly separable by SVM
Many many thanks to Yaser and Elroch.

Thread Tools  
Display Modes  

