LFD Book Forum (http://book.caltech.edu/bookforum/index.php)
-   The Final (http://book.caltech.edu/bookforum/forumdisplay.php?f=138)

 skwong 06-13-2013 08:34 PM

I would like to take this chance to express my sincere thanks to Prof. Yaser S. Abu-Mostafa. This is an extremely good class. I had watched at least twice for each of the video. There are still a lot to learn after the course.

Still, I have some doubts. E.g., isn't it the RBF can result in an hypothesis for an infinite dimension space ? If that is the case, then with 100 points in Q14, the worse case is the SVM + SBF ends up with 100 support vector. Am I right ?

Then, the hard margin SVM actually can guarantee Ein=0 => guarantee linearly separable ?

Why Q14 can end up with Ein != 0 ? And why it said Ein=0 is not linearly separable ? I read another thread about this but still cannot solve the above questions.

Can anyone give me some hints ?

 yaser 06-13-2013 09:57 PM

Re: Q14 about linearly separable by SVM

You are right, but in the lectures we did not prove that for the RBF kernel, so it was worth exploring the question at least empirically.

In general, it is conceivable that a transformed infinite-dimensional space may still fail to separate a finite data set. For instance, take every dimension in the infinite-dimensional space to be a (possibly different) linear transformation of the original space. In this case, you would still be implementing just a linear classifier in the original space when you use a linear classifier in the transformed space, so you will fail to separate a set of points that is not linearly separable in the original space.

 skwong 06-21-2013 07:11 AM

Sorry to annoy you again. Let me summarize my understanding point by point:

(1) In one sense, hard margin SVM is no different from simpler algorithm like
PLA for linearly separable data (albeit the result may be different,
they are the same in terms of generalization, Ein = 0, ...).

(2) Point (1) still apply for non-linear transformed data.

(3) In ML, in an attempt to find a separation plane (or line) is somehow
similar to find out the coefficient of an polynomial (in case a
polynomial is used as the hypothesis set), i.e., the .

(3a) Although the coefficient of the polynomial will not be found in
explicit form, one can either view it as the data being transformed
to a different space (higher or lower dimensional (normally not necessary
to use lower dimension)) and separated linearly; alternatively, it can
be mapped back to the original space and interpret as an higher order
polynomial.

(4) This hold true for hard margin SVM, and for data explicity transformed
nonlinearly.

(5) From what I have done in Q14, with hard margin SVM + RBF kernel on 100
data points, it can always separate the data linearly (Ein = 0). And it
matches with my understanding.

Then, my question is: is RBF regular form not normally used for
supervised training ?

We learn a lot from the final exam paper about the RBF regular form.
As the performance is normally not as good as SVM, also we have no
cue about what is the best K. Does it mean, in supervised learning,
we normally will not consider to use RBF regular form ?

 yaser 06-21-2013 11:53 AM

Quote:
 Originally Posted by skwong (Post 11162) (1) In one sense, hard margin SVM is no different from simpler algorithm like PLA for linearly separable data (albeit the result may be different, they are the same in terms of generalization, Ein = 0, ...).
It is no different in having a linear hypothesis set, but it is different in the learning algorithm that chooses a particular hypothesis from that set that happens to maximize the margin.

Quote:
 (5) From what I have done in Q14, with hard margin SVM + RBF kernel on 100 data points, it can always separate the data linearly (Ein = 0). And it matches with my understanding.

Quote:
 Then, my question is: is RBF regular form not normally used for supervised training ? We learn a lot from the final exam paper about the RBF regular form. As the performance is normally not as good as SVM, also we have no cue about what is the best K.
People do use regular RBF, but not often, and not as often as they once did. The best (number of clusters) is a perpetual question in unsupervised learning, with many clever techniques but none conclusive.

 Elroch 06-21-2013 11:59 AM

My thoughts on skwong's post:
(1) There are reasons why SVM is a major workhorse of machine learning, while PLA is mainly found early in machine learning courses and books (and the RBF regular form is another method that is not popular. EDIT: thanks, Yaser for the information that it used to be used more). And it's not mere fashion! In realistic linearly separable situations, SVM gives better generalisation than PLA. It also usually gives better generalisation than RBF regular form. It's that really matters, not ! The advantage over PLA would extend to PLA with non-linear transforms. As well as that, for problems with not too many data points, SVM is computationally efficient.

Moreover, soft margin SVM is a major tool for classification where there is either not enough data or noise (it's a struggle to get useful results from PLA for these: the pocket algorithm is a bit like taking shots in the dark here, whereas SVM heads straight to the global optimum solution).

(2) see (1)

(3) The w is simply a natural way of representing a hyperplane. The relationship to polynomials is that polynomial models become linear models when viewed in the transformed space (with dimensions for each power of x). This is worth studying.

(4) Yes

(5) I think you are right. RBF regular form tends not to generalise as well as SVM in realistic scenarios, hence people use SVM (spot the recurring theme? :) )

 skwong 06-21-2013 06:46 PM