
#1
06-13-2013, 08:34 PM
 skwong Member Join Date: Apr 2013 Location: Hong Kong Posts: 13

I would like to take this chance to express my sincere thanks to Prof. Yaser S. Abu-Mostafa. This is an extremely good class. I have watched each of the videos at least twice, and there is still a lot to learn after the course.

Still, I have some doubts. For example, doesn't the RBF kernel correspond to a hypothesis in an infinite-dimensional space? If that is the case, then with 100 points in Q14, the worst case is that the SVM + RBF ends up with 100 support vectors. Am I right?

Then, can the hard-margin SVM actually guarantee Ein = 0, i.e., guarantee that the data is linearly separable?

Why can Q14 end up with Ein != 0? And why does Ein != 0 mean the data is not linearly separable? I read another thread about this but still cannot resolve these questions.

Can anyone give me some hints ?

#2
06-13-2013, 09:57 PM
 yaser Caltech Join Date: Aug 2009 Location: Pasadena, California, USA Posts: 1,478
Re: Q14 about linearly separable by SVM

You are right, but in the lectures we did not prove that for the RBF kernel, so it was worth exploring the question at least empirically.

In general, it is conceivable that a transformed infinite-dimensional space may still fail to separate a finite data set. For instance, take every dimension in the infinite-dimensional space to be a (possibly different) linear transformation of the original space. In this case, you would still be implementing just a linear classifier in the original space when you use a linear classifier in the transformed space, so you will fail to separate a set of points that is not linearly separable in the original space.
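To make that concrete, here is a small sketch (assuming NumPy and scikit-learn; the random matrix A is just an illustrative stand-in for "every dimension is a linear transformation of the original space"):

```python
import numpy as np
from sklearn.svm import SVC

# XOR-style data: not linearly separable in the original 2-D space
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1])

# Stand-in for a high-dimensional map where every coordinate is a
# linear function of x: 500 random linear combinations of the inputs
rng = np.random.default_rng(0)
A = rng.standard_normal((500, 2))
Z = X @ A.T  # z = A x, so a linear classifier in z-space is linear in x-space

clf = SVC(kernel="linear", C=1e6)  # large C approximates a hard margin
clf.fit(Z, y)
e_in = np.mean(clf.predict(Z) != y)
print(e_in)  # still nonzero: the XOR labels remain unseparated
```

Even with 500 dimensions, the effective weight vector collapses back to a 2-D one (A^T w), so the XOR points stay inseparable.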
__________________
Where everyone thinks alike, no one thinks very much
#3
06-21-2013, 07:11 AM
 skwong Member Join Date: Apr 2013 Location: Hong Kong Posts: 13

Sorry to bother you again. Let me summarize my understanding point by point:

(1) In one sense, hard-margin SVM is no different from a simpler algorithm like PLA for linearly separable data (albeit the result may be different, they are the same in terms of generalization, Ein = 0, ...).

(2) Point (1) still applies for non-linearly transformed data.

(3) In ML, the attempt to find a separating plane (or line) is somehow similar to finding the coefficients of a polynomial (in case a polynomial is used as the hypothesis set), i.e., the w.

(3a) Although the coefficients of the polynomial will not be found in explicit form, one can either view the data as being transformed to a different space (higher or lower dimensional, though normally there is no need to use a lower dimension) and separated linearly; alternatively, the result can be mapped back to the original space and interpreted as a higher-order polynomial.

(4) This holds true for hard-margin SVM, and for data explicitly transformed nonlinearly.

(5) From what I have done in Q14, with hard-margin SVM + RBF kernel on 100 data points, it can always separate the data linearly (Ein = 0). And this matches my understanding.
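For reference, the experiment in (5) can be reproduced with a short script (a sketch assuming scikit-learn; the target f(x) = sign(x2 - x1 + 0.25 sin(pi x1)) and gamma = 1.5 are as I recall them from the final's setup):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(100, 2))
# Target function from the final (as I recall the setup)
y = np.sign(X[:, 1] - X[:, 0] + 0.25 * np.sin(np.pi * X[:, 0]))

clf = SVC(kernel="rbf", gamma=1.5, C=1e10)  # huge C approximates hard margin
clf.fit(X, y)
e_in = np.mean(clf.predict(X) != y)
print(e_in)  # 0.0: the RBF kernel separates the 100 points
```

Since the RBF Gram matrix of 100 distinct points is positive definite, the data is always separable in the feature space, which is why Ein comes out 0.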

Then, my question is: is the RBF regular form not normally used for supervised training?

We learned a lot from the final exam about the RBF regular form. As its performance is normally not as good as the SVM's, and we also have no clue about what the best K is, does that mean that in supervised learning we normally will not consider using the RBF regular form?
#4
06-21-2013, 11:53 AM
 yaser Caltech Join Date: Aug 2009 Location: Pasadena, California, USA Posts: 1,478

Quote:
 Originally Posted by skwong (1) In one sense, hard-margin SVM is no different from a simpler algorithm like PLA for linearly separable data (albeit the result may be different, they are the same in terms of generalization, Ein = 0, ...).
It is no different in having a linear hypothesis set, but it is different in the learning algorithm that chooses a particular hypothesis from that set that happens to maximize the margin.

Quote:
 (5) From what I have done in Q14, with hard-margin SVM + RBF kernel on 100 data points, it can always separate the data linearly (Ein = 0). And this matches my understanding.

Quote:
 Then, my question is: is the RBF regular form not normally used for supervised training? We learned a lot from the final exam about the RBF regular form. As its performance is normally not as good as the SVM's, we also have no clue about what the best K is.
People do use the regular RBF form, but not often, and not as often as they once did. The best K (number of clusters) is a perpetual question in unsupervised learning, with many clever techniques but none conclusive.
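For completeness, here is a minimal sketch of the regular RBF form under discussion (K-means for the centers, then linear regression for the weights; assumes scikit-learn for K-means, and K = 9 is an arbitrary illustrative choice):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(100, 2))
y = np.sign(X[:, 1] - X[:, 0] + 0.25 * np.sin(np.pi * X[:, 0]))

K, gamma = 9, 1.5  # K is the open question: how many centers to use
centers = KMeans(n_clusters=K, n_init=10, random_state=0).fit(X).cluster_centers_

def rbf_features(X, centers, gamma):
    # Phi[n, k] = exp(-gamma * ||x_n - mu_k||^2), plus a bias column
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.hstack([np.ones((len(X), 1)), np.exp(-gamma * d2)])

Phi = rbf_features(X, centers, gamma)
w = np.linalg.pinv(Phi) @ y          # linear regression for the weights
e_in = np.mean(np.sign(Phi @ w) != y)
print(e_in)  # usually small, but not guaranteed to be 0 as with the kernel form
```

Unlike the kernel form, which in effect places a bump at every data point, the regular form fixes K centers up front, so Ein = 0 is no longer guaranteed.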
#5
06-21-2013, 11:59 AM
 Elroch Invited Guest Join Date: Mar 2013 Posts: 143

My thoughts on skwong's post:
(1) There are reasons why SVM is a major workhorse of machine learning, while PLA is mainly found early in machine learning courses and books (and the RBF regular form is another method that is not popular. EDIT: thanks, Yaser, for the information that it used to be used more). And it's not mere fashion! In realistic linearly separable situations, SVM gives better generalisation than PLA. It also usually gives better generalisation than the RBF regular form. It's Eout that really matters, not Ein! The advantage over PLA would extend to PLA with non-linear transforms. As well as that, for problems with not too many data points, SVM is computationally efficient.

Moreover, soft-margin SVM is a major tool for classification where there is either not enough data or there is noise (it's a struggle to get useful results from PLA in these cases: the pocket algorithm is a bit like taking shots in the dark, whereas SVM heads straight to the global optimum).
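A quick illustration of that point (assuming scikit-learn; the data here is two synthetic overlapping Gaussian clouds, so a hard margin would be infeasible):

```python
import numpy as np
from sklearn.svm import SVC

# Noisy, overlapping classes: hard margin is infeasible, soft margin copes
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(-1, 1.2, (50, 2)), rng.normal(1, 1.2, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

clf = SVC(kernel="linear", C=1.0)  # finite C allows margin violations
clf.fit(X, y)
e_in = np.mean(clf.predict(X) != y)
print(e_in)  # some training errors tolerated in exchange for a wider margin
```

The finite C is the knob: it trades margin violations against margin width, and the quadratic program still has a single global optimum.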

(2) see (1)

(3) The w is simply a natural way of representing a hyperplane. The relationship to polynomials is that polynomial models become linear models when viewed in the transformed space (with dimensions for each power of x). This is worth studying.
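A tiny sketch of that correspondence (NumPy only; the cubic target is just an illustration):

```python
import numpy as np

# 1-D cubic target; fit by linear regression in the space (1, x, x^2, x^3)
rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, 50)
y = 2 * x**3 - x + 0.5

Z = np.column_stack([x**p for p in range(4)])  # one dimension per power of x
w = np.linalg.lstsq(Z, y, rcond=None)[0]       # linear model in z-space
print(np.round(w, 6))  # recovers the coefficients [0.5, -1, 0, 2]
```

The model is a polynomial in x-space but just a hyperplane (the w) in z-space, which is exactly the transformed-space view in (3a).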

(4) Yes

(5) I think you are right. The RBF regular form tends not to generalise as well as SVM in realistic scenarios, hence people use SVM (spot the recurring theme?)
#6
06-21-2013, 06:46 PM
 skwong Member Join Date: Apr 2013 Location: Hong Kong Posts: 13

Many many thanks to Yaser and Elroch.

