LFD Book Forum > Course Discussions > Online LFD course > The Final

  #1  06-13-2013, 08:34 PM
skwong (Member; Join Date: Apr 2013; Location: Hong Kong; Posts: 13)
*ANSWER* Q14 about linearly separable by SVM

I would like to take this chance to express my sincere thanks to Prof. Yaser S. Abu-Mostafa. This is an extremely good class. I have watched each of the videos at least twice, and there is still a lot to learn after the course.

Still, I have some doubts. For example, isn't it the case that the RBF kernel can result in a hypothesis in an infinite-dimensional space? If so, then with 100 points in Q14, the worst case is that SVM + RBF ends up with 100 support vectors. Am I right?

Then, can the hard-margin SVM actually guarantee Ein = 0, i.e., guarantee that the data is linearly separable?

Why can Q14 end up with Ein != 0? And why is it said that the data is not linearly separable? I read another thread about this but still cannot resolve these questions.

Can anyone give me some hints?

Thanks in advance.
  #2  06-13-2013, 09:57 PM
yaser (Caltech; Join Date: Aug 2009; Location: Pasadena, California, USA; Posts: 1,475)
Re: Q14 about linearly separable by SVM

You are right, but in the lectures we did not prove that for the RBF kernel, so it was worth exploring the question at least empirically.
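
For the record (this was not derived in the lectures), the standard one-dimensional argument that the RBF kernel corresponds to an infinite-dimensional transform is to Taylor-expand the cross term:

\[
K(x,x') = e^{-\gamma (x-x')^2}
        = e^{-\gamma x^2}\, e^{-\gamma x'^2} \sum_{k=0}^{\infty} \frac{(2\gamma)^k}{k!}\, x^k x'^k
        = \sum_{k=0}^{\infty} \Phi_k(x)\, \Phi_k(x'),
\qquad
\Phi_k(x) = e^{-\gamma x^2} \sqrt{\frac{(2\gamma)^k}{k!}}\, x^k ,
\]

so the implicit transform has one coordinate for every power of x, i.e., it is infinite-dimensional.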

In general, it is conceivable that a transformed infinite-dimensional space may still fail to separate a finite data set. For instance, take every dimension in the infinite-dimensional space to be a (possibly different) linear transformation of the original space. In this case, you would still be implementing just a linear classifier in the original space when you use a linear classifier in the transformed space, so you will fail to separate a set of points that is not linearly separable in the original space.
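
A tiny numpy sketch of that argument (illustrative only, not course code): if every coordinate of the transform is a linear function of x, say z = Ax, then any linear classifier in the z-space collapses to a linear classifier in the original x-space.

import numpy as np

rng = np.random.default_rng(0)
d, D = 2, 1000                   # original dimension, "large" transformed dimension
A = rng.normal(size=(D, d))      # each coordinate z_i = (Ax)_i is linear in x
w = rng.normal(size=D)           # some linear classifier in z-space
b = 0.5

X = rng.normal(size=(5, d))      # a few sample points
Z = X @ A.T                      # the transformed points

# sign(w.z + b) in z-space equals sign((A^T w).x + b) in x-space,
# so no separating power is gained over a plain linear classifier
print(np.allclose(Z @ w + b, X @ (A.T @ w) + b))    # True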
__________________
Where everyone thinks alike, no one thinks very much
  #3  06-21-2013, 07:11 AM
skwong (Member; Join Date: Apr 2013; Location: Hong Kong; Posts: 13)
Re: *ANSWER* Q14 about linearly separable by SVM

Sorry to annoy you again. Let me summarize my understanding point by point:

(1) In one sense, hard-margin SVM is no different from a simpler algorithm like PLA for linearly separable data (although the resulting hypothesis may be different, they are the same in terms of generalization, Ein = 0, ...).

(2) Point (1) still applies to non-linearly transformed data.

(3) In ML, finding a separating plane (or line) is somewhat similar to finding the coefficients of a polynomial (in case a polynomial is used as the hypothesis set), i.e., the w.

(3a) Although the coefficients of the polynomial are not found in explicit form, one can either view the data as being transformed to a different space (higher or, less commonly, lower dimensional) and separated linearly there; alternatively, the result can be mapped back to the original space and interpreted as a higher-order polynomial.

(4) This holds true for hard-margin SVM, and for data explicitly transformed nonlinearly.

(5) From what I have done in Q14, hard-margin SVM + RBF kernel on 100 data points can always separate the data linearly (Ein = 0), and this matches my understanding.
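
For concreteness, this is roughly the kind of run behind point (5) (a minimal sketch assuming scikit-learn's SVC, not the actual homework code; the target function is only a stand-in for the one in Q14, and the hard margin is approximated by a very large C):

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)

def f(X):
    # stand-in nonlinear target on [-1, 1]^2 (not necessarily the Q14 target)
    return np.sign(X[:, 1] - X[:, 0] + 0.25 * np.sin(np.pi * X[:, 0]))

X = rng.uniform(-1, 1, size=(100, 2))
y = f(X)

clf = SVC(C=1e6, kernel="rbf", gamma=1.5)   # huge C ~ hard margin; gamma illustrative
clf.fit(X, y)

E_in = np.mean(clf.predict(X) != y)
print("E_in =", E_in, "| support vectors:", int(clf.n_support_.sum()))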

Then, my question is: is the regular RBF form not normally used for supervised training?

We learned a lot from the final exam about the regular RBF form. Its performance is normally not as good as SVM, and we have no clue about what the best K is. Does that mean that, in supervised learning, we normally will not consider using the regular RBF form?
  #4  06-21-2013, 11:53 AM
yaser (Caltech; Join Date: Aug 2009; Location: Pasadena, California, USA; Posts: 1,475)
Re: *ANSWER* Q14 about linearly separable by SVM

Quote:
Originally Posted by skwong
(1) In one sense, hard-margin SVM is no different from a simpler algorithm like PLA for linearly separable data (although the resulting hypothesis may be different, they are the same in terms of generalization, Ein = 0, ...).
It is no different in having a linear hypothesis set, but it is different in the learning algorithm that chooses a particular hypothesis from that set that happens to maximize the margin.
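
To make that concrete, here is a small sketch (not from the lectures; it assumes scikit-learn's SVC and a textbook PLA loop) comparing the margin of a PLA solution with the maximum-margin SVM solution on the same separable data:

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)

# separable 2-D data with a visible gap around the line x2 = x1
X = rng.uniform(-1, 1, size=(200, 2))
s = X[:, 1] - X[:, 0]
X, y = X[np.abs(s) > 0.1][:50], np.sign(s[np.abs(s) > 0.1])[:50]

# PLA: stop at the first separating hyperplane it happens to find
w, b = np.zeros(2), 0.0
while True:
    mis = np.where(np.sign(X @ w + b) != y)[0]
    if mis.size == 0:
        break
    i = rng.choice(mis)
    w, b = w + y[i] * X[i], b + y[i]

# SVM: hard margin approximated by a very large C
svm = SVC(C=1e6, kernel="linear").fit(X, y)

def margin(w, b):    # smallest distance from a data point to the boundary
    return np.min(y * (X @ w + b)) / np.linalg.norm(w)

print("PLA margin:", margin(w, b))
print("SVM margin:", margin(svm.coef_.ravel(), svm.intercept_[0]))

Both end up with Ein = 0, but the SVM margin is the larger of the two, which is the property the generalization argument in the lectures builds on.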

Quote:
(5) From what I have done in Q14, hard-margin SVM + RBF kernel on 100 data points can always separate the data linearly (Ein = 0), and this matches my understanding.
Your observation is correct.

Quote:
Then, my question is: is the regular RBF form not normally used for supervised training?

We learned a lot from the final exam about the regular RBF form. Its performance is normally not as good as SVM, and we have no clue about what the best K is.
People do use regular RBF, but not often, and not as often as they once did. The best K (number of clusters) is a perpetual question in unsupervised learning, with many clever techniques but none conclusive.
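
For reference, a minimal sketch of the regular RBF form as described in the lectures (centers from Lloyd's algorithm, weights from the pseudo-inverse); K, gamma, and the use of scikit-learn's KMeans as the clustering step are only illustrative choices:

import numpy as np
from sklearn.cluster import KMeans

def rbf_regular_fit(X, y, K=9, gamma=1.5):
    # Step 1: unsupervised choice of K centers (Lloyd's algorithm / K-means)
    centers = KMeans(n_clusters=K, n_init=10, random_state=0).fit(X).cluster_centers_
    # Step 2: linear fit of the weights via the pseudo-inverse,
    # with Phi[i, j] = exp(-gamma * ||x_i - mu_j||^2) plus a bias column
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    Phi = np.hstack([np.ones((len(X), 1)), np.exp(-gamma * d2)])
    w = np.linalg.pinv(Phi) @ y
    return centers, w

def rbf_regular_predict(X, centers, w, gamma=1.5):
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    Phi = np.hstack([np.ones((len(X), 1)), np.exp(-gamma * d2)])
    return np.sign(Phi @ w)

Nothing in the clustering step tells you the "right" K; it has to come from the data (e.g., through validation), which is the perpetual question mentioned above.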
__________________
Where everyone thinks alike, no one thinks very much
  #5  06-21-2013, 11:59 AM
Elroch (Invited Guest; Join Date: Mar 2013; Posts: 143)
Re: *ANSWER* Q14 about linearly separable by SVM

My thoughts on skwong's post:
(1) There are reasons why SVM is a major workhorse of machine learning, while PLA is mainly found early in machine learning courses and books (and the regular RBF form is another method that is not popular. EDIT: thanks, Yaser, for the information that it used to be used more). And it's not mere fashion! In realistic linearly separable situations, SVM gives better generalisation than PLA, and it also usually gives better generalisation than the regular RBF form. It's E_{out} that really matters, not E_{in}! The advantage over PLA extends to PLA with non-linear transforms. On top of that, for problems with not too many data points, SVM is computationally efficient.

Moreover, soft-margin SVM is a major tool for classification when there is either not enough data or there is noise (it's a struggle to get useful results from PLA in those cases: the pocket algorithm is a bit like taking shots in the dark, whereas SVM heads straight to the global optimum solution).

(2) see (1)

(3) The w is simply a natural way of representing a hyperplane. The relationship to polynomials is that polynomial models become linear models when viewed in the transformed space (with a dimension for each power of x); see the small sketch after this list. This is worth studying.

(4) Yes

(5) I think you are right. The regular RBF form tends not to generalise as well as SVM in realistic scenarios, hence people use SVM (spot the recurring theme?).
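
A tiny sketch of the transform view in point (3) (illustrative weights only): a linear classifier in z = (1, x, x^2, ..., x^Q) is the same thing as a polynomial classifier in x.

import numpy as np

Q = 3
w = np.array([-0.5, 0.0, 2.0, 1.0])   # weights in z-space = polynomial coefficients

def transform(x):
    return np.array([x ** q for q in range(Q + 1)])   # z = (1, x, x^2, ..., x^Q)

x = 0.7
h_z = np.sign(w @ transform(x))         # linear classifier in the transformed space
h_x = np.sign(np.polyval(w[::-1], x))   # the same polynomial, evaluated in x-space
print(h_z == h_x)                       # True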
  #6  06-21-2013, 06:46 PM
skwong (Member; Join Date: Apr 2013; Location: Hong Kong; Posts: 13)
Re: *ANSWER* Q14 about linearly separable by SVM

Many many thanks to Yaser and Elroch.