LFD Book Forum  

LFD Book Forum > Book Feedback - Learning From Data > Chapter 1 - The Learning Problem

  #1  
10-17-2012, 01:48 PM
antics
Junior Member
 
Join Date: Apr 2012
Posts: 5
Does the Group Invariance Theorem hold for all linear threshold functions?

In the late 1960s, Minsky and Papert wrote a very famous book called Perceptrons, in which they proved the Group Invariance Theorem: if the point set is closed under the action of a mathematical group, then the output of a linear threshold classifier with a weight vector learned by the perceptron algorithm is invariant under that group action if and only if the weights can be chosen to preserve the group.

This was historically devastating because it meant that you couldn't do things like learn to recognize whether an odd number of pixels is turned on in an image (the parity problem), unless one of your features depended on all of the points in your point set. So this is a limitation of the perceptron learning algorithm (as opposed to, say, feature selection).
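
To make this concrete, here is a minimal sketch (my own toy example in plain NumPy, using 2-bit parity, i.e. XOR, as the point set). The perceptron learning rule never separates parity on the raw inputs, but it does as soon as a single feature that depends on all of the inputs is added:

Code:
import numpy as np

# 2-bit parity (XOR) with inputs in {-1, +1}^2; the label is the product of the bits.
X = np.array([[-1, -1], [-1, +1], [+1, -1], [+1, +1]])
y = X[:, 0] * X[:, 1]

def pla_separates(X, y, epochs=1000):
    """Run the perceptron learning rule; return True if some epoch has no mistakes."""
    Xb = np.hstack([X, np.ones((len(X), 1))])     # append a bias coordinate
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(Xb, y):
            if np.sign(xi @ w) != yi:
                w += yi * xi                       # PLA update on a misclassified point
                mistakes += 1
        if mistakes == 0:
            return True
    return False

print(pla_separates(X, y))                         # False: parity is not linearly separable
X_all = np.hstack([X, (X[:, 0] * X[:, 1]).reshape(-1, 1)])  # a feature that sees every input
print(pla_separates(X_all, y))                     # True: separable once such a feature exists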

One way to get around this is to use neural networks, which are capable of doing these sorts of things.

My question is this: is it known whether something similar holds for SVMs or logistic regression? Is this a limitation of any possible way to learn a linear threshold function, or can we get around it in some clever way?

I apologize if this is covered later in the course; I haven't seen all the videos yet.
__________________
Every time you test someone, you change what they know.
  #2  
10-17-2012, 01:55 PM
antics
Junior Member
 
Join Date: Apr 2012
Posts: 5
Re: Does the Group Invariance Theorem hold for all linear threshold functions?

Actually, I suspect this is a limitation only for linear functions.

Linear threshold functions that are invariant under the action of some mathematical group can be mapped to functions whose coefficients depend only on the equivalence classes induced by that group.

So the only linear functions invariant under groups that are transitive (scaling, for example) end up being measurements of size or area. But size and area do not necessarily preserve such transitive conditions, so building a separating hyperplane that is invariant under them is impossible, unless one of your features depends on all of the points.

So as long as your function is linear, this will hold. Right? If this is wrong, please speak up. If it's right, the question now is: how do we learn order-2 (or, generically, order-p) functions, where this would not be a limitation? Neural nets are one obvious solution, but I'm having trouble mapping this order-p function-fitting problem onto the framework of neural networks.
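
For what it's worth, here is how I currently picture the mapping, as a rough sketch only (the weights below are chosen by hand to show that the function is representable, not learned): a network with two threshold hidden units computes 2-bit parity, so the hidden layer is effectively supplying the order-2 features.

Code:
import numpy as np

def threshold(z):
    return np.where(z >= 0, 1, -1)

# Hand-picked weights (illustrative only): two hidden threshold units plus an output unit
# compute 2-bit parity y = x1 * x2 for inputs in {-1, +1}^2.
W_hidden = np.array([[ 1.0,  1.0],    # h1 fires only on (+1, +1)
                     [-1.0, -1.0]])   # h2 fires only on (-1, -1)
b_hidden = np.array([-1.0, -1.0])
w_out = np.array([1.0, 1.0])
b_out = 1.0

def parity_net(x):
    h = threshold(W_hidden @ x + b_hidden)   # nonlinear "features" of the input
    return threshold(w_out @ h + b_out)      # linear threshold on those features

for x in [(-1, -1), (-1, 1), (1, -1), (1, 1)]:
    print(x, parity_net(np.array(x)))        # matches x1 * x2 on all four inputs
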
__________________
Every time you test someone, you change what they know.
  #3  
10-17-2012, 01:57 PM
magdon
RPI
 
Join Date: Aug 2009
Location: Troy, NY, USA.
Posts: 595
Re: Does the Group Invariance Theorem hold for all linear threshold functions?

These are limitations of the linear threshold function hypothesis set (or perceptrons). The various algorithms you mention (PLA, logistic regression, SVM) just select a particular hypothesis from that set in different ways. To solve these problems, one has to move beyond linear threshold functions to nonlinear threshold functions. The nonlinear SVM can accomplish this, the neural network can accomplish this, etc.
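
As a quick concrete illustration (a sketch only; it uses scikit-learn, which is not part of the book, and hand-made 2-bit parity data), a linear-kernel SVM cannot get all four parity points right, while an SVM with a nonlinear RBF kernel can:

Code:
import numpy as np
from sklearn.svm import SVC

# 2-bit parity (XOR): no linear threshold function gets all four points right.
X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]])
y = X[:, 0] * X[:, 1]

linear_svm = SVC(kernel="linear", C=10.0).fit(X, y)
rbf_svm = SVC(kernel="rbf", C=10.0, gamma=1.0).fit(X, y)

print("linear SVM training accuracy:", linear_svm.score(X, y))  # at most 0.75
print("RBF SVM training accuracy:   ", rbf_svm.score(X, y))     # 1.0 on these four points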

Quote:
Originally Posted by antics
In the late 1960s, Minsky and Papert wrote a very famous book called Perceptrons, in which they proved the Group Invariance Theorem: if the point set is closed under the action of a mathematical group, then the output of a linear threshold classifier with a weight vector learned by the perceptron algorithm is invariant under that group action if and only if the weights can be chosen to preserve the group.

This was historically devastating because it meant that you couldn't do things like learn to recognize whether an odd number of pixels is turned on in an image (the parity problem), unless one of your features depended on all of the points in your point set. So this is a limitation of the perceptron learning algorithm (as opposed to, say, feature selection).

One way to get around this is to use neural networks, which are capable of doing these sorts of things.

My question is this: is it known whether something similar holds for SVMs or logistic regression? Is this a limitation of any possible way to learn a linear threshold function, or can we get around it in some clever way?

I apologize if this is covered later in the course; I haven't seen all the videos yet.
__________________
Have faith in probability