LFD Book Forum (http://book.caltech.edu/bookforum/index.php)
-   Chapter 1 - The Learning Problem (http://book.caltech.edu/bookforum/forumdisplay.php?f=108)
-   -   Does the Group Invariance Theorem for all linear threshold functions? (http://book.caltech.edu/bookforum/showthread.php?t=2203)

 antics 10-17-2012 12:48 PM

Does the Group Invariance Theorem for all linear threshold functions?

Late in the 60's Papert and Minsky wrote a very famous book called Perceptrons, in which they proved the Group Invariance Theorem, which shows that if the pointset is closed over the action of a mathematical group, then the output of a linear threshold classification with a weight vector learned from the perceptron algorithm will be invariant under the same group action if and only if the weights can be chosen to preserve the group.

This was historically devastating because it meant that you couldn't do things like learn to recognize whether there is an odd number of pixels turned on in an image, unless one of your features depended on all the points in your pointset. So this is a limitation of the Perceptron learning algorithm (as opposed to, say, feature selection).

One way to get around this is to use neural networks, which are capable of doing these sorts of things.

My question is this: is it known whether something similar hold for SVMs or logistic regression? Is this a limitation of any possible way to learn a linear threshold function, or can we get around it in some clever way?

I apologize if this is covered later in the couse; I haven't seen all the videos yet.

 antics 10-17-2012 12:55 PM

Re: Does the Group Invariance Theorem for all linear threshold functions?

Actually, I suspect this is a limitation only for linear functions.

Linear threshold functions which are invariant under the action of some mathematical group can be mapped to functions whose coefficients depend only on that group.

So, the only linear functions invariant under groups that are transitive (scaling, for example) end up being a measurement of size or area. But size and area do not necessarily preserve such transitive conditions, so building a distinguishing hyperplane invariant under such conditions is impossible, unless one of your features depends on all of the points.

So as long as your function is linear, this will hold. Right? If this is wrong please speak up. If it's right, the question is now: how do we learn order-2 (or, generically, order-p) functions, where this would not be a limitation? Neural nets are one obvious solution, but I'm having trouble mapping this order-p function fitting problem to the framework of neural networks. (2)

 magdon 10-17-2012 12:57 PM

Re: Does the Group Invariance Theorem for all linear threshold functions?

These are limitations of the linear threshold function hypothesis set (or perceptrons). The various algorithms you mention just select a particular hypothesis in different ways (PLA, logistic regression, SVM). To be able to solve this problems, one has to move beyond linear threshold functions to nonlinear threshold functions. The nonlinear SVM can accomplish this, the neural network can accomplish this, etc.

Quote:
 Originally Posted by antics (Post 6459) Late in the 60's Papert and Minsky wrote a very famous book called Perceptrons, in which they proved the Group Invariance Theorem, which shows that if the pointset is closed over the action of a mathematical group, then the output of a linear threshold classification with a weight vector learned from the perceptron algorithm will be invariant under the same group action if and only if the weights can be chosen to preserve the group. This was historically devastating because it meant that you couldn't do things like learn to recognize whether there is an odd number of pixels turned on in an image, unless one of your features depended on all the points in your pointset. So this is a limitation of the Perceptron learning algorithm (as opposed to, say, feature selection). One way to get around this is to use neural networks, which are capable of doing these sorts of things. My question is this: is it known whether something similar hold for SVMs or logistic regression? Is this a limitation of any possible way to learn a linear threshold function, or can we get around it in some clever way? I apologize if this is covered later in the couse; I haven't seen all the videos yet.

 All times are GMT -7. The time now is 10:42 AM.