LFD Book Forum Does the Group Invariance Theorem hold for all linear threshold functions?
#1
10-17-2012, 12:48 PM
 antics Junior Member Join Date: Apr 2012 Posts: 5
Does the Group Invariance Theorem hold for all linear threshold functions?

In the late 1960s, Minsky and Papert wrote a very famous book called Perceptrons, in which they proved the Group Invariance Theorem. It shows that if the point set is closed under the action of a mathematical group, then the output of a linear threshold classifier with a weight vector learned by the perceptron algorithm is invariant under that same group action if and only if the weights can be chosen to preserve the group.

This was historically devastating because it meant that you couldn't learn things like recognizing whether an image has an odd number of pixels turned on, unless one of your features depended on all of the points in your point set. So this is a limitation of the perceptron learning algorithm (as opposed to, say, feature selection).
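To make the parity example concrete, here is a minimal sketch (my own toy illustration, not from the book): the perceptron learning algorithm run on 2-bit parity, i.e. XOR. Since XOR is not linearly separable, no weight vector classifies all four points, so PLA can never reach zero errors no matter how long it runs.

```python
# 2-bit parity (XOR): label +1 when an odd number of inputs is on.
# No linear threshold function separates these four points, so the
# perceptron learning algorithm below can never reach zero errors.

X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [-1, +1, +1, -1]

def sign(z):
    return 1 if z > 0 else -1

def errors(w):
    return sum(sign(w[0] + w[1] * x1 + w[2] * x2) != t
               for (x1, x2), t in zip(X, y))

w = [0.0, 0.0, 0.0]           # [bias, w1, w2]
best = errors(w)
for epoch in range(1000):     # PLA just cycles forever on XOR
    for (x1, x2), t in zip(X, y):
        if sign(w[0] + w[1] * x1 + w[2] * x2) != t:
            w[0] += t
            w[1] += t * x1
            w[2] += t * x2
    best = min(best, errors(w))

print(best)  # stays >= 1: every linear threshold misclassifies some point
```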

One way to get around this is to use neural networks, which are capable of doing these sorts of things.

My question is this: is it known whether something similar holds for SVMs or logistic regression? Is this a limitation of any possible way of learning a linear threshold function, or can we get around it in some clever way?

I apologize if this is covered later in the course; I haven't seen all the videos yet.
__________________
Every time you test someone, you change what they know.
#2
10-17-2012, 12:55 PM
 antics Junior Member Join Date: Apr 2012 Posts: 5
Re: Does the Group Invariance Theorem hold for all linear threshold functions?

Actually, I suspect this is a limitation only for linear functions.

Linear threshold functions that are invariant under the action of some mathematical group can be mapped to functions whose coefficients depend only on that group.

So the only linear functions invariant under a transitive group (scaling, for example) end up being measurements of size or area. But size and area do not necessarily respect such transitive conditions, so building a separating hyperplane that is invariant under them is impossible, unless one of your features depends on all of the points.

So as long as your function is linear, this will hold. Right? If this is wrong, please speak up. If it's right, the question becomes: how do we learn order-2 (or, generally, order-p) functions, for which this would not be a limitation? Neural nets are one obvious solution, but I'm having trouble mapping this order-p function-fitting problem onto the framework of neural networks.
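One way to see the order-2 case concretely (my own sketch, using the same 2-bit parity toy problem): adding the single product feature x1*x2 makes XOR linearly separable, so plain PLA run in the transformed space does converge.

```python
# Order-2 feature map for 2-bit parity: z = (1, x1, x2, x1*x2).
# In this space XOR is linearly separable (e.g. x1 + x2 - 2*x1*x2 - 0.5
# separates it), so the perceptron converges to zero training errors.

X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [-1, +1, +1, -1]

def phi(x):
    x1, x2 = x
    return (1.0, x1, x2, x1 * x2)   # degree-2 feature transform

def sign(s):
    return 1 if s > 0 else -1

def predict(w, x):
    return sign(sum(wi * zi for wi, zi in zip(w, phi(x))))

w = [0.0, 0.0, 0.0, 0.0]
for epoch in range(200):   # Novikoff's bound guarantees convergence here
    mistakes = 0
    for x, t in zip(X, y):
        if predict(w, x) != t:
            w = [wi + t * zi for wi, zi in zip(w, phi(x))]
            mistakes += 1
    if mistakes == 0:
        break

final_errors = sum(predict(w, x) != t for x, t in zip(X, y))
print(final_errors)  # 0
```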
#3
10-17-2012, 12:57 PM
 magdon RPI Join Date: Aug 2009 Location: Troy, NY, USA. Posts: 595
Re: Does the Group Invariance Theorem hold for all linear threshold functions?

These are limitations of the linear threshold function hypothesis set (or perceptrons). The various algorithms you mention (PLA, logistic regression, SVM) just select a particular hypothesis from that set in different ways. To solve these problems, one has to move beyond linear threshold functions to nonlinear threshold functions. The nonlinear SVM can accomplish this, the neural network can accomplish this, etc.
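To illustrate that the fix lives in the hypothesis set rather than the algorithm, here is a sketch (my own, using a kernelized perceptron as the simplest stand-in for a nonlinear SVM solver) with a degree-2 polynomial kernel on the parity toy problem; the kernel's implicit feature space contains the product term that parity needs.

```python
# Kernel perceptron with a degree-2 polynomial kernel on 2-bit parity.
# The kernel's implicit feature space includes an x1*x2 product term,
# so XOR becomes separable and the algorithm converges.
# (A kernel SVM would pick the max-margin hypothesis from the same
# nonlinear hypothesis set; this is just the simplest kernel method.)

X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [-1, +1, +1, -1]

def K(a, b):
    return (1 + a[0] * b[0] + a[1] * b[1]) ** 2   # polynomial kernel

alpha = [0, 0, 0, 0]   # per-example mistake counts

def predict(x):
    s = sum(alpha[i] * y[i] * K(X[i], x) for i in range(len(X)))
    return 1 if s > 0 else -1

for epoch in range(300):   # convergence guaranteed: data separable in feature space
    mistakes = 0
    for i in range(len(X)):
        if predict(X[i]) != y[i]:
            alpha[i] += 1
            mistakes += 1
    if mistakes == 0:
        break

correct = sum(predict(x) == t for x, t in zip(X, y))
print(correct)  # 4
```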

__________________
Have faith in probability




The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.