LFD Book Forum  

Chapter 3 - The Linear Model

  #1  
04-16-2012, 10:58 AM
tcristo
LRA -> PLA Effect of Alpha

I noticed that when I run Linear Regression on a training data set and then run the PLA on the same data starting from the LRA weights, the Learning Rate (Alpha) of the PLA seems to significantly affect the rate of convergence. I am assuming that the optimal size of alpha is directly related to the size of the convergence errors from the Linear Regression.

Is there a way to model this mathematically such that the Alpha parameter can automatically be calculated for each training set?
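
For concreteness, here is a minimal sketch of the setup I'm describing: linear regression via the pseudo-inverse to preset the weights, followed by PLA updates scaled by a learning rate alpha. This is only an illustration; the toy data generation, function names, and parameters are my own, not the book's or the homework's code.

Code:
import numpy as np

rng = np.random.default_rng(0)

def lra_weights(X, y):
    # Linear regression via the pseudo-inverse: w = pinv(X) y
    return np.linalg.pinv(X) @ y

def pla(X, y, w, alpha=1.0, max_iters=10000):
    # PLA starting from the given weights; each update is scaled by alpha.
    w = w.copy()
    for updates in range(max_iters):
        wrong = np.where(np.sign(X @ w) != y)[0]
        if wrong.size == 0:
            return w, updates            # converged: nothing misclassified
        i = rng.choice(wrong)            # pick a misclassified point
        w += alpha * y[i] * X[i]         # standard PLA step, scaled by alpha
    return w, max_iters

# Toy linearly separable data; x0 = 1 is the constant coordinate.
N = 100
X = np.c_[np.ones(N), rng.uniform(-1, 1, (N, 2))]
y = np.sign(X @ rng.uniform(-1, 1, 3))

w0 = lra_weights(X, y)                   # preset weights from linear regression
w, n_updates = pla(X, y, w0, alpha=1.0)
print("PLA updates needed after the LRA warm start:", n_updates)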
  #2  
04-16-2012, 01:50 PM
htlin
Re: LRA -> PLA Effect of Alpha

Quote:
Originally Posted by tcristo
I noticed that when I run Linear Regression on a training data set and then run the PLA on the same data starting from the LRA weights, the Learning Rate (Alpha) of the PLA seems to significantly affect the rate of convergence. I am assuming that the optimal size of alpha is directly related to the size of the convergence errors from the Linear Regression.

Is there a way to model this mathematically such that the Alpha parameter can automatically be calculated for each training set?
For PLA, I cannot recall any. For some more general models like Neural Networks, there are efforts (in terms of optimization) for adaptively changing the \alpha value. BTW, I think the homework problem asks you to take no \alpha (or a naive choice of 1). Hope this helps.
__________________
When one teaches, two learn.
  #3  
04-16-2012, 02:32 PM
tcristo
Re: LRA -> PLA Effect of Alpha

Quote:
Originally Posted by htlin
For PLA, I cannot recall any. For some more general models like Neural Networks, there are efforts (in terms of optimization) for adaptively changing the \alpha value. BTW, I think the homework problem asks you to take no \alpha (or a naive choice of 1). Hope this helps.
I originally had my \alpha set at one. I was surprised that running the LRA first to preset the weights and then running the PLA didn't significantly decrease the number of iterations required. I am getting a 50% reduction or thereabouts and expected an order of magnitude reduction. When you view it graphically, the LRA does what seems like 98+% of the work most of the time.

The size of alpha doesn't always seem to matter, but there are specific cases where an appropriately chosen \alpha drops the number of iterations by an additional 50%-75%.

I am going to chew on this for a little while and see if I can figure out the relationship.
  #4  
04-16-2012, 07:45 PM
jsarrett
Re: LRA -> PLA Effect of Alpha

No one ever said the PLA was a *good* algorithm. It's only guaranteed to converge eventually. I'm sure later in the lecture we'll get to better optimization algorithms.
  #5  
02-12-2013, 02:27 AM
gah44
Re: LRA -> PLA Effect of Alpha

Quote:
Originally Posted by tcristo
I originally had my \alpha set at one. I was surprised that running the LRA first to preset the weights and then running the PLA didn't significantly decrease the number of iterations required. I am getting a 50% reduction or thereabouts and expected an order of magnitude reduction. When you view it graphically, the LRA does what seems like 98+% of the work most of the time.

(snip)
I wondered about this during the class discussion, but I only noticed this thread now.

Since the problem is done with \alpha=1, then, as you note, the effect is small. What seems to happen is that if the LRA solution already classifies the points correctly, no PLA cycles are needed; otherwise, roughly as many are needed as before. The 50% reduction comes from the cases where no PLA cycles are needed.
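
To make that concrete, continuing the illustrative sketch from post #1 (same assumed X, y, and w0 variables): a quick check of whether the linear regression weights already classify every training point, in which case PLA performs zero cycles.

Code:
# Using X, y, and w0 from the sketch in post #1.
n_errors = np.sum(np.sign(X @ w0) != y)
print("points misclassified by the LRA weights:", n_errors)
# n_errors == 0 -> PLA converges immediately (zero cycles)
# n_errors  > 0 -> PLA typically needs about as many updates as from scratch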