#1




Regression then PLA
In the Homework 1 section there is a discussion showing that PLA is independent of alpha, since only the ratio to w0 counts. When regression supplies the initial weights for PLA, though, alpha seems to matter again: too big, and the initial weights don't do much; too small, and it converges slowly.
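Here is a minimal sketch of that experiment, in case it helps anyone reproduce the effect. Everything in it is an assumption made for illustration rather than the homework's exact setup: the helper names (make_data, linear_regression, pla, demo), the seed, the three alpha values, and especially the 0.1 margin around the target line, which is only there so that every run terminates quickly.

```python
import random

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def make_data(n, rng, margin=0.1):
    """Random target line in [-1, 1]^2; points closer than `margin` are rejected
    (an assumption added here to keep PLA runs short)."""
    (x1, y1), (x2, y2) = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(2)]
    a, b, c = y2 - y1, x1 - x2, x2 * y1 - x1 * y2   # line through both points
    scale = (a * a + b * b) ** 0.5
    f = [c / scale, a / scale, b / scale]           # |dot(f, x)| is distance to the line
    X = []
    while len(X) < n:
        x = [1.0, rng.uniform(-1, 1), rng.uniform(-1, 1)]
        if abs(dot(f, x)) >= margin:
            X.append(x)
    y = [1 if dot(f, x) > 0 else -1 for x in X]
    return X, y

def solve(A, b):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[p] = M[p], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [u - factor * v for u, v in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def linear_regression(X, y):
    """Least squares via the normal equations (X^T X) w = X^T y."""
    d = len(X[0])
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(d)] for i in range(d)]
    Xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(d)]
    return solve(XtX, Xty)

def pla(X, y, w, alpha, rng, max_updates=100000):
    """PLA with learning rate alpha, updating on a random misclassified point."""
    w = list(w)
    updates = 0
    while True:
        bad = [i for i, x in enumerate(X) if y[i] * dot(w, x) <= 0]
        if not bad:
            return w, updates
        if updates >= max_updates:
            raise RuntimeError("PLA did not converge within max_updates")
        i = rng.choice(bad)
        w = [wi + alpha * y[i] * xi for wi, xi in zip(w, X[i])]
        updates += 1

def demo(seed=0, n=100, alphas=(0.01, 1.0, 100.0)):
    """Update counts for PLA started from the regression weights, one per alpha."""
    X, y = make_data(n, random.Random(seed))
    w_lin = linear_regression(X, y)
    return {alpha: pla(X, y, w_lin, alpha, random.Random(seed))[1] for alpha in alphas}

if __name__ == "__main__":
    print(demo())
```

Averaging demo over many seeds, and comparing against a start from all-zero weights, would be the natural next step. A single run says very little.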
#2




Re: Regression then PLA
Quote:
__________________
Where everyone thinks alike, no one thinks very much 
#3




Re: Regression then PLA
I'm running into this as well. Maybe I have a bug, but even though the regression finds an almost perfect line, usually with very few points misclassified, when I give the regression weights to PLA as initial weights the PLA line bounces all over the place before settling down.
Scaling the regression weights up by a factor of 10 or 100 would speed up the PLA a lot, I think, because each PLA update would then move the weights much less. That would have a similar effect to using a small alpha. But we're not supposed to do either, right?
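The "similar effect" can be checked directly. With the same choice of misclassified point at each step, starting from c times the weights with alpha = 1 gives exactly c times the weight vector you get from the unscaled weights with alpha = 1/c, so both runs make the same updates. The sketch below is illustrative only: the toy data, the stand-in weights w0, and the function name pla_first_miss are made up, and the "first misclassified point" rule is an assumption used to keep the two runs comparable. The factor is 16 rather than 10 only because a power of two makes the floating-point comparison exact.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def pla_first_miss(X, y, w, alpha=1.0, max_updates=100000):
    """Deterministic PLA: always update on the first misclassified point."""
    w = list(w)
    updates = 0
    while True:
        miss = next((i for i, x in enumerate(X) if y[i] * dot(w, x) <= 0), None)
        if miss is None:
            return w, updates
        if updates >= max_updates:
            raise RuntimeError("PLA did not converge within max_updates")
        w = [wi + alpha * y[miss] * xi for wi, xi in zip(w, X[miss])]
        updates += 1

# Hypothetical toy data: bias coordinate first, separable by w = (0, 1, 1).
X = [[1.0, 2.0, 1.0], [1.0, 1.0, 3.0], [1.0, -1.0, -2.0],
     [1.0, -2.0, -1.0], [1.0, 0.5, 1.5], [1.0, -1.5, -0.5]]
y = [1, 1, -1, -1, 1, -1]
w0 = [0.1, 0.5, -0.4]  # made-up stand-in for regression weights
c = 16.0               # power of two, so scaling is exact in floating point

# Run A: unscaled initial weights, small alpha.
w_small_alpha, n_small_alpha = pla_first_miss(X, y, w0, alpha=1.0 / c)
# Run B: initial weights scaled up by c, alpha = 1.
w_scaled_init, n_scaled_init = pla_first_miss(X, y, [c * wi for wi in w0], alpha=1.0)

print(n_small_alpha, n_scaled_init)
```

With random selection of the misclassified point the two runs would only match in distribution, not step for step.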
#4




Re: Regression then PLA
Quote:
#5




Re: Regression then PLA
Seems to me that what it does is either give 0 iterations (if no points are misclassified) or about as many as PLA took without the regression solution.
So what we are really counting is how often it gives 0 iterations, but yes, we have to follow the problem statement.
#6




Re: Regression then PLA
Huh, I haven't done this problem yet, but this is really interesting. I had thought the answer to this question would be, like, 1, but now I see that the first update is likely to ruin the advantage that the linear regression gave you. PLA isn't very subtle at first: the weights have to grow until the adjustments are small relative to them, and only then does the fine-tuning happen.
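One way to look at this is to log, for every update, how long the step is compared with the current weight vector. The sketch below does that for alpha = 1. It is a rough illustration, not the homework setup: the fixed target line, the 0.05 margin, the seed, and the "almost right" starting weights w0 are all invented for the example, as is the function name pla_trace.

```python
import math
import random

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def pla_trace(X, y, w, rng, max_updates=100000):
    """Run PLA with alpha = 1 and record, for each update, ||x|| / ||w||:
    the step length relative to the weights just before the update."""
    w = list(w)
    ratios = []
    while True:
        bad = [i for i, x in enumerate(X) if y[i] * dot(w, x) <= 0]
        if not bad:
            return w, ratios
        if len(ratios) >= max_updates:
            raise RuntimeError("PLA did not converge within max_updates")
        i = rng.choice(bad)
        norm_w = math.sqrt(dot(w, w))
        step = math.sqrt(dot(X[i], X[i]))
        ratios.append(step / norm_w if norm_w > 0 else math.inf)
        w = [wi + y[i] * xi for wi, xi in zip(w, X[i])]

# Hypothetical setup: fixed target line 0.2 + x1 - x2 = 0, points kept only if
# they lie at least 0.05 away from it (an assumption so the run stays short).
rng = random.Random(0)
target = [0.2, 1.0, -1.0]
X = []
while len(X) < 200:
    x = [1.0, rng.uniform(-1, 1), rng.uniform(-1, 1)]
    if abs(dot(target, x)) / math.sqrt(2.0) >= 0.05:
        X.append(x)
y = [1 if dot(target, x) > 0 else -1 for x in X]

w0 = [0.0, 1.0, -0.8]  # made-up "almost right" weights standing in for regression
w_final, ratios = pla_trace(X, y, w0, rng)
print(len(ratios), ratios[:3], ratios[-3:])
```

Since every input has the bias coordinate 1, each step has length at least 1, so with starting weights of norm around 1 the first update is as large as the weights themselves.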

#7




Re: Regression then PLA
Quote:
For 10 data points, it might very well converge quickly! My simulation converges at the speed of light! 1000 data points is another story. 