Re: Impact of Alpha on PLA Converging
In my simulation, I plotted the random data points and the "true" f(x) line. It helped me see intuitively that with many points (e.g. N=100), there is much less "wiggle room" for two lines to fit between the same set of "boundary" points (i.e. the points closest to the f(x) line). With fewer points (say, N=10), there can be a huge variation in the slope and intercept of two lines that both "fit" the data.
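For anyone who wants to check the "wiggle room" numerically rather than by eye, here is a minimal sketch of such an experiment. It is not the exact simulation described above: it assumes points drawn uniformly from [-1, 1] x [-1, 1], a target line through two random points, and plain PLA started at w = 0. All function names are made up for illustration. It estimates how often the PLA line g disagrees with f on fresh points, for a small and a large N.

```python
import random


def label(w, p):
    """Classify point p = (x1, x2) with weights w = (w0, w1, w2); returns -1 or +1."""
    return 1 if w[0] + w[1] * p[0] + w[2] * p[1] > 0 else -1


def random_target(rng):
    """Weights of the line through two random points in [-1, 1]^2 (the "true" f)."""
    ax, ay, bx, by = (rng.uniform(-1, 1) for _ in range(4))
    return (ax * by - bx * ay, ay - by, bx - ax)


def pla(points, labels, max_updates=10_000):
    """Plain PLA from w = 0; stops when no training point is misclassified."""
    w = [0.0, 0.0, 0.0]
    for _ in range(max_updates):  # cap is an assumption, only to bound run time
        wrong = [(p, y) for p, y in zip(points, labels) if label(w, p) != y]
        if not wrong:
            break
        p, y = wrong[0]  # update on the first misclassified point
        w[0] += y
        w[1] += y * p[0]
        w[2] += y * p[1]
    return w


def average_disagreement(n, runs=50, n_test=1000, seed=0):
    """Average fraction of fresh points on which the PLA line g disagrees with f."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        f = random_target(rng)
        points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
        g = pla(points, [label(f, p) for p in points])
        test = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n_test)]
        total += sum(label(f, p) != label(g, p) for p in test) / n_test
    return total / runs


if __name__ == "__main__":
    for n in (10, 100):
        print(f"N={n}: average disagreement with f = {average_disagreement(n):.3f}")
```

A larger disagreement for N=10 than for N=100 would be the numerical counterpart of the lines having more room to move when there are few points.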
I would think that you could get a better starting point than w = 0 by examining the data at the "boundary" points, where the label y changes from -1 to +1, and somehow using that information?
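One possible way to try this idea, purely as a sketch: take the closest pair of points with opposite labels and start PLA from the perpendicular bisector of that pair. This heuristic and the name boundary_init are assumptions for illustration, not something from the course material, and nothing here shows that it actually needs fewer updates than w = 0. For linearly separable data PLA should still converge from any starting w, so the two starts can simply be compared by counting updates.

```python
import random


def label(w, p):
    """Classify point p = (x1, x2) with weights w = (w0, w1, w2); returns -1 or +1."""
    return 1 if w[0] + w[1] * p[0] + w[2] * p[1] > 0 else -1


def boundary_init(points, labels):
    """Hypothetical heuristic: perpendicular bisector of the closest (+1, -1) pair."""
    plus = [p for p, y in zip(points, labels) if y == 1]
    minus = [p for p, y in zip(points, labels) if y == -1]
    if not plus or not minus:
        return [0.0, 0.0, 0.0]  # only one class present: fall back to w = 0
    a, b = min(
        ((p, m) for p in plus for m in minus),
        key=lambda pm: (pm[0][0] - pm[1][0]) ** 2 + (pm[0][1] - pm[1][1]) ** 2,
    )
    dx, dy = a[0] - b[0], a[1] - b[1]              # normal points toward the +1 point
    cx, cy = (a[0] + b[0]) / 2, (a[1] + b[1]) / 2  # midpoint lies on the starting line
    return [-(dx * cx + dy * cy), dx, dy]


def pla(points, labels, w_init=None, max_updates=10_000):
    """PLA from an optional starting w; returns (weights, number of updates)."""
    w = list(w_init) if w_init is not None else [0.0, 0.0, 0.0]
    updates = 0
    while updates < max_updates:  # cap is an assumption, only to bound run time
        wrong = [(p, y) for p, y in zip(points, labels) if label(w, p) != y]
        if not wrong:
            break
        p, y = wrong[0]  # update on the first misclassified point
        w[0] += y
        w[1] += y * p[0]
        w[2] += y * p[1]
        updates += 1
    return w, updates


if __name__ == "__main__":
    rng = random.Random(1)
    f = (0.1, 1.0, -1.0)  # arbitrary target line chosen for the demo
    points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(100)]
    labels = [label(f, p) for p in points]
    _, from_zero = pla(points, labels)
    _, from_boundary = pla(points, labels, w_init=boundary_init(points, labels))
    print("updates from w = 0:        ", from_zero)
    print("updates from boundary init:", from_boundary)
```

Averaging the update counts over many random targets and data sets would give a fairer comparison than a single run.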
