I had bad luck with the ALA: for all but the smallest training data sets and with more than 2 dimensions, the weights would go scooting off to infinity.
I modified the algorithm to treat it as a regression problem rather than a classification problem; I changed the update criterion to:
Code:
s = np.dot(x[i,:], w)                      # current output for training example i
if np.abs(y[i] - s) > 0.01:                # update only if outside the tolerance
    w = w + eta * (y[i] - s) * x[i,:]
    n_updates += 1
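For anyone who wants to try it, here's a minimal self-contained sketch of how that update sits inside the full training loop. The synthetic data setup (bias column, a made-up true weight vector w_true, and the cap on passes) is just for illustration; substitute your own x and y.
Code:
import numpy as np

# Illustrative data: N points in d dimensions with a bias column,
# targets generated from a hypothetical true weight vector.
rng = np.random.default_rng(0)
N, d = 1000, 10
x = np.hstack([np.ones((N, 1)), rng.uniform(-1, 1, (N, d))])
w_true = rng.uniform(-1, 1, d + 1)
y = x @ w_true

w = np.zeros(d + 1)
eta, tol = 0.1, 0.01
n_updates = 0
for it in range(1000):                       # cap on passes through the data
    any_update = False
    for i in range(N):
        s = np.dot(x[i, :], w)               # current output for example i
        if np.abs(y[i] - s) > tol:           # update only if outside the tolerance
            w = w + eta * (y[i] - s) * x[i, :]
            n_updates += 1
            any_update = True
    if not any_update:                        # every point within tolerance: done
        break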
This worked very well: with eta set to 0.1, training sets of size N=1000 in d=10 dimensions required only 2.7 +/- 1.1 passes through the data to bring every training data point within the 0.01 tolerance. PLA on the same training data required about 750 iterations.
So rather than choosing a plane that separates the data, this chooses the plane that gets the correct distance (to within 0.01) between the plane and every training data point.