It is a mistake to talk about convergence in the abstract; it is the weight vector w that converges.
w should not be normalized after each update, because normalization changes the relative scale of the corrections applied on different iterations. I suspect this could produce cases where convergence fails even for a linearly separable training set.
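For concreteness, here is a minimal sketch of the standard perceptron update w ← w + y·x (the function name `perceptron`, the data layout, and the epoch cap are my own illustration, not from the question). The comment marks where a per-update normalization would distort the relative scale of earlier corrections:

```python
import numpy as np

def perceptron(X, y, max_epochs=100):
    """Standard perceptron on labels y in {-1, +1}.
    w converges in the sense that it stops changing once every
    point is classified correctly. No normalization of w."""
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * np.dot(w, xi) <= 0:   # misclassified point
                w += yi * xi              # fixed-scale correction
                errors += 1
                # Normalizing here, e.g. w /= np.linalg.norm(w),
                # would rescale all earlier corrections relative to
                # this one -- the distortion the answer warns against.
        if errors == 0:                    # converged: w is now fixed
            break
    return w

# Hypothetical linearly separable data (separable through the origin):
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w = perceptron(X, y)   # w stops changing once all signs agree
```

Note that it is w itself, not its norm or direction alone, whose updates cease; keeping the raw scale of w is what lets the standard convergence argument go through.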