Just a quick remark to clarify things, and I'll let others discuss the other parts. When we look for a break point, the constellations of input points that we consider are not restricted to being separable. In fact, the points are not labelled a priori, so some labelling patterns may make them separable and others may not. The fact that perceptrons and PLA work on separable data does not affect this, since a break point is by definition based on where a model fails to separate the points, that is, where it can no longer produce every possible labelling pattern.
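
To make the remark concrete, here is a minimal sketch in Python that takes unlabelled points in the plane, tries every possible labelling pattern, and counts how many of them a 2D perceptron can separate. The function names, the two example point sets, and the `max_passes` cap are my own assumptions for illustration. Treating "PLA did not converge within the cap" as "not separable" is a heuristic rather than a proof, although the cap is far more than these tiny integer-valued examples need.

```python
from itertools import product


def pla_separates(points, labels, max_passes=1000):
    """Return True if PLA finds a line that separates the labelled points.

    Reaching max_passes is treated as 'not separable'. That cap is an
    assumption of this sketch, not a proof of non-separability.
    """
    w = [0.0, 0.0, 0.0]  # bias weight followed by the two input weights
    for _ in range(max_passes):
        updated = False
        for (x1, x2), y in zip(points, labels):
            x = (1.0, x1, x2)  # prepend the constant bias coordinate
            # A point is misclassified if it is on the wrong side or on the line.
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                updated = True
        if not updated:
            return True  # a full pass with no mistakes: the labelling is separated
    return False


def count_dichotomies(points):
    """Count the labelling patterns on the given points that PLA can separate."""
    n = len(points)
    return sum(pla_separates(points, labels)
               for labels in product((-1, 1), repeat=n))


three = [(0, 0), (1, 0), (0, 1)]          # three points, not collinear
four = [(0, 0), (1, 0), (0, 1), (1, 1)]   # corners of the unit square

print(count_dichotomies(three), "of", 2 ** len(three))
print(count_dichotomies(four), "of", 2 ** len(four))
```

On the three points every one of the 8 labellings is separable, while on the square only 14 of the 16 are, because the two XOR-style labellings are not. Note that the square is just one constellation of four points: a break point at 4 requires that no constellation of four points can be fully separated, which this sketch illustrates but does not prove.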