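The linear regression solution is analytically optimal only for the squared error, i.e. for regression. It can be suboptimal for classification, where the quantity that matters is the 0/1 error (the fraction of misclassified points), so the pocket algorithm can still improve on it when started from the regression weights.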
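To illustrate the gap between the two criteria, here is a minimal sketch, assuming NumPy and a made-up 1-D toy dataset. The data and the helper names (`linear_regression`, `classification_error`, `pocket`) are illustrative assumptions, not part of the original problem. It fits least-squares weights to ±1 labels, then runs pocket (perceptron updates that keep the best weights seen so far) starting from those weights.

```python
import numpy as np

def linear_regression(X, y):
    # Least-squares weights via the pseudo-inverse: minimizes squared error.
    return np.linalg.pinv(X) @ y

def classification_error(w, X, y):
    # Fraction of points whose predicted sign differs from the label (0/1 error).
    return np.mean(np.sign(X @ w) != y)

def pocket(X, y, w_init, max_updates=1000, seed=0):
    # Perceptron updates that keep ("pocket") the best weights seen so far.
    rng = np.random.default_rng(seed)
    w = w_init.copy()
    best_w, best_err = w.copy(), classification_error(w, X, y)
    for _ in range(max_updates):
        misclassified = np.flatnonzero(np.sign(X @ w) != y)
        if misclassified.size == 0:
            break  # nothing left to fix
        i = rng.choice(misclassified)
        w = w + y[i] * X[i]  # standard perceptron update on one bad point
        err = classification_error(w, X, y)
        if err < best_err:
            best_w, best_err = w.copy(), err
    return best_w

# Hypothetical toy data: the far-away positive point at x = 100 drags the
# least-squares fit, even though the classes are linearly separable.
x = np.array([-2.0, -1.0, 1.0, 2.0, 100.0])
y = np.array([-1.0, -1.0, 1.0, 1.0, 1.0])
X = np.column_stack([np.ones_like(x), x])  # prepend the bias coordinate

w_lr = linear_regression(X, y)
w_pocket = pocket(X, y, w_lr)

err_lr = classification_error(w_lr, X, y)
err_pocket = classification_error(w_pocket, X, y)
print("0/1 error, linear regression:", err_lr)
print("0/1 error, after pocket:     ", err_pocket)
```

On this toy set the regression weights misclassify the point at x = 1, because the squared error is dominated by the distant point, and a single pocket update removes that mistake. Starting pocket from the regression weights rather than from zero is only a choice of initialization; it usually gives pocket a good starting point but does not change the algorithm.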
I don't quite understand the first classification method given by the problem: "Linear Regression for classification followed by pocket for improvement". Since the weight returned by linear regression is an analytically optimal result, how can the pocket algorithm improve it?
