The original problem for the hard-margin SVM already looks like a "quadratic programming" problem, so my real question is this: why do we do the "dual" mapping to get the problem stated in terms of alpha? Is this purely to get it into a more convenient form for QP packages? I am missing something, but I don't know what :).

The number of variables in the original (primal) problem depends on the dimensionality of the weight space, whereas the number of variables in the dual problem does not: the dual has one alpha per training point, so its size depends only on the number of data points. This makes a difference when that dimensionality is high, which is often the case with nonlinear transforms.
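
To put rough numbers on this, here is a minimal sketch that just counts QP variables in the two formulations. It assumes, purely for illustration, a full polynomial transform of degree Q on d-dimensional inputs; the function names and the values of N, d and Q are made up for the example and are not from any particular package.

```python
from math import comb


def poly_feature_dim(d: int, Q: int) -> int:
    """Non-constant monomials of degree <= Q in d variables (assumed transform)."""
    return comb(d + Q, Q) - 1


def primal_num_vars(d_tilde: int) -> int:
    # Primal QP variables: the weight vector w (d_tilde entries) plus the bias b.
    return d_tilde + 1


def dual_num_vars(N: int) -> int:
    # Dual QP variables: one multiplier alpha_n per training point.
    return N


# Hypothetical setting: 1000 points, 10 inputs, degree-5 polynomial transform.
N, d, Q = 1000, 10, 5
d_tilde = poly_feature_dim(d, Q)

print("primal variables:", primal_num_vars(d_tilde))
print("dual variables:  ", dual_num_vars(N))
```

Raising Q or d makes the primal count grow quickly, while the dual count stays at N. Whether the dual is actually cheaper then depends on how N compares with the transformed dimension.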