I've been kind of saving this question, but decided to ask at this point.

Why is there no mention of residual analysis in any of the linear regression topics the course has covered? How does residual analysis fit into the data learning picture (if it fits in at all)?

Specifically: starting with this week's topic of regularization, we've seen how weight decay softens the weights, but in doing so, changes them from the normal weights you'd obtain in linear regression. I would imagine that with weight decay, it would no longer hold that the mean of the errors (as in linear regression errors: $e_n = y_n - \hat{y}_n$) is equal to zero, so the residuals would not be normally distributed with the same variance and zero mean. In other words, with weight decay, at least one of the Gauss-Markov assumptions does not hold?
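To make the question concrete, here is a minimal sketch of what I mean, on synthetic data. Everything in it is an assumption for illustration: the data-generating process, the value `lam=50.0`, the helper name `fit_linear`, and in particular the choice to apply weight decay to the constant (bias) weight as well. It fits the same data with and without weight decay and compares the mean of the residuals.

```python
import numpy as np

def fit_linear(X, y, lam=0.0):
    """Closed-form least squares with weight decay (hypothetical helper):
    w = (X^T X + lam * I)^{-1} X^T y. With lam = 0 this is ordinary least squares."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Synthetic data (assumed for illustration only).
rng = np.random.default_rng(0)
N = 200
x = rng.normal(size=(N, 2))
X = np.column_stack([np.ones(N), x])  # constant column x0 = 1
y = 3.0 + x @ np.array([1.5, -2.0]) + rng.normal(scale=0.5, size=N)

lam = 50.0  # arbitrary weight decay strength
w_ols = fit_linear(X, y, lam=0.0)
w_reg = fit_linear(X, y, lam=lam)

# Residuals: observed target minus fitted value.
mean_res_ols = (y - X @ w_ols).mean()
mean_res_reg = (y - X @ w_reg).mean()

print("mean residual, no weight decay:  ", mean_res_ols)
print("mean residual, with weight decay:", mean_res_reg)
# With the bias weight penalized, sum(residuals) = lam * w_0.
print("lam * w_0 / N:                   ", lam * w_reg[0] / N)
```

Without weight decay, the residual mean is zero up to rounding, because the residual vector is orthogonal to the constant column. With weight decay applied to every weight, it is not. If the constant weight were left out of the penalty, I believe the residuals would still average to zero, so the effect seems to depend on how the regularizer treats the bias term.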

Does that matter?

In general, are the standard tools of linear regression analysis we were taught in school (looking at the coefficient of determination, hypothesis testing on the significance of the coefficients, and residual analysis to check whether the assumptions behind those two hold) entirely pointless when you're doing machine learning?