I have a general question regarding weight decay regularization.

Since $\|\mathbf{w}\|^2$ is a component inside the regularization term, it looks like it is possible to trade off distance from the origin for model complexity, e.g., I can have more complex models as long as the weight vector stays closer to the origin.

For this to make intuitive sense, so that the regularization correctly "charges" the hypothesis a cost for being more complex, it seems to me that all the features must be normalized to have zero mean. Otherwise, for example, if all the data points lie in a ball far from the origin, regularization could fail in the following sense: a "good" classifier would need a large bias weight $w_0$ and small remaining weights, while a poor (overfitting) classifier could have a small $w_0$ and large remaining weights, yet incur the same regularization cost.
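To make the concern concrete, here is a small numpy sketch (a toy example of my own, with made-up numbers): closed-form ridge regression that penalizes the intercept along with the other weights, fit once on raw features far from the origin and once on centered features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D data lying in a ball far from the origin (x around 100),
# with a true linear signal in the centered coordinate.
x = rng.uniform(100.0, 101.0, size=200)
y = 3.0 * (x - 100.5) + rng.normal(0.0, 0.1, size=200)

def ridge_fit(X, y, lam):
    # Closed-form ridge solution w = (X^T X + lam I)^{-1} X^T y.
    # Note: this penalizes EVERY column, including the bias column.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

lam = 1.0

# Raw features: a good fit needs a large bias (about -301.5), but the
# penalty charges heavily for it, so the solution is shrunk toward a
# poor fit.
X_raw = np.column_stack([np.ones_like(x), x])
w_raw = ridge_fit(X_raw, y, lam)
err_raw = np.mean((X_raw @ w_raw - y) ** 2)

# Centered features (and targets): the bias needed is ~0, so the same
# penalty barely hurts the fit.
xc, yc = x - x.mean(), y - y.mean()
X_c = np.column_stack([np.ones_like(xc), xc])
w_c = ridge_fit(X_c, yc, lam)
err_c = np.mean((X_c @ w_c - yc) ** 2)

print(f"MSE raw: {err_raw:.3f}  MSE centered: {err_c:.3f}")
```

The raw fit ends up far worse than the centered one, which I believe is why implementations commonly exclude the bias term from the penalty and/or standardize features first.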

I'm not sure about this reasoning: is it correct? Is this a concern in practice? Thanks!