02-17-2013, 12:45 PM
Haowen
Member
Join Date: Jan 2013
Posts: 24
weight decay and data normalization *not a homework question*

I have a general question regarding weight decay regularization.

Since w_0 is a component of the regularization term, it looks like it is possible to trade off distance from the origin against model complexity: e.g., a hypothesis that sits closer to the origin (small w_0) can afford larger remaining weights, i.e., more complexity, for the same regularization cost.
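
For concreteness, here is the weight-decay form I have in mind (assuming the penalty sums over every component, w_0 included):

\[ E_{\mathrm{aug}}(\mathbf{w}) \;=\; E_{\mathrm{in}}(\mathbf{w}) \;+\; \frac{\lambda}{N} \sum_{i=0}^{d} w_i^2 \]

so a unit of w_0^2 is charged exactly as much as a unit of any other w_i^2.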

For this to make intuitive sense, so that the regularization correctly "charges" the hypothesis a cost for being more complex, it seems to me that all the features must be normalized to have zero mean. Otherwise, if for example all the data points lie in a ball far from the origin, regularization could fail in the following sense: a "good" classifier would need a large w_0 and small remaining weights, while a poor (overfitting) classifier could have a small w_0 and large remaining weights, and the two could end up with the same regularization cost. I tried a quick numerical check of this intuition, sketched below.
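
Here is a minimal sketch of that check (everything in it is made up for illustration: a 1-D data set in a ball around x = 100, a true target y = 150 + 0.5x, a closed-form ridge helper, and an arbitrary lambda):

Code:
import numpy as np

# Hypothetical toy data, just to illustrate the effect:
# 1-D inputs living in a "ball" far from the origin.
rng = np.random.default_rng(0)
N = 50
x = rng.uniform(99.0, 101.0, N)                  # inputs far from the origin
y = 150.0 + 0.5 * x + rng.normal(0.0, 0.5, N)    # true w_0 = 150, w_1 = 0.5

def ridge(X, y, lam, penalize_bias=True):
    """Closed-form ridge on X = [1, x]; optionally exempt w_0 from decay."""
    P = np.eye(X.shape[1])
    if not penalize_bias:
        P[0, 0] = 0.0          # leave the bias weight unregularized
    return np.linalg.solve(X.T @ X + lam * P, X.T @ y)

ones = np.ones(N)
X_raw = np.column_stack([ones, x])                # raw inputs
X_ctr = np.column_stack([ones, x - x.mean()])     # zero-mean feature

lam = 10.0
print("raw x,      w_0 penalized:", ridge(X_raw, y, lam))
print("centered x, w_0 penalized:", ridge(X_ctr, y, lam))
print("raw x,      w_0 exempt:   ", ridge(X_raw, y, lam, penalize_bias=False))

If my reasoning is right, the first fit should route the offset through w_1 (w_1 near 2 reproduces the level y = 200 at x = 100 far more cheaply in w^T w than w_0 = 150 does) and badly distort the slope, while the centered and bias-exempt fits should show only the usual shrinkage toward zero rather than a qualitatively wrong fit.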

I'm not sure about this reasoning. Is it correct? Is this a concern in practice? Thanks!