Q: MacKay's method of setting weight decay

Dear all, dear Professors,

I am trying to implement David MacKay's method of setting the L2 regularizer in neural nets without the need for cross-validation. It is briefly described in lectures 5 and 6 of week 9 of Geoffrey Hinton's course, and in the original papers (all of which can be found on the net):

MacKay - A Practical Bayesian Framework for Backprop Networks
MacKay - The Evidence Framework Applied to Classification Networks

The approximation for the regression problem is \lambda=\frac{\alpha}{\beta}=\frac{\sigma^2_D}{\sigma^2_W} (Hinton's lecture, and MacKay, 'A Practical Bayesian Framework for Backprop Networks', p. 18),
where \alpha=\frac{1}{2\sigma^2_W} and \beta=\frac{1}{2\sigma^2_D} (\sigma^2_W and \sigma^2_D are the variances of the weights and of the residuals, respectively).

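For concreteness, here is a minimal sketch of how I am iterating this re-estimation, using a linear ridge model as a stand-in for the network. The toy data, the variable names, and the fixed-point loop are my own illustration of the ratio above, not MacKay's full evidence procedure (which also involves the effective number of well-determined parameters):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy ridge-regression stand-in for a network: the data, sizes and names
# here are my own illustration, not taken from MacKay's or Hinton's material.
X = rng.normal(size=(200, 10))
w_true = rng.normal(scale=0.5, size=10)
y = X @ w_true + rng.normal(scale=0.3, size=200)

def fit_ridge(X, y, lam):
    """Closed-form L2-penalised least squares for the current weight decay."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

lam = 1.0                                 # initial guess for lambda = alpha / beta
for it in range(20):
    w = fit_ridge(X, y, lam)
    residuals = y - X @ w
    sigma2_D = np.mean(residuals ** 2)    # noise variance estimate from the residuals
    sigma2_W = np.mean(w ** 2)            # weight variance estimate (zero-mean prior)
    lam = sigma2_D / sigma2_W             # lambda = alpha/beta = sigma_D^2 / sigma_W^2
    print(f"iteration {it:2d}: lambda = {lam:.4f}")
```
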
The question is: is there a similar approximation for the classification problem?

MacKay's 'The Evidence Framework Applied to Classification Networks', p. 3, states that the two frameworks are identical, with the only exception that the \beta E_D term is absent in the classification task, while the \alpha E_W term is the same.

\beta E_D is replaced with G (the cross-entropy error), and I don't understand what to substitute for \beta in the approximation above.
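
To spell out the comparison as I read the two papers (the explicit cross-entropy form of G below is my reading of MacKay's definition, so please correct me if I misquote it):

Regression: M(w) = \beta E_D(w) + \alpha E_W(w)
Classification: M(w) = G(w) + \alpha E_W(w), where G(w) = -\sum_n \left[ t_n \ln y_n + (1 - t_n) \ln (1 - y_n) \right]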

Thank you in advance for your help!