LFD Book Forum


rakhlin 02-05-2017 05:47 AM

Q: MacKay's method of setting weight decay
 
Dear all, dear Professors,

I am trying to implement David MacKay's method of setting the L2 regularizer in neural nets without the need for cross-validation. It is briefly described in lectures 5 and 6 of week 9 of Geoffrey Hinton's course, and in the original papers (all available online):

MacKay, 'A Practical Bayesian Framework for Backpropagation Networks'
MacKay, 'The Evidence Framework Applied to Classification Networks'

The approximation for the regression problem is \lambda=\frac{\alpha}{\beta}=\frac{\sigma^2_D}{\sigma^2_W} (Hinton's lectures; MacKay, 'A Practical Bayesian Framework for Backpropagation Networks', p. 18),
where \alpha=\frac{1}{2\sigma^2_W} and \beta=\frac{1}{2\sigma^2_D}, with \sigma^2_W and \sigma^2_D the variances of the weights and of the residuals, respectively.
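
In code, the re-estimation loop I have in mind for the regression case is roughly the following (a minimal sketch of my reading of Hinton's description; train_network is a hypothetical helper standing in for whatever training routine is used):

import numpy as np

# Sketch of the re-estimation loop: train with the current weight decay,
# re-estimate the two variances from the fitted model, reset
# lambda = sigma_D^2 / sigma_W^2, and repeat.
# train_network(X, y, weight_decay) is a hypothetical helper returning the
# fitted weight vector and the predictions on the training set.
def reestimate_weight_decay(X, y, train_network, lam=0.1, n_rounds=5):
    for _ in range(n_rounds):
        w, y_hat = train_network(X, y, weight_decay=lam)
        sigma2_W = np.mean(w ** 2)            # weight variance (zero-mean prior)
        sigma2_D = np.mean((y - y_hat) ** 2)  # residual variance
        lam = sigma2_D / sigma2_W             # lambda = alpha / beta
    return lam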

My question: is there a similar approximation for the classification problem?

MacKay's 'The Evidence Framework Applied to Classification Networks' (p. 3) states that the two frameworks are essentially identical, with the only exception that the \beta E_D term has no counterpart in the classification task, while the \alpha E_W term stays the same.

\beta E_D is replaced with the cross-entropy G, and I don't understand what to substitute for \beta in the approximation above.
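
For reference, the two objective functions, as I read the papers, are

M(\mathbf{w}) = \beta E_D(\mathbf{w}) + \alpha E_W(\mathbf{w}), \quad E_D = \frac{1}{2}\sum_n \left(t_n - y(\mathbf{x}_n;\mathbf{w})\right)^2 (regression)

M(\mathbf{w}) = G(\mathbf{w}) + \alpha E_W(\mathbf{w}), \quad G = -\sum_n \left[t_n \ln y_n + (1-t_n)\ln(1-y_n)\right] (classification)

with E_W = \frac{1}{2}\sum_i w_i^2 in both cases, so \beta simply does not appear in the classification objective.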

Thank you in advance for your help!

