Out of syllabus question on Regularization vs Priors
Since taking this course in Summer 2012, I have tried to read up more about regularization and found out that there are different approaches. The relatively more commonly used are L1 and L2 (covered in class under the name of 'weightdecay') regularization.
There appears to be some mathematical equivalence between using regularization and the usage of prior probabilities (in the Bayesian approach). From what I understand, imposing an L2 penalty is same as imposing a Gaussian prior assumption on the unknown weights. Similarly L1 corresponds to imposing a Laplacian prior.
In the concluding lectures, Professor YAM mentioned that we have to be careful in verifying that our assumptions on priors are valid when going with the Bayesian approach.
If my understanding is correct, the "danger" introduced in choosing priors is identically (mathematically) to the "danger"" introduced by choosing some arbitrary regularization technique. In other words, we have to be equally careful about using the right regularization technique as we need to be about choosing the right prior.
Is my understanding correct? In other words, does the Bayesian approach particularly warrant any more caution, or both approaches warrant the same amount/kind of caution?
PS: For future versions of the class, it would be great if another lecture is added to introduce various regularization techniques since in practice it appears that L1 is being used everywhere "big data" for its sparsity benefits.
