 LFD Book Forum Exchange of expectations in derivation of bias-variance decomposition

#1
 kynnjo Junior Member Join Date: May 2013 Posts: 2 Exchange of expectations in derivation of bias-variance decomposition

[I originally posted this question to the wrong sub-forum. My apologies.]

In the derivation of the bias-variance decomposition (on p. 63), there is a step in which the expectation with respect to \mathcal{D} and the expectation with respect to \mathbf{x} are exchanged.

It's not clear to me that these two expectations commute: the choice of \mathbf{x} depends on the choice of \mathcal{D}, and vice versa.
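For reference, the step in question can be sketched as follows. This assumes the book's notation, squared error, a noiseless target f, and g^{(\mathcal{D})} as the hypothesis learned from the training set \mathcal{D}; the test point \mathbf{x} is the one over which E_out is averaged.

```latex
\mathbb{E}_{\mathcal{D}}\left[E_{\text{out}}(g^{(\mathcal{D})})\right]
  = \mathbb{E}_{\mathcal{D}}\left[\mathbb{E}_{\mathbf{x}}\left[\left(g^{(\mathcal{D})}(\mathbf{x}) - f(\mathbf{x})\right)^2\right]\right]
  = \mathbb{E}_{\mathbf{x}}\left[\mathbb{E}_{\mathcal{D}}\left[\left(g^{(\mathcal{D})}(\mathbf{x}) - f(\mathbf{x})\right)^2\right]\right]
```

Note that the quantity inside both expectations is a square, so it is never negative.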

I would appreciate a clarification on this point.

kj

P.S. Pardon the "raw LaTeX" above. Is there a better way to include mathematical notation in these posts?
#2
 Elroch Invited Guest Join Date: Mar 2013 Posts: 143 Re: Exchange of expectations in derivation of bias-variance decomposition

With regard to LaTeX, you just need to surround your LaTeX code with [math] and [/math] tags.

With regard to the exchange of expectations, this can always be done with probability distributions. What makes it easier than swapping the order of integrals and sums in general is that probability densities are never negative. Problems with swapping the order of integrals and sums occur only under conditional convergence, where infinite positive and negative contributions cancel out in an order-dependent way.
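A standard illustration of this order-dependence is the alternating harmonic series 1 - 1/2 + 1/3 - 1/4 + ..., which converges conditionally to ln 2. The minimal sketch below (helper names are hypothetical) sums the same terms in two orders: the natural order, and one positive term followed by two negative terms, which converges to ln(2)/2 instead.

```python
import math


def alternating_harmonic(n_terms: int) -> float:
    """Partial sum of 1 - 1/2 + 1/3 - 1/4 + ... in its natural order."""
    return sum((-1) ** (k + 1) / k for k in range(1, n_terms + 1))


def rearranged(n_blocks: int) -> float:
    """The same terms reordered as one positive, then two negatives:
    1 - 1/2 - 1/4 + 1/3 - 1/6 - 1/8 + 1/5 - 1/10 - 1/12 + ...
    Every odd reciprocal and every even reciprocal is used exactly once."""
    total = 0.0
    for k in range(1, n_blocks + 1):
        total += 1 / (2 * k - 1)  # k-th positive term
        total -= 1 / (4 * k - 2)  # next two negative terms
        total -= 1 / (4 * k)
    return total


print(alternating_harmonic(200_000), math.log(2))
print(rearranged(100_000), math.log(2) / 2)
```

Each block of the rearranged sum equals (1/2)(1/(2k-1) - 1/(2k)), which is why the total comes out at half the original value.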

[EDIT: thanks to Yaser for being more precise than me. In this case it's not merely that probability distributions are non-negative that matters; it's also that the error function being integrated is non-negative. Without that, the change of order could be invalid for a pathological function. With a non-negative integrand, the swap is safe as long as the function is Lebesgue measurable.]
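To make the role of a non-negative error function concrete, here is a minimal sketch of a hypothetical toy setup (not the book's example): inputs uniform on a five-point grid, a noiseless target f(x) = x^2, every ordered pair of grid points as an equally likely two-point training set, and a constant hypothesis fitted as the mean of the training targets. Because everything is finite, swapping the order is just regrouping a finite sum of non-negative squared errors, so both orders agree up to floating-point rounding.

```python
import itertools
import math

# Hypothetical toy setup (an assumption for illustration, not the book's example):
# inputs uniform on a five-point grid, noiseless target f(x) = x**2,
# and every ordered pair of inputs is an equally likely training set D (N = 2).
X = [-1.0, -0.5, 0.0, 0.5, 1.0]
DATASETS = list(itertools.product(X, repeat=2))


def f(x: float) -> float:
    return x ** 2


def g(dataset) -> float:
    """Constant hypothesis fitted to D: the mean of the training targets."""
    return sum(f(xi) for xi in dataset) / len(dataset)


def loss(dataset, x: float) -> float:
    """Squared error of g^(D) at test point x; never negative."""
    return (g(dataset) - f(x)) ** 2


def mean(values) -> float:
    values = list(values)
    return sum(values) / len(values)


# E_D[ E_x[ loss ] ]: average over test points inside, datasets outside.
outer_d = mean(mean(loss(d, x) for x in X) for d in DATASETS)
# E_x[ E_D[ loss ] ]: average over datasets inside, test points outside.
outer_x = mean(mean(loss(d, x) for d in DATASETS) for x in X)

print(outer_d, outer_x, math.isclose(outer_d, outer_x))
```

The finite case is the easy one; non-negativity is what lets the same conclusion carry over to infinite sums and integrals.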
#3
 yaser Caltech Join Date: Aug 2009 Location: Pasadena, California, USA Posts: 1,477 Re: Exchange of expectations in derivation of bias-variance decomposition

Quote:
 Originally Posted by kynnjo ... In the derivation of the bias-variance decomposition (on p. 63), there is a step in which the expectation with respect to \mathcal{D} and the expectation with respect to \mathbf{x} are exchanged. It's not clear to me that these two expectations commute: the choice of \mathbf{x} depends on the choice of \mathcal{D}, and vice versa. ...
It is not independence that allows us to change the order of integration. It is the fact that the integrand is always nonnegative. Think of it as a double summation. You are adding up the same set of numbers whether you start with one sum or the other. The problem arises when some of these numbers are positive and some are negative, since in that case you can play tricks with different orders of the summation to converge to different values. Look up "absolute convergence" versus "conditional convergence."
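The double-summation picture can be made concrete with a standard counterexample (a textbook construction, not from this thread): the array a(m, n) = 1 when n = m, -1 when n = m + 1, and 0 otherwise, for m, n >= 1. Every row sums to 0, while the first column sums to 1 and every later column sums to 0, so summing rows first gives 0 and summing columns first gives 1. The terms are not absolutely summable (infinitely many entries equal ±1), which is exactly the mixed-sign situation described above. Helper names in the sketch are hypothetical.

```python
def a(m: int, n: int) -> int:
    """Entry of the infinite array: +1 on the diagonal, -1 just right of it."""
    if n == m:
        return 1
    if n == m + 1:
        return -1
    return 0


def row_sum(m: int) -> int:
    """Exact sum over all n >= 1 of a(m, n); only n = m and n = m + 1 are nonzero."""
    return a(m, m) + a(m, m + 1)


def column_sum(n: int) -> int:
    """Exact sum over all m >= 1 of a(m, n); only m = n - 1 and m = n can be nonzero."""
    return sum(a(m, n) for m in (n - 1, n) if m >= 1)


M = 1000  # outer terms kept; all later outer terms are 0, so these are the limits
rows_first = sum(row_sum(m) for m in range(1, M + 1))
columns_first = sum(column_sum(n) for n in range(1, M + 1))

print(rows_first, columns_first)  # prints "0 1"
```

With a non-negative array, no such discrepancy is possible: both orders add up the same non-negative numbers.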
__________________
Where everyone thinks alike, no one thinks very much


The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.