Well, although with a bit of delay, I did go over the proof. Sorry it took this long, but after the class ended over the summer I became increasingly busy and never seemed to find the time to read it.

I didn't understand every detail, but I did understand the overall proof. I think the level of rigor is about right for an optional section aimed at the more mathematically inclined, so I wouldn't change that. Since the most ingenious trick is approximating $E_{\text{out}}(h)$ with $E'_{\text{in}}(h)$ computed on a second data set, my only suggestion is to point out that this is exactly what we did in the homeworks, where we put that intuition to work. We estimated $E_{\text{out}}$ from a second set of randomly generated samples, which, if I understood correctly, is exactly the trick (we then averaged over several runs, which the proof also does). The rest of the proof introduces several technical steps to reach the final result, but I think a correct understanding of Lemma A.2 is the key to understanding the overall proof. Since by the end of the course students have put the trick to work many times, making this connection would tie the theoretical derivation beautifully to the practical work done in the homeworks.
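To make the homework side of that connection concrete, here is a minimal Python sketch under assumptions of my own: the target f and the hypothesis h are hypothetical random lines in [-1, 1]^2, and h is held fixed rather than learned from D. So it only illustrates estimating E_out(h) by E'_in(h) on fresh samples averaged over runs, not the supremum over the hypothesis set that Lemma A.2 actually deals with. The names `make_line`, `sample`, `error` and `ghost_estimate` are just illustrative.

```python
import random


def make_line(rng):
    """Hypothetical classifier: which side of a random line through two points in [-1, 1]^2."""
    x1, y1, x2, y2 = [rng.uniform(-1, 1) for _ in range(4)]

    def classify(p):
        # The sign of the 2D cross product tells which side of the line p falls on.
        cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
        return 1 if cross >= 0 else -1

    return classify


def sample(rng, n):
    """Draw n points uniformly from [-1, 1]^2."""
    return [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]


def error(h, f, points):
    """Fraction of points on which h disagrees with f."""
    return sum(h(p) != f(p) for p in points) / len(points)


def ghost_estimate(h, f, rng, n, runs):
    """Average E'_in(h) over `runs` independent second data sets of size n."""
    return sum(error(h, f, sample(rng, n)) for _ in range(runs)) / runs


rng = random.Random(1)
f, h = make_line(rng), make_line(rng)  # hypothetical target and fixed hypothesis
ghost = ghost_estimate(h, f, rng, n=100, runs=1000)
reference = error(h, f, sample(rng, 100_000))  # large fresh sample as a proxy for E_out(h)
print(f"averaged E'_in(h): {ghost:.4f}, large-sample E_out(h): {reference:.4f}")
```

The two numbers should typically come out close; a single second data set of size 100 is much noisier, which is why averaging over several runs helped in the homeworks.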

PS1: Of course, I wish everyone a Happy and Prosperous New Year!

PS2: I was in LA over Christmas and paid a visit to the Caltech campus. I would have loved to say hello, but since the campus was pretty much empty, I assumed there would be nobody to say hello to.