Originally Posted by invis
Just one more question ... at this time

Why is there no identity ("eye") matrix when we compute w_{lin}? We do the same thing there: divide Z^T Z w by w.

But the identity matrix appears when we divide (Z^T Z w + \lambda w) by w. That confuses me.
The "divide by" is in fact multiplication by a matrix inverse. In the case of augmented error, in order to put things in terms of a matrix, we write Z^T Z w + \lambda w as Z^T Z w + \lambda I w, then factor out w to get (Z^T Z + \lambda I) w, and Z^T Z + \lambda I becomes the matrix to be inverted. The I is needed because a scalar \lambda cannot be added directly to the matrix Z^T Z. In the linear regression case, there is no I term (or you can think of it as \lambda=0 killing that term).
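
To make this concrete, here is a minimal NumPy sketch. The data, the value of \lambda, and the helper name `solve_weights` are made up for illustration and are not from the course material. It solves (Z^T Z + \lambda I) w = Z^T y, so \lambda=0 gives w_{lin} and \lambda>0 gives the regularized weights.

```python
import numpy as np

# Hypothetical data, for illustration only.
rng = np.random.default_rng(0)
Z = rng.normal(size=(20, 3))   # 20 points, 3 features
y = rng.normal(size=20)
lam = 0.1                      # assumed regularization strength


def solve_weights(Z, y, lam=0.0):
    """Solve (Z^T Z + lam * I) w = Z^T y for w."""
    d = Z.shape[1]
    # lam * I turns the scalar lam into a d x d matrix so it can be added to Z^T Z.
    A = Z.T @ Z + lam * np.eye(d)
    # Solving the linear system plays the role of "multiplying by the inverse".
    return np.linalg.solve(A, Z.T @ y)


w_lin = solve_weights(Z, y)        # lam = 0: the I term drops out
w_reg = solve_weights(Z, y, lam)   # lam > 0: the I term is present

print("w_lin:", w_lin)
print("w_reg:", w_reg)
```

Note that `np.linalg.solve` is used instead of forming the inverse explicitly. That is a numerical choice on my part; the algebra is the same.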
