Just one more question ... at this time
Why is there no identity ("eye") matrix when we compute the linear regression solution? We do the same "divide" there to get w.
But the identity matrix appears when we "divide" to get w in the augmented error case. That confuses me.

The "divide by" is in fact multiplying by a matrix inverse. In the case of augmented error, in order to put things in terms of a matrix, we write $\lambda \mathbf{w}$ as $\lambda I \mathbf{w}$, then factor out $\mathbf{w}$, which leaves $Z^T Z + \lambda I$ as the matrix to be inverted. In the linear regression case, there is no $\lambda \mathbf{w}$ term (or you can think of it as $\lambda = 0$ killing that term).
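
If it helps to see this numerically, here is a minimal NumPy sketch. The names `Z`, `y`, `lam` and `solve_reg` are my own assumptions for illustration (random data, weight-decay form of the augmented error), not anything from the lecture code. It solves the regularized normal equations with the identity matrix in place, then shows that setting the regularization parameter to zero gives back the plain linear regression weights.

```python
import numpy as np

# Hypothetical data: 20 points, 3 features (assumed purely for illustration).
rng = np.random.default_rng(0)
Z = rng.normal(size=(20, 3))
y = rng.normal(size=20)
lam = 0.5  # assumed regularization parameter (lambda)


def solve_reg(Z, y, lam):
    """Solve (Z^T Z + lam * I) w = Z^T y for w."""
    d = Z.shape[1]
    # lam * w is rewritten as lam * I @ w so it can be grouped with Z^T Z.
    A = Z.T @ Z + lam * np.eye(d)
    # "Dividing by" A means applying its inverse; solve() does that stably.
    return np.linalg.solve(A, Z.T @ y)


# Plain linear regression: no lam * w term, so no identity matrix shows up.
w_lin = np.linalg.solve(Z.T @ Z, Z.T @ y)

w_reg = solve_reg(Z, y, lam)   # identity matrix contributes lam * I
w_zero = solve_reg(Z, y, 0.0)  # lam = 0 kills the extra term

print("w_lin :", w_lin)
print("w_reg :", w_reg)
print("w_zero:", w_zero)
```

With `lam = 0.0` the matrix being inverted is just `Z.T @ Z`, so `w_zero` matches `w_lin` up to rounding. The identity matrix is only there to turn the scalar multiple of w into something that can be added to `Z.T @ Z`.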