From Coursera's ML course I learned that the normal equation is computed as follows:
pinv(X'*X)*X'*Y (Octave code), but apparently this is equivalent to just pinv(X)*Y. Can anyone explain why this is the case?
Dr. Ng derived the normal equation in class (see Lecture 46), and he also cautioned about the case where X'*X is noninvertible, which means there are redundant (linearly dependent) features or too many features (m <= n, where m is the number of training examples and n is the number of features).
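One way to see why the two expressions agree is through the singular value decomposition. The sketch below is not from the lecture: it assumes X = U*S*V' with U and V orthogonal, writes S as \Sigma, and writes A^{+} for pinv(A).

```latex
\begin{aligned}
X &= U \Sigma V^{T}, \qquad X^{+} = V \Sigma^{+} U^{T} \\
X^{T} X &= V \Sigma^{T} \Sigma V^{T}
\;\Rightarrow\; (X^{T} X)^{+} = V (\Sigma^{T} \Sigma)^{+} V^{T} \\
(X^{T} X)^{+} X^{T}
&= V (\Sigma^{T} \Sigma)^{+} V^{T} V \Sigma^{T} U^{T}
 = V (\Sigma^{T} \Sigma)^{+} \Sigma^{T} U^{T}
 = V \Sigma^{+} U^{T} = X^{+}
\end{aligned}
```

The last step holds because (\Sigma^{T}\Sigma)^{+}\Sigma^{T} has entries \sigma_i / \sigma_i^2 = 1/\sigma_i for each nonzero singular value and 0 otherwise, which is exactly \Sigma^{+}. Nothing in the argument needs X'*X to be invertible, so the two expressions should also agree when there are redundant features or m <= n.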
Thank you all for the useful input. We have some reading to do... :)

