
#1




On 'Linear Model I': one-step learning is not what linear...
Hi, I hope you are enjoying life. (Do you ever wonder why you should be on the earth instead of some other guy?!)
As the topic says, I want to spend some time arguing about this lecture. How should I put it? Well, one-step learning is not what linear regression deserves. I strongly believe that linear regression has the same learning importance and beauty as PLA, and even more! I will lay out my approach in detail; it is adapted from linear algebra.
First, in the simplest case, consider three distinct points in the xy-plane that do not lie on one line. Our job is to find the line that best approximates the nonexistent line passing through all of them. I claim that the first sparks of learning appear in the process of finding this best line. For instance, one tries a line that passes through two of the points, one of the points, or none of them, and realizes that to obtain the best line one must come up with an error function that takes the error contributions of all points into account. So we arrive at the least-squares error function, which elegantly handles the errors of all points and, more importantly, has a derivative that is linear in the parameters.
On the other hand, linear algebra says that the right-hand-side vector b in Ax = b is not in the column space of A, where A is a 3 by 2 matrix (one row per point, one column each for the intercept and the slope); as expected, three such points do not fit on the same line. Linear algebra suggests that instead of solving Ax = b exactly, we approximate x by introducing the error vector e = Ax - b and projecting b onto the column space of A. The projection is the point of the column space closest to b, so it minimizes the error vector e. One important learning step is noticing that the error vector is orthogonal to the column space; this brilliant observation leads to the projection point and consequently to the solution x^, which approximates the parameters of the best line.
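
To make the projection argument concrete, here is a minimal sketch in Python with NumPy. The three points are made up for illustration, and the names `A`, `b`, `x_hat`, `p` and `e` simply mirror the symbols above; nothing here comes from the lecture itself.

```python
import numpy as np

# Three hypothetical points (x, y) that do not lie on one line.
points = np.array([[0.0, 1.0], [1.0, 3.0], [2.0, 4.0]])
xs, b = points[:, 0], points[:, 1]

# One row per point, one column per parameter (intercept, slope): A is 3 by 2.
A = np.column_stack([np.ones_like(xs), xs])

# Normal equations A'A x_hat = A'b give the least-squares parameters.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

p = A @ x_hat   # projection of b onto the column space of A
e = p - b       # error vector e = A x_hat - b

print("intercept, slope:", x_hat)
print("A'e (should be close to zero):", A.T @ e)
```

The second printed vector being numerically zero is exactly the statement that the error vector is orthogonal to every column of A.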
In addition, I want to discuss a beautiful learning approach for finding the optimal line in linear regression: the gradient descent algorithm, which uses ideas from calculus to optimize the parameters of the line. This approach simply picks a random starting point and follows the steepest downhill direction in little baby steps until it finally reaches a local minimum (for the least-squares error, which is convex, that is also the global minimum). Actually, when we concluded intuitively that e is orthogonal to the column space, we used this optimization idea in one step, by setting the partial derivatives of the error function to zero. But what I believe is this: before the derivative, we learn the gradient descent algorithm.
At the end of my discussion I want to talk about the properties of the magical matrix A'A (A' = A transpose). First, it is symmetric and, more importantly, square. One magical property, which the professor also mentioned in his lecture, is that if A has full column rank, that is, its columns are independent, then A'A is invertible! And since our data points are randomly chosen, A'A is almost always invertible. Here is the proof: suppose A'Ax = 0. Multiplying both sides by x' on the left gives x'A'Ax = (Ax)'(Ax) = 0, which is the squared length of Ax, so Ax must be zero. Because the columns of A are independent, Ax is not zero for any x except the zero vector, so x = 0 and A'A is invertible. QED.
I sincerely want to know your thoughts about this. Best regards.
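
A rough sketch of both points in this post, again in Python with NumPy: it checks that A'A is symmetric and full rank for a small made-up data set, then runs gradient descent from a random start and compares the result with the one-step solution. The data, the step size `eta` and the iteration count are assumptions chosen only so that the example converges.

```python
import numpy as np

# Hypothetical 3-by-2 matrix (intercept column, x column) and targets.
A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 3.0, 4.0])

# Properties of A'A: symmetric, square, and invertible when A has
# independent columns.
AtA = A.T @ A
is_symmetric = np.allclose(AtA, AtA.T)
is_invertible = np.linalg.matrix_rank(AtA) == AtA.shape[0]

# Gradient descent on the squared error ||Aw - b||^2.
rng = np.random.default_rng(0)
w = rng.normal(size=2)   # random starting point
eta = 0.05               # step size (assumed, small enough for this data)
for _ in range(5000):
    grad = 2 * A.T @ (A @ w - b)   # gradient of the squared error at w
    w = w - eta * grad             # one little baby step downhill

# One-step solution from the normal equations, for comparison.
w_closed = np.linalg.solve(AtA, A.T @ b)

print("symmetric:", is_symmetric, "invertible:", is_invertible)
print("gradient descent:", w)
print("one-step solution:", w_closed)
```

Both routes end at the same parameters; the iterative one just gets there gradually.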
#2




Re: On 'Linear Model I': one-step learning is not what linear...
I'd like to share two thoughts on your view of linear regression.
(0) While linear regression can be described as one-step learning, calculating the pseudo-inverse of the data matrix generally takes multiple steps, with the count growing with the dimensions of that matrix. So whether to call it one-step learning depends on how you view the process.
(1) While using the pseudo-inverse has arguably been the most common way of doing linear regression since the early years of statistics, it is indeed not the only way. Applying gradient descent to linear regression, as you describe, can also work and would let you enjoy something similar to PLA (or logistic regression with gradient descent): making the line iteratively approach the optimal one.
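
As a small illustration of point (0), the sketch below computes the "one-step" solution with NumPy's `np.linalg.pinv`, which NumPy computes via a singular value decomposition, so the single call hides a good deal of numerical work. The random data matrix `X` and targets `y` are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 20 points, a constant column plus 2 random features.
X = np.column_stack([np.ones(20), rng.normal(size=(20, 2))])
y = rng.normal(size=20)

# "One-step" learning: weights from the pseudo-inverse of X.
w_pinv = np.linalg.pinv(X) @ y

# The same weights from the explicit formula (X'X)^-1 X'y, which is valid
# here because X has independent columns.
w_normal = np.linalg.inv(X.T @ X) @ X.T @ y

agree = np.allclose(w_pinv, w_normal)
print("pseudo-inverse and explicit formula agree:", agree)
```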
__________________
When one teaches, two learn. 

