LFD Book Forum  

07-15-2012, 03:31 PM
hesam.creative
Junior Member
Join Date: Jul 2012
Posts: 7
On the 'linear model I', one step learning is not what linear...

Hi, I hope you are enjoying life. (Do you ever wonder why you are on the earth instead of some other person?!)

As the title suggests, I want to spend some time arguing about this lecture.
How should I put it? Well, one-step learning is not what linear regression deserves. I strongly believe that linear regression has the same learning importance and beauty as PLA, and even more!
I will lay out my approach in detail; it is adopted from linear algebra.

First, in the simplest case, consider three distinct points in the x-y plane that do not lie on a common line. Our job is to find the best line, the one that approximates the nonexistent line passing through all of them.
Now, I say that the first sparks of learning appear in the process of finding the best line. For instance, one tries a line that passes through two of the points, one of the points, or none of them, and realizes that to obtain the best line one must come up with an error function that takes the error contributions of all points into account. So we arrive at the least-squares error function, which elegantly handles the errors of all points and, more importantly, has a derivative that is linear in the parameters.
On the other hand, linear algebra says that the right-hand-side vector b is not in the column space of A, where Ax = b and A is a 3 by 2 matrix. Each row of A is (1, x-coordinate of a point), b holds the y-coordinates, and the unknown vector x holds the intercept and slope of the line.
As expected, the three points do not fit on one line, so Ax = b has no exact solution. Linear algebra suggests that instead of solving for x exactly we approximate it: introduce the error vector e = Ax - b and project b onto the column space of A. The projection is the point of the column space closest to b, so it minimizes the length of e. One important piece of learning here is that the error vector is orthogonal to the column space, that is, A'e = 0 (A' = A transpose). This brilliant observation leads to the projection point and consequently to the solution x^, which satisfies A'A x^ = A'b and gives the parameters of the best line.
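A minimal sketch of the projection argument above, assuming NumPy. The three points are hypothetical values chosen only for illustration, and the names `xs`, `x_hat`, `p` and `e` are my own, not from the lecture. It solves A'A x^ = A'b and checks that the error vector is orthogonal to both columns of A.

```python
import numpy as np

# Three hypothetical non-collinear points (x_i, y_i), chosen only for illustration.
xs = np.array([0.0, 1.0, 2.0])
b = np.array([6.0, 0.0, 0.0])

# Each row of A is (1, x_i), so A @ [intercept, slope] gives the line's y-values.
A = np.column_stack([np.ones_like(xs), xs])

# Solve the normal equations A'A x_hat = A'b for the best-fit parameters.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

p = A @ x_hat   # projection of b onto the column space of A
e = p - b       # error vector e = A x_hat - b

print("intercept, slope:", x_hat)
print("A'e (should be ~0):", A.T @ e)
```

For these particular points the best line has intercept 5 and slope -3, and A'e comes out as the zero vector up to rounding.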
In addition, I want to discuss a beautiful learning approach to finding the optimal line in linear regression: the gradient descent algorithm, which uses ideas from calculus to optimize the parameters of the line. This approach simply picks a random starting point, finds the direction of steepest descent, and takes little baby steps in that direction until it finally reaches a minimum (for the least-squares error, which is convex, that minimum is the global one). Actually, when we concluded intuitively that e is orthogonal to the column space, we used this optimization idea in one step, by setting the partial derivatives of the error function to zero.
But what I believe is this: before the derivative, we learn the gradient descent algorithm.
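A rough sketch of gradient descent on the least-squares error, assuming NumPy. The data points, the step size `lr`, the step count and the helper name `gradient_descent` are all illustrative assumptions, not something taken from the lecture. It starts from a random point and compares the result with the one-step solution of A'A x = A'b.

```python
import numpy as np

# Hypothetical data: three points (x_i, y_i), used only for illustration.
xs = np.array([0.0, 1.0, 2.0])
b = np.array([6.0, 0.0, 0.0])
A = np.column_stack([np.ones_like(xs), xs])  # rows are (1, x_i)

def gradient_descent(A, b, lr=0.05, steps=5000, seed=0):
    """Minimize ||Ax - b||^2 by repeated small steps along the negative gradient."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=A.shape[1])        # random starting point
    for _ in range(steps):
        grad = 2.0 * A.T @ (A @ x - b)     # gradient of the squared error
        x = x - lr * grad                  # baby step downhill
    return x

x_gd = gradient_descent(A, b)
x_one_step = np.linalg.solve(A.T @ A, A.T @ b)  # one-step (normal equations) solution

print("gradient descent:", x_gd)
print("one-step solution:", x_one_step)
```

The step size has to be small enough for the iteration to converge; 0.05 works for this tiny example but is not a general recommendation.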
At the end of my discussion I want to talk about the properties of the magical matrix A'A (A' = A transpose). First, it is symmetric and, more importantly, square. One magical property, which the Prof. also mentioned in his lecture, is that if A has full column rank, i.e. its columns are independent, then A'A is invertible! And since our data points are randomly chosen, A'A is virtually always invertible.
Here is the proof:
Suppose A'Ax = 0.
Multiplying both sides on the left by x', we get (Ax)'Ax = 0. The left side is the squared length of Ax, so Ax must be zero. Now notice the important fact that, because the columns of A are independent, Ax is not zero for any x except the zero vector! Hence x = 0 is the only solution of A'Ax = 0, and A'A is invertible.
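A small numerical check of this argument, assuming NumPy. The two matrices are hypothetical examples (one with independent columns, one with dependent columns), and the variable names are my own. It compares the rank of A with the rank of A'A and verifies the identity x'A'Ax = ||Ax||^2 used in the proof.

```python
import numpy as np

# Hypothetical examples: independent columns vs. dependent columns.
A_full = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
A_dep = np.array([[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]])  # second column = 2 * first

rank_full = np.linalg.matrix_rank(A_full.T @ A_full)
rank_dep = np.linalg.matrix_rank(A_dep.T @ A_dep)
print("rank of A'A, independent columns:", rank_full)  # full rank, so invertible
print("rank of A'A, dependent columns:  ", rank_dep)   # rank deficient, so singular

# The identity behind the proof: x'A'Ax equals the squared length of Ax.
rng = np.random.default_rng(0)
x = rng.normal(size=2)
lhs = x @ (A_full.T @ A_full) @ x
rhs = np.linalg.norm(A_full @ x) ** 2
print("x'A'Ax =", lhs, " ||Ax||^2 =", rhs)
```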
I sincerely want to know your thoughts about this.
Best regards.
09-25-2012, 02:57 AM
htlin
Join Date: Aug 2009
Location: Taipei, Taiwan
Posts: 610
Re: On the 'linear model I', one step learning is not what linear...

I'd like to share two thoughts on your view of linear regression.

(0) While linear regression can be described as one-step learning, calculating the pseudo-inverse of X (which is N by d+1) generally takes multiple steps, on the order of O(poly(N, d)). So whether to call it one-step learning depends on how you view the process.

(1) While using the pseudo-inverse is arguably the most common way of doing linear regression since the early years of Statistics, it is indeed not the only way. Applying gradient descent to linear regression, as you describe, can also work and would allow you to enjoy something similar to PLA (or logistic regression with gradient descent), making the line approach the optimal one iteratively.
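To make point (0) concrete, here is a minimal sketch of the pseudo-inverse route, assuming NumPy. The data are synthetic and the names `N`, `d`, `w_true` and the noise level are illustrative assumptions. Although the call is a single line, `np.linalg.pinv` computes a singular value decomposition internally, so it is "one step" only from the caller's point of view.

```python
import numpy as np

# Synthetic data, for illustration only: N examples with d features plus a bias column.
rng = np.random.default_rng(0)
N, d = 50, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, d))])  # N by d+1
w_true = np.array([1.0, -2.0, 0.5, 3.0])                    # hypothetical target weights
y = X @ w_true + 0.1 * rng.normal(size=N)                   # noisy targets

# "One-step" learning: weights = pseudo-inverse of X times y.
X_pinv = np.linalg.pinv(X)                 # (d+1) by N
w_pinv = X_pinv @ y

# A least-squares solver reaches the same weights without forming the pseudo-inverse.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print("pseudo-inverse shape:", X_pinv.shape)
print("weights via pinv: ", w_pinv)
print("weights via lstsq:", w_lstsq)
```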
When one teaches, two learn.
The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.