LFD Book Forum On the 'linear model I'',one step learning in not what linear...
#1
07-15-2012, 02:31 PM
 hesam.creative Junior Member Join Date: Jul 2012 Posts: 7
On the 'linear model I'',one step learning in not what linear...

Hi, I hope you are enjoying life. (Do you ever wonder why you are on Earth instead of someone else?!)

As the topic title suggests, I want to spend some time arguing about this lecture.
OK, how should I put it? Well, "one-step learning" is not the label linear regression deserves. I strongly believe that linear regression has the same learning importance and beauty as PLA, and even more!
I will lay out my approach, which is adapted from linear algebra, in detail.

First, in the simplest case, consider three distinct points in the x-y plane that do not lie on a common line. Our job is to find the line that best approximates them, since no single line passes through all three.
Now, I say that the first sparks of learning appear in the process of finding that best line. For instance, one tries a line that passes through two of the points, through one point, or through none of them, and realizes that to obtain the best line one must come up with an error function that takes the error contributions of all points into account. So we arrive at the least-squares error function, which elegantly handles the errors of all points and, more importantly, has a derivative that is linear in the parameters.
On the other hand, linear algebra says that the right-hand-side vector b is not in the column space of A, where A is a 3 by 2 matrix (Ax = b): one row per point, with a column of ones for the intercept and a column of x values for the slope.
As expected, three non-collinear points do not fit on the same line, so Ax = b has no exact solution. Linear algebra suggests that instead of solving for x exactly, we approximate it: introduce the error vector e = Ax - b and project b onto the column space of A. The projection is the point of the column space closest to b, so it minimizes the length of e. One important learning step is the observation that the minimizing error vector is orthogonal to the column space, that is, A'e = 0, where A' is the transpose of A. This brilliant observation leads to the projection point and consequently to the solution x^ of A'A x^ = A'b, which gives the parameters of the best line.
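To make the projection argument concrete, here is a minimal sketch in plain Python. The three points are made-up values chosen only for illustration, and `normal_equations_line` is a hypothetical helper name, not something from the lecture. It builds A with one row [1, x_i] per point, solves A'A x^ = A'b with the 2 by 2 inverse formula, and then checks that the error vector is orthogonal to both columns of A.

```python
# Three illustrative, non-collinear points (assumed values, not from the lecture).
points = [(0.0, 1.0), (1.0, 3.0), (2.0, 4.0)]

# A has one row [1, x_i] per point; b holds the y values.
A = [[1.0, x] for x, _ in points]
b = [y for _, y in points]

def normal_equations_line(A, b):
    """Solve A'A x = A'b for a two-column A using the 2x2 inverse formula."""
    s00 = sum(r[0] * r[0] for r in A)
    s01 = sum(r[0] * r[1] for r in A)
    s11 = sum(r[1] * r[1] for r in A)
    t0 = sum(r[0] * bi for r, bi in zip(A, b))
    t1 = sum(r[1] * bi for r, bi in zip(A, b))
    det = s00 * s11 - s01 * s01  # nonzero when the columns are independent
    intercept = (s11 * t0 - s01 * t1) / det
    slope = (s00 * t1 - s01 * t0) / det
    return intercept, slope

intercept, slope = normal_equations_line(A, b)

# Error vector e = A x_hat - b, the same sign convention as in the post.
e = [r[0] * intercept + r[1] * slope - bi for r, bi in zip(A, b)]

# e should be orthogonal to both columns of A, up to rounding.
dot_col0 = sum(r[0] * ei for r, ei in zip(A, e))
dot_col1 = sum(r[1] * ei for r, ei in zip(A, e))
print(intercept, slope, dot_col0, dot_col1)
```

For these points the two dot products come out as zero up to floating-point rounding, which is the orthogonality condition in numbers.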
In addition, I want to discuss a beautiful learning approach for finding the optimal line in linear regression: the gradient descent algorithm, which uses ideas from calculus to optimize the parameters of the line. This approach simply picks a random starting point, finds the steepest downhill direction, and takes little baby steps until it finally reaches a minimum. (The least-squares error is convex, so a local minimum is also the global one.) Actually, when we concluded intuitively that e is orthogonal to the column space, we used this optimization idea in one step, by setting the partial derivatives of the error function to zero.
But what I believe is this: before the derivative, we learn the gradient descent algorithm.
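As a rough sketch of the iterative view, the following plain-Python loop runs gradient descent on the mean squared error of a line. The points, the learning rate, the step count, and the name `gradient_descent_line` are all assumptions made for this illustration; the start is fixed at zero rather than random so that the run is reproducible.

```python
# Illustrative points (assumed values), fitted with y = intercept + slope * x.
points = [(0.0, 1.0), (1.0, 3.0), (2.0, 4.0)]

def gradient_descent_line(points, learning_rate=0.1, steps=2000):
    """Minimize the mean squared error of a line by plain gradient descent."""
    intercept, slope = 0.0, 0.0  # fixed start for reproducibility
    n = len(points)
    for _ in range(steps):
        # Partial derivatives of (1/n) * sum((intercept + slope*x - y)^2).
        grad_intercept = sum(2 * (intercept + slope * x - y) for x, y in points) / n
        grad_slope = sum(2 * (intercept + slope * x - y) * x for x, y in points) / n
        # Take a small step against the gradient (steepest descent).
        intercept -= learning_rate * grad_intercept
        slope -= learning_rate * grad_slope
    return intercept, slope

gd_intercept, gd_slope = gradient_descent_line(points)
print(gd_intercept, gd_slope)
```

With this step size the iterates settle on the same line that the least-squares solution gives for these points (intercept 7/6, slope 3/2). A learning rate that is too large would make the iterates diverge instead.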
At the end of my discussion I want to talk about the properties of the magical matrix A'A (A' = A transpose). Firstly, it is symmetric and, more importantly, square. One magical property, which the Prof. also mentioned in his lecture, is that if A has full column rank, i.e. its columns are independent, then A'A is invertible! And since our data points are randomly chosen, A'A is almost always invertible.
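Here is the proof:
Suppose A'Ax = 0.
Multiplying both sides on the left by x', we get (Ax)'Ax = 0. This says the squared length of Ax is zero, so Ax must be zero. Now notice the important fact that, because the columns of A are independent, Ax is not zero for any x except the zero vector. So A'Ax = 0 forces x = 0, and a square matrix whose null space contains only the zero vector is invertible.
QED.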
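A small numerical check of this argument, as a sketch in plain Python. Both matrices and the helper names `gram_matrix`, `det2`, and `quadratic_form` are made up for illustration. The code compares A'A for a matrix with independent columns against one whose second column is twice the first, and evaluates x'A'Ax as the squared length of Ax.

```python
def gram_matrix(A):
    """Return A'A for a matrix given as a list of rows."""
    cols = len(A[0])
    return [[sum(row[i] * row[j] for row in A) for j in range(cols)]
            for i in range(cols)]

def det2(M):
    """Determinant of a 2x2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def quadratic_form(A, x):
    """Return x'A'Ax, computed as the squared length of Ax."""
    Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]
    return sum(v * v for v in Ax)

# Independent columns: a column of ones and three distinct x values.
A_independent = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
# Dependent columns: the second column is twice the first.
A_dependent = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]

det_independent = det2(gram_matrix(A_independent))
det_dependent = det2(gram_matrix(A_dependent))
print(det_independent, det_dependent)

# x = (2, -1) is a nonzero vector with A_dependent x = 0.
print(quadratic_form(A_dependent, [2.0, -1.0]))
print(quadratic_form(A_independent, [2.0, -1.0]))
```

In the dependent case a nonzero x gives Ax = 0, so A'A has a zero determinant; in the independent case the same x gives a strictly positive squared length, in line with the proof.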
I sincerely want to know your thoughts about this.
Best regards.
#2
09-25-2012, 01:57 AM
 htlin NTU Join Date: Aug 2009 Location: Taipei, Taiwan Posts: 601
Re: On the 'linear model I'',one step learning in not what linear...

I'd like to share two thoughts on your view of linear regression.

(0) While linear regression can be described as one-step learning, calculating the pseudo-inverse of [symbol missing] (which is [missing] by [missing]) generally involves multiple steps, on the order of [missing]. So whether to call it one-step learning depends on how you view the process.

(1) While using the pseudo-inverse has arguably been the most common way of doing linear regression since the early years of statistics, it is indeed not the only way. Applying gradient descent to linear regression, as you describe, also works and lets you enjoy something similar to PLA (or logistic regression with gradient descent): the line iteratively approaches the optimal one.
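To illustrate point (0), here is a minimal plain-Python sketch that computes the pseudo-inverse of an N by 2 data matrix in explicit stages and counts the scalar multiplications along the way. The data, the counting convention (divisions are not counted), and the name `pseudo_inverse_two_columns` are assumptions made for this example; it is not meant to reproduce the exact order quoted above.

```python
def pseudo_inverse_two_columns(X):
    """Return ((X'X)^-1 X', multiplication_count) for an N-by-2 matrix X."""
    mults = 0
    # Step 1: form the 2x2 matrix X'X, one multiplication per row per entry.
    G = [[0.0, 0.0], [0.0, 0.0]]
    for row in X:
        for i in range(2):
            for j in range(2):
                G[i][j] += row[i] * row[j]
                mults += 1
    # Step 2: invert X'X with the closed-form 2x2 inverse.
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    mults += 2
    G_inv = [[G[1][1] / det, -G[0][1] / det],
             [-G[1][0] / det, G[0][0] / det]]
    # Step 3: multiply (X'X)^-1 by X' to get the 2-by-N pseudo-inverse.
    pinv = [[0.0] * len(X) for _ in range(2)]
    for i in range(2):
        for n, row in enumerate(X):
            pinv[i][n] = G_inv[i][0] * row[0] + G_inv[i][1] * row[1]
            mults += 2
    return pinv, mults

X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]  # illustrative data (assumed)
y = [1.0, 3.0, 4.0]
pinv, mults = pseudo_inverse_two_columns(X)
w = [sum(p * yi for p, yi in zip(row, y)) for row in pinv]
print(w, mults)
```

Even for three points the "one step" hides a couple of dozen multiplications, and the count grows with the number of rows of X.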
__________________
When one teaches, two learn.

All times are GMT -7.
The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.