LFD Book Forum Linear Regression - Statistics vs Data Mining

#1
04-11-2012, 11:23 AM
 ManUtd Junior Member Join Date: Apr 2012 Posts: 6
Linear Regression - Statistics vs Data Mining

Hi Professor Yaser/Everyone-

A question about looking at regression from a stats vs data mining angle.

Stats - checks for correlated variables, normality of residuals/variables (non-linear transformations probably take care of this), homoscedasticity etc.

Data Mining - as you had mentioned, we want to keep it general.

Does that mean -
a) we don't care about these assumptions, or we do care but they only come into play later on?
b) we are at a higher risk of getting misleading results?

It would be nice to have your thoughts on this.

Thanks,
Kartik
#2
04-11-2012, 04:21 PM
 htlin NTU Join Date: Aug 2009 Location: Taipei, Taiwan Posts: 601
Re: Linear Regression - Statistics vs Data Mining

IMHO, from an ML perspective, what we care about is out-of-sample performance. Different levels of assumptions give you different guarantees on that performance. Statisticians (mathematicians) are often willing to make more assumptions in order to get stronger mathematical guarantees, both on out-of-sample performance and on other quantities of mathematical interest. The problem with assumptions is that they may or may not be realistic. So understanding what can be guaranteed under the fewest assumptions is important, and that is the angle we present in the book.
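
To make the out-of-sample point concrete, here is a rough sketch (not from the book or the original posts). It fits a line by least squares to data whose noise is deliberately neither normal nor homoscedastic, then estimates out-of-sample error on held-out points. The target y = 1 + 2x, the noise model, the sample sizes, and all function names are assumptions made up for the illustration.

```python
import random

def fit_line(xs, ys):
    """Closed-form least squares for y ~ w0 + w1 * x (single feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w1 = sxy / sxx
    w0 = mean_y - w1 * mean_x
    return w0, w1

def mse(w, xs, ys):
    """Mean squared error of the line w = (w0, w1) on the given points."""
    w0, w1 = w
    return sum((w0 + w1 * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def make_data(n, rng):
    """Hypothetical target y = 1 + 2x with non-normal, heteroscedastic noise."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    # Uniform (not Gaussian) noise whose spread grows with x.
    ys = [1.0 + 2.0 * x + (0.1 + x) * rng.uniform(-1.0, 1.0) for x in xs]
    return xs, ys

rng = random.Random(0)                   # fixed seed so runs are repeatable
x_train, y_train = make_data(200, rng)   # data used for fitting
x_test, y_test = make_data(2000, rng)    # held-out data, never used for fitting

w = fit_line(x_train, y_train)
e_in = mse(w, x_train, y_train)          # in-sample error
e_test = mse(w, x_test, y_test)          # estimate of out-of-sample error
print(f"w0={w[0]:.3f} w1={w[1]:.3f} E_in={e_in:.3f} E_test={e_test:.3f}")
```

The held-out estimate does not rely on normal residuals or constant variance. It only needs the test points to come from the same distribution as the training points and to stay out of the fit. What is lost without the classical assumptions is the extra machinery built on them, such as the usual confidence intervals and significance tests for the coefficients.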
__________________
When one teaches, two learn.
