Quote:
Originally Posted by ManUtd
Hi Professor Yaser/Everyone
A question about looking at regression from a stats vs data mining angle.
Stats: checks for correlated variables, normality of residuals/variables (nonlinear transformations probably take care of this), homoscedasticity, etc.
Data mining: as you mentioned, we want to keep it general.
Does that mean:
a) we don't care about these assumptions, or we do care but they come into play later on? or
b) we are at a higher risk of getting misleading results?
It would be nice to have your thoughts on this.
Thanks,
Kartik

IMHO, from an ML perspective, we care about the out-of-sample performance. Different levels of assumptions give you different guarantees on the out-of-sample performance. Statisticians (mathematicians) are often willing to make more assumptions in order to get stronger mathematical guarantees, including guarantees on out-of-sample performance and on other quantities of mathematical interest. The problem with assumptions is that they may or may not be realistic. So understanding the guarantees that hold under the fewest assumptions is important, and that is the angle we are presenting in the book.
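To make the "fewest assumptions" point concrete, here is a minimal sketch (my own illustration, not code from the book). It fits ordinary least squares to synthetic data that deliberately breaks the classical regression assumptions: the noise is skewed and its spread grows with |x|. It then estimates out-of-sample performance on a fresh test set. The guarantee used is the Hoeffding inequality, 2·exp(−2ε²N) ≤ δ, which needs only i.i.d. test points and a loss bounded in [0, 1]. It does not need normality or homoscedasticity. The data generator `make_data`, the helpers `fit_least_squares` and `hoeffding_epsilon`, the miss tolerance `TOL = 1.0`, and `delta = 0.05` are all hypothetical choices made for the example.

```python
import math
import random


def fit_least_squares(xs, ys):
    """Ordinary least squares for y = w*x + b (closed form, one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = sxy / sxx
    b = mean_y - w * mean_x
    return w, b


def hoeffding_epsilon(n, delta):
    """Half-width eps with P[|E_test - E_out| > eps] <= delta, solved from
    2*exp(-2*eps^2*n) = delta. Valid for a fixed hypothesis and a loss in [0, 1]."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def make_data(n, rng):
    """Hypothetical data violating classical OLS assumptions: the noise is
    skewed (re-centred exponential, mean 0) and its scale grows with |x|."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.uniform(-3.0, 3.0)
        noise = (rng.expovariate(1.0) - 1.0) * (0.2 + abs(x))
        xs.append(x)
        ys.append(2.0 * x + 1.0 + noise)
    return xs, ys


rng = random.Random(0)
x_train, y_train = make_data(500, rng)
x_test, y_test = make_data(1000, rng)  # fresh i.i.d. sample, never used for fitting

w, b = fit_least_squares(x_train, y_train)

# Bounded 0/1 loss: a test point counts as a "miss" if the prediction is off
# by more than TOL (hypothetical tolerance chosen only for illustration).
TOL = 1.0
losses = [1.0 if abs(w * x + b - y) > TOL else 0.0 for x, y in zip(x_test, y_test)]
e_test = sum(losses) / len(losses)
eps = hoeffding_epsilon(len(losses), delta=0.05)

print(f"w={w:.3f}, b={b:.3f}")
print(f"test miss rate={e_test:.3f}; with prob >= 0.95, "
      f"E_out in [{e_test - eps:.3f}, {e_test + eps:.3f}]")
```

The interval printed at the end relies on nothing about the noise distribution. It holds because the test points are drawn i.i.d. from the same distribution and were not used to pick `w` and `b`. Adding assumptions such as Gaussian, constant-variance noise would buy tighter or more detailed statements (confidence intervals on `w` itself, for instance), but only if those assumptions actually hold.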