LFD Book Forum  

10-05-2013, 01:15 AM
hsolo
Feature dimensionality, regularization and generalization

I had a couple of conceptual questions:

The VC result and the bias-variance result imply that if the number of features is very large, then unless the number of training samples is also large there is the specter of overfitting. So feature selection has to be done systematically and carefully.
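For reference, the VC result alluded to here can be written as below. This is the form I recall from the Learning From Data book; the identification of the feature count d with the VC dimension assumes a linear model (perceptron) on d features, which is not stated in the question itself.

```latex
% With probability at least 1 - \delta, for the final hypothesis g:
E_{\text{out}}(g) \;\le\; E_{\text{in}}(g)
  + \sqrt{\frac{8}{N}\,\ln\frac{4\, m_{\mathcal{H}}(2N)}{\delta}},
\qquad
m_{\mathcal{H}}(N) \;\le\; N^{d_{\text{vc}}} + 1 .

% Assumption: linear classifier (perceptron) on d features, so
d_{\text{vc}} = d + 1 .
```

Under that assumption the penalty term grows with d for fixed N, which is the sense in which many features with few samples raise the risk of overfitting.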

However, it seems that regularization in some form can serve as a generic antidote to overfitting, and that consequently one can ignore the feature dimensionality (assuming for the moment that the computing overhead of a large feature set can be ignored). I got that impression from the online notes of a couple of courses, and I also saw in a recent Google paper that they used logistic regression with regularization on a billion-dimensional (highly sparse) feature set.
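To make the setup concrete, here is a minimal sketch of L2-regularized logistic regression with more features than training samples. Everything in it is an assumption made for illustration: the synthetic data (100 features, only the first 5 of which influence the label), the value lam = 0.1, and plain gradient descent as the optimizer. It is not the method of the paper mentioned above, and a single toy run does not settle the question.

```python
import math
import random

def make_data(n, d, k, rng):
    """Synthetic data (assumption): d features, only the first k affect the label."""
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        label = 1.0 if sum(x[:k]) >= 0 else -1.0
        norm = math.sqrt(sum(v * v for v in x))
        X.append([v / norm for v in x])  # scale each example to unit length
        y.append(label)
    return X, y

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lam, steps=200, lr=1.0):
    """Gradient descent on mean logistic loss + (lam/2)*||w||^2, starting at w = 0."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        grad = [lam * wj for wj in w]  # gradient of the L2 penalty
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            coef = -yi * sigmoid(-margin) / n  # gradient of ln(1 + exp(-margin))
            for j in range(d):
                grad[j] += coef * xi[j]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def accuracy(w, X, y):
    hits = 0
    for xi, yi in zip(X, y):
        pred = 1.0 if sum(wj * xj for wj, xj in zip(w, xi)) >= 0 else -1.0
        hits += pred == yi
    return hits / len(X)

def l2_norm(w):
    return math.sqrt(sum(v * v for v in w))

rng = random.Random(0)
X_train, y_train = make_data(n=30, d=100, k=5, rng=rng)   # d > n on purpose
X_test, y_test = make_data(n=200, d=100, k=5, rng=rng)

LAM = 0.1  # illustrative regularization strength
w_plain = fit_logistic(X_train, y_train, lam=0.0)
w_reg = fit_logistic(X_train, y_train, lam=LAM)

for name, w in (("no regularization", w_plain), ("L2, lam=0.1", w_reg)):
    print(name, "| ||w|| =", round(l2_norm(w), 3),
          "| train acc =", accuracy(w, X_train, y_train),
          "| test acc =", accuracy(w, X_test, y_test))
```

The one thing this sketch guarantees is that the penalty caps the weight norm: since gradient descent with this step size never increases the objective above its value at w = 0, the regularized solution satisfies ||w||^2 <= 2 ln 2 / lam regardless of the number of features. Whether that cap translates into better test accuracy depends on the data and on the choice of lam.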

Is this notion correct from a statistical standpoint: if one uses regularization and is willing to pay the computing costs, can one be lax about feature selection?

Is there a theoretical result about the above notion (the effect of feature dimensionality and regularization on generalization error)?
All times are GMT -7.


The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.