LFD Book Forum  

  #1  
10-05-2013, 12:15 AM
hsolo
Member
 
Join Date: Jul 2013
Posts: 12
Feature dimensionality, regularization and generalization

I had a couple of conceptual questions:

The VC result and the bias-variance result imply that if the number of features is very large, then unless the number of training samples is also large, there is the specter of overfitting. So feature selection has to be done systematically and carefully.

However, it seems that regularization in some form can serve as a generic antidote to overfitting, and consequently one can ignore the feature dimensionality (setting aside for a moment the computing overhead of a large feature set). I got that impression from the online notes of a couple of courses, and I also saw in a recent Google paper that they used logistic regression with regularization on a billion-dimensional (highly sparse) feature set.

Is it statistically correct to say that if one uses regularization and is willing to pay the computing costs, one can be lax about feature selection?

Is there a theoretical result about this (the effect of feature dimensionality and regularization on generalization error)?
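
For concreteness, here is a tiny sketch of the kind of setting I have in mind (just an illustration, assuming scikit-learn and SciPy; the dimensions are made up and nowhere near the billion-dimensional case):

Code:
# Illustrative only: many sparse features, few informative ones, d >> N.
import numpy as np
from scipy.sparse import csr_matrix, hstack, random as sparse_random
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, n_noise = 2000, 50_000

# 50 informative dense features plus 50,000 sparse, irrelevant ones.
A_rel = rng.normal(size=(n_samples, 50))
X_noise = sparse_random(n_samples, n_noise, density=0.001,
                        random_state=0, format="csr")
X = hstack([csr_matrix(A_rel), X_noise], format="csr")
y = (A_rel @ rng.normal(size=50) + 0.5 * rng.normal(size=n_samples) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

# C is the inverse regularization strength: large C means almost no regularization.
for C in (1e4, 1.0, 1e-2):
    clf = LogisticRegression(C=C, max_iter=2000).fit(X_tr, y_tr)
    print(f"C={C:g}  train acc={clf.score(X_tr, y_tr):.3f}  "
          f"test acc={clf.score(X_te, y_te):.3f}")

The weakly regularized fit drives the training accuracy to essentially 1 by exploiting the irrelevant sparse features, while the more regularized fits generalize better. That is the behavior I am asking about.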
  #2  
10-05-2013, 06:02 AM
magdon
RPI
 
Join Date: Aug 2009
Location: Troy, NY, USA.
Posts: 595
Re: Feature dimensionality, regularization and generalization

This is a very important point you raise. Feature selection and regularization play different roles.

Feature selection is used to construct the `right' input that is useful for predicting the output. With respect to the right features, the target function will be simple (for example, nearly linear). Feature selection should always be used if possible, and it does not matter how many data points or how many dimensions you have. Again, the role of feature selection is to get the target function into a simpler form - that is, for the simple hypothesis set you plan to use, the deterministic noise is reduced. Some might use feature selection as a way of reducing dimension to control the variance, but that is not its primary role. You can always do systematic dimension reduction after feature selection if you need better generalization.
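
To illustrate with a toy sketch (just an illustration using scikit-learn, not an example from the book): a circularly symmetric target looks very nonlinear in the raw coordinates, but becomes nearly linear in the single engineered feature r^2 = x1^2 + x2^2, so the linear model's deterministic noise largely disappears.

Code:
# Toy illustration: the target is "inside a circle".
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(2000, 2))
y = (X[:, 0]**2 + X[:, 1]**2 < 0.5).astype(int)   # circularly symmetric target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=1)

# Linear model on the raw coordinates: large deterministic noise.
raw = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Linear model on the single "right" feature r^2: target is nearly linear in it.
Z_tr = (X_tr**2).sum(axis=1, keepdims=True)
Z_te = (X_te**2).sum(axis=1, keepdims=True)
eng = LogisticRegression(max_iter=1000).fit(Z_tr, y_tr)

print("raw (x1, x2) test accuracy:", raw.score(X_te, y_te))   # roughly the majority-class rate
print("r^2 feature  test accuracy:", eng.score(Z_te, y_te))   # close to 1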

Once you have determined your features and selected your hypothesis set (and only then looked at your data), there will likely still be deterministic noise, and almost always stochastic noise. The role of regularization is to help you deal with that noise.

If you have bad features, there will typically be lots of deterministic noise and you will need lots of regularization to combat it. If you have good features, then you may need only a little regularization, primarily to combat the stochastic noise.

Summary: features and regularization address different things. Good features reduce the deterministic noise. Regularization combats the noise that remains. Don't underestimate the role of either.

But as you see, to some extent regularization can combat the extra deterministic noise when you have bad features. However, if you have lots of noise, that places a fundamental limit on learning. And using a larger hypothesis set as a way to combat deterministic noise is usually not a good idea, because you suffer the disproportionate indirect impact of any noise through the variance term in the bias-variance decomposition.
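
Here is a small numerical sketch of that last point (illustrative only, with made-up numbers, not from the text): fit 15 noisy points from a sine target with a degree-10 polynomial, with and without weight decay. Without regularization the in-sample fit is excellent but the out-of-sample error blows up through the variance term; a little weight decay brings it back down.

Code:
# Degree-10 polynomial fit to 15 noisy points from a sine target, with weight decay lambda.
import numpy as np

rng = np.random.default_rng(2)

def avg_out_of_sample_error(lam, trials=500, n=15, degree=10, sigma=0.3):
    """Average squared out-of-sample error of a ridge-regularized polynomial fit."""
    x_test = np.linspace(-1, 1, 200)
    y_test = np.sin(np.pi * x_test)
    Z_test = np.vander(x_test, degree + 1, increasing=True)
    total = 0.0
    for _ in range(trials):
        x = rng.uniform(-1, 1, n)
        y = np.sin(np.pi * x) + sigma * rng.normal(size=n)
        Z = np.vander(x, degree + 1, increasing=True)          # polynomial features
        # Weight decay: w = (Z'Z + lambda*I)^(-1) Z'y
        w = np.linalg.solve(Z.T @ Z + lam * np.eye(degree + 1), Z.T @ y)
        total += np.mean((Z_test @ w - y_test) ** 2)
    return total / trials

for lam in (0.0, 1e-3, 1e-1):
    print(f"lambda={lam:g}  avg out-of-sample MSE = {avg_out_of_sample_error(lam):.3f}")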


__________________
Have faith in probability