LFD Book Forum Feature dimensionality, regularization and generalization

#1
10-05-2013, 01:15 AM
 hsolo Member Join Date: Jul 2013 Posts: 12
Feature dimensionality, regularization and generalization

I had a couple of conceptual questions:

The VC and bias-variance results imply that if the number of features is very large, then unless the number of training samples is also large there is the specter of overfitting. So feature selection has to be done systematically and carefully.

However, it seems that regularization in some form can serve as a generic antidote to overfitting, so that one can ignore the feature dimensionality (setting aside for a moment the computing overhead of a large feature set). I got that impression from online notes for a couple of courses, and I also saw in a recent Google paper that they used logistic regression with regularization on a billion-dimensional (highly sparse) feature set.

Is it a correct notion, from a statistical standpoint, that if one uses regularization and is willing to pay the computing costs, one can be lax about feature selection?

Is there a theoretical result about this notion (the effect of feature dimensionality and regularization on generalization error)?
#2
10-05-2013, 07:02 AM
 magdon RPI Join Date: Aug 2009 Location: Troy, NY, USA. Posts: 597
Re: Feature dimensionality, regularization and generalization

This is a very important point you raise. Feature selection and regularization play different roles.

Feature selection is used to construct the "right" input that is useful for predicting the output. With respect to the right features, the target function will be simple (for example, nearly linear). Feature selection should always be used if possible, regardless of how many data points or how many dimensions you have. Again, the role of feature selection is to get the target function into a simpler form; that is, for the simple hypothesis set you plan to use, the deterministic noise is reduced. Some might use feature selection as a way of reducing dimension to control the variance, but that is not its primary role. You can always do systematic dimension reduction after feature selection if you need to get better generalization.
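As a rough illustration of how the "right" features reduce deterministic noise, here is a minimal sketch. Everything in it is an assumption made for the example, not something from the thread: a noiseless target x1^2 + x2^2, inputs uniform on [-1, 1]^2, and a plain linear model fitted by least squares. The only thing that changes between the two runs is the feature map.

```python
import numpy as np

def fit_and_test_error(phi, n_train=50, n_test=2000, seed=0):
    """Fit a linear model on features phi(X) and return its test MSE.

    The target is noiseless, so any remaining error comes from the model
    being unable to represent the target (deterministic noise) plus the
    effect of fitting on a finite sample.
    """
    rng = np.random.default_rng(seed)
    target = lambda X: X[:, 0] ** 2 + X[:, 1] ** 2  # assumed target for this sketch
    X_train = rng.uniform(-1, 1, size=(n_train, 2))
    X_test = rng.uniform(-1, 1, size=(n_test, 2))
    w, *_ = np.linalg.lstsq(phi(X_train), target(X_train), rcond=None)
    return float(np.mean((phi(X_test) @ w - target(X_test)) ** 2))

def raw_features(X):
    # (1, x1, x2): the target is far from linear in these features.
    return np.column_stack([np.ones(len(X)), X])

def squared_features(X):
    # (1, x1^2, x2^2): the target is exactly linear in these features.
    return np.column_stack([np.ones(len(X)), X ** 2])

err_raw = fit_and_test_error(raw_features)
err_squared = fit_and_test_error(squared_features)
print(f"raw features:     test MSE = {err_raw:.4f}")
print(f"squared features: test MSE = {err_squared:.2e}")
```

Both models have three parameters, so the difference in error is not a dimensionality effect. It comes entirely from whether the target is simple with respect to the chosen features.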

Once you have determined your features and selected your hypothesis set, and only then looked at your data, there will likely still be deterministic noise and almost always stochastic noise. The role of regularization is to help you deal with the noise.

If you have bad features, there will typically be lots of deterministic noise and you will need lots of regularization to combat it. If you have good features, then you may only need little regularization, primarily to combat the stochastic noise.

Summary: features and regularization address different things. Good features reduce deterministic noise. Regularization combats noise. Don't underestimate the role of either.

But as you see, to some extent, regularization can combat the extra deterministic noise when you have bad features. However, if you have lots of noise, that places a fundamental limit on learning. Also, using a larger hypothesis set as a way to combat deterministic noise is not usually good, because you suffer the disproportionate indirect impact of any noise through the variance term in the bias-variance decomposition.
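For reference, the bias-variance decomposition mentioned here can be written as follows for squared error with noisy targets. The notation is assumed, following the standard textbook treatment: y = f(x) + ε with zero-mean noise of variance σ², g^(D) is the hypothesis learned from data set D, and ḡ is its average over data sets.

```latex
\mathbb{E}_{\mathcal{D}}\!\left[E_{\text{out}}\!\left(g^{(\mathcal{D})}\right)\right]
  = \sigma^2 + \text{bias} + \text{var},
\qquad
\text{bias} = \mathbb{E}_{x}\!\left[\left(\bar{g}(x) - f(x)\right)^2\right],
\qquad
\text{var} = \mathbb{E}_{x}\,\mathbb{E}_{\mathcal{D}}\!\left[\left(g^{(\mathcal{D})}(x) - \bar{g}(x)\right)^2\right]
```

Read this way, σ² is the direct effect of the stochastic noise and the bias term reflects the deterministic noise. Both kinds of noise also act indirectly through the var term, which tends to grow with the size of the hypothesis set.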

__________________
Have faith in probability

