Feature dimensionality, regularization and generalization
I had a couple of conceptual questions:
The VC result and the bias-variance result imply that if the number of features is very large, then unless the number of training samples is correspondingly high, there is the specter of overfitting. So there is the requirement that feature selection be done systematically and carefully.
However, it seems that if one uses regularization in some form, it can serve as a generic antidote to overfitting, and consequently one can ignore the feature dimensionality (assuming for a moment that the computing overhead of a large feature set can be ignored). I got that impression from online notes from a couple of courses, and I also saw in a recent Google paper that they used logistic regression with regularization on a billion-dimension (highly sparse) feature set.
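To make concrete the kind of setting I mean, here is a small sketch (synthetic data, scikit-learn; the sample sizes, `C` value, and the choice of L1 penalty are my own illustrative assumptions, not from the paper): many more features than informative dimensions, with the regularizer left to suppress the irrelevant ones rather than selecting features by hand.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_samples, n_features = 200, 2000  # far more features than samples

# Only the first 10 features actually carry signal.
X = rng.normal(size=(n_samples, n_features))
w_true = np.zeros(n_features)
w_true[:10] = 2.0
y = (X @ w_true + rng.normal(size=n_samples) > 0).astype(int)

# L1-regularized logistic regression: no explicit feature selection,
# the penalty drives most of the 2000 weights to exactly zero.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
clf.fit(X, y)

n_nonzero = np.count_nonzero(clf.coef_)
print(f"L1 keeps {n_nonzero} of {n_features} weights non-zero")
```

The point of the sketch is that the fitted model ends up effectively low-dimensional even though no feature selection was performed up front, which is the behavior I am asking about.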
Is it a correct notion, from a statistics standpoint, that if one uses regularization and is willing to pay the computing costs, one can be lax about feature selection?
Is there a theoretical result about the above notion (the effect of feature dimensionality and regularization on generalization error)?
