In general, nothing is lost in normalizing the data, and it can help various optimization algorithms.
You need to normalize the data for any algorithm that treats the inputs on an equal footing. For example, an algorithm that uses the Euclidean distance (such as the support vector machine) implicitly treats all the inputs as if they were on the same scale.
You should not normalize the data if the scale of the data has significance. For example, if income is twice as important as debt in credit approval, then it is appropriate for income to have twice the scale of debt. If you do normalize the inputs in this case, then you should take this difference in importance into account in some other way.
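To see why scale matters for distance-based algorithms, here is a minimal NumPy sketch (the two features and their values are made up for illustration): a feature measured in thousands swamps a feature measured in fractions when computing a raw Euclidean distance, while per-feature z-score normalization puts them on an equal footing.

```python
import numpy as np

# Hypothetical data: two features on very different scales,
# e.g. income in dollars and a debt ratio in [0, 1].
a = np.array([50_000.0, 0.2])
b = np.array([52_000.0, 0.9])

# Raw Euclidean distance is dominated by the large-scale feature.
raw = np.linalg.norm(a - b)

# After z-score normalization (per feature), both features contribute.
X = np.vstack([a, b])
Z = (X - X.mean(axis=0)) / X.std(axis=0)
normed = np.linalg.norm(Z[0] - Z[1])

print(raw)     # ~2000, almost entirely from the income difference
print(normed)  # both features now contribute equally
```

With only two points this is a degenerate demonstration, but the effect is the same on a full dataset: whichever feature happens to have the largest units dominates the distance.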
One important precaution when normalizing the data: if you are using something like validation to estimate your test error, always compute the normalization parameters from the training data only, and use those same parameters to rescale the validation data. If you do not follow this strict prescription, your validation estimate will not be legitimate.
Quote:
Originally Posted by curiosity
Hi all and Prof. Yaser,
Machine learning practitioners often say that the input data should sometimes be normalized before an algorithm is trained on it. So, when should we normalize our input data? Put another way, do all machine learning algorithms require normalization? If not, which ones do? And finally, why is there a need for normalization?
Thanks bunches!
