LFD Book Forum (http://book.caltech.edu/bookforum/index.php)
-   General Discussion of Machine Learning (http://book.caltech.edu/bookforum/forumdisplay.php?f=105)
-   -   When to use normalization? (http://book.caltech.edu/bookforum/showthread.php?t=3889)

 curiosity 01-21-2013 05:33 AM

When to use normalization?

Hi all and Prof. Yaser,
Machine learning practitioners often say that input data should sometimes be normalized before an algorithm is trained on it. So, when should we normalize our input data? To put it another way, do all machine learning algorithms require normalization? If not, which ones do? And finally, why is normalization needed at all?

Thanks bunches !

 magdon 01-21-2013 07:04 AM

Re: When to use normalization?

In general there is nothing lost in normalizing the data, and it can help various optimization algorithms.

You need to normalize the data for any algorithm that treats the inputs on an equal footing. For example, an algorithm that uses the Euclidean distance (such as the Support Vector Machine) treats all the inputs on the same footing, so an input measured on a much larger scale than the others will dominate the distance.
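
A small illustration of that effect (the applicants, features, and per-feature scales below are hypothetical, chosen only to make the point): with raw features the income column dominates the Euclidean distance, and after dividing each feature by a typical scale the nearest neighbor changes.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

# Hypothetical applicants: (years at current job, annual income in dollars).
alice = (1.0, 50_000.0)
bob = (20.0, 50_500.0)
carol = (1.0, 52_000.0)

# Raw features: income differences dwarf job-tenure differences,
# so Bob looks closer to Alice than Carol does.
raw_ab = euclidean(alice, bob)
raw_ac = euclidean(alice, carol)

# Hypothetical per-feature scales (10 years, 10,000 dollars) put both inputs
# on a comparable footing; now Carol is the nearer neighbor.
SCALES = (10.0, 10_000.0)

def rescale(p):
    """Divide each coordinate by its (hypothetical) typical scale."""
    return tuple(v / s for v, s in zip(p, SCALES))

scaled_ab = euclidean(rescale(alice), rescale(bob))
scaled_ac = euclidean(rescale(alice), rescale(carol))
```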

You should not normalize the data if the scale of the data has significance. For example, if income is twice as important as debt in credit approval, then it is appropriate for income to have twice the size of debt. Or, if you do normalize the inputs in this case, then you should take this difference in importance into account some other way.
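
One possible way to "take this difference in importance into account some other way" (an illustrative choice, not the only one) is to normalize each input and then multiply it by an importance weight. The credit data and the 2:1 weights below are hypothetical.

```python
import math

def rms(column):
    """Root of the average squared value of a column."""
    return math.sqrt(sum(v * v for v in column) / len(column))

# Hypothetical credit data: each row is (income, debt) in dollars.
rows = [(60_000.0, 5_000.0), (30_000.0, 20_000.0), (90_000.0, 10_000.0)]

# Hypothetical importance weights: income counts twice as much as debt.
weights = (2.0, 1.0)

columns = list(zip(*rows))
scales = [rms(col) for col in columns]

# Normalize each column to unit RMS, then re-inject the importance weights,
# so income ends up with twice the size of debt by design rather than by units.
weighted = [tuple(w * v / s for v, s, w in zip(row, scales, weights)) for row in rows]
```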

One important precaution when normalizing the data: if you are using something like validation to estimate your test error, always compute the normalization parameters from the training data only, and use those same parameters to rescale the validation data. If you do not follow this strict prescription, then your validation estimate will not be legitimate.
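
A minimal sketch of this prescription, using "average squared value of each coordinate equal to 1" as the normalization (the scheme used as an example later in this thread). The data split and the helper names fit_rms_scales and apply_scales are hypothetical; other normalizations (for example mean/standard-deviation scaling) follow the same fit-on-training, apply-everywhere pattern.

```python
import math

def fit_rms_scales(train_rows):
    """Per-coordinate scale so that the average squared value on the TRAINING rows is 1.
    Assumes no coordinate is identically zero on the training rows."""
    n = len(train_rows)
    return [math.sqrt(sum(row[j] ** 2 for row in train_rows) / n)
            for j in range(len(train_rows[0]))]

def apply_scales(rows, scales):
    """Rescale any rows (training, validation, or test) with previously fitted scales."""
    return [[v / s for v, s in zip(row, scales)] for row in rows]

# Hypothetical split of a small dataset.
train = [[1.0, 200.0], [3.0, 400.0], [2.0, 100.0]]
valid = [[4.0, 300.0]]

scales = fit_rms_scales(train)          # parameters come from the training data only
train_n = apply_scales(train, scales)
valid_n = apply_scales(valid, scales)   # validation reuses them; it is never used to fit
```

Refitting the scales on train + valid would give different parameters, which is exactly the leak the prescription rules out.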


 curiosity 01-21-2013 08:03 AM

Re: When to use normalization?

Quote:
 One important precaution when normalizing the data: if you are using something like validation to estimate your test error, always normalize only the training data, and use the resulting normalization parameters to rescale the validation data. If you do not follow this strict prescription, then your validation estimate will not be legitimate.
Thanks for the quick reply, magdon. However, I didn't get this. What is the difference between normalizing and rescaling in this case? More generally, do I also need to normalize my test data when evaluating the final model? In other words, how should I evaluate the final model if I used normalization during training? An example would be much appreciated...

 cygnids 01-21-2013 10:15 AM

Re: When to use normalization?

Curiosity, thanks for asking this question, and Prof. Magdon, thanks for the reply. A while ago, similar thoughts crossed my mind too.

- When we talk about normalization, are we talking about getting rid of the "units" of the data? For example, if the input vector has weight and height features, do we scale to effectively get rid of the kg and meter units, say by dividing by the average weight and the average height respectively (or by some constant)? Is this what you mean by putting them on an equal footing?

- You caution about relative scaling. Generally speaking, is this to suggest that a cavalier application of simple normalization can distort the correlative structure implicit in the (original) input data?

- Your use of the word scaling raises another question in my mind. Does it make sense to keep an eye on whether the features of the input data have disparate "ranges"? Say one feature ranges over [0,1] and another over [1,1000]. Does it make sense to reduce the "range" of the latter to make it comparable to the range of the former?

- I tried to think through your comments in the context of supervised vs. unsupervised learning. In a regression setting we have a left-hand side and a right-hand side, and I suppose one could be more cavalier about normalization, as long as it is done consistently across the system. However, for unsupervised learning, my immediate thought is that one needs to be a lot more careful about the relative scaling between features. Roughly speaking, is my suspicion right?

- Related to these questions is a nagging concern: does bringing insignificant features onto an "equal footing" unduly give them importance?
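
On the question of disparate ranges, one common option (not something this thread prescribes) is min-max scaling, which maps each feature's training range linearly onto [0, 1]. The feature values below are hypothetical. Note that it gives every feature the same range regardless of its significance, which is exactly the "equal footing" concern raised in the last point.

```python
def fit_min_max(values):
    """Return (lo, hi) from training values; assumes hi > lo."""
    return min(values), max(values)

def min_max_scale(values, lo, hi):
    """Map [lo, hi] linearly onto [0, 1]; values outside the training range land outside [0, 1]."""
    return [(v - lo) / (hi - lo) for v in values]

feature_a = [0.0, 0.25, 1.0]        # hypothetical feature already spanning [0, 1]
feature_b = [1.0, 250.75, 1000.0]   # hypothetical feature spanning [1, 1000]

lo, hi = fit_min_max(feature_b)     # fitted on training values only
scaled_b = min_max_scale(feature_b, lo, hi)
```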

 magdon 01-21-2013 03:49 PM

Re: When to use normalization?

When I said normalize, I meant placing the data into some normal form, such as having the same "scale".

Here is an example to help

Suppose you have three points:
x=(1,2),(-1,-2),(3,2)
y=+1,-1,+1

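One way to normalize the data is to make the average squared value of each coordinate equal to 1. The first coordinates are 1, -1, 3, with average squared value (1+1+9)/3 = 11/3, and the second coordinates are 2, -2, 2, with average squared value 4. So you would divide the first x-coordinate by $\sqrt{11/3}$ and the second coordinate by $\sqrt{4} = 2$. Now both coordinates are "normalized" so that the average squared value is 1.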
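
The arithmetic for the three points can be checked with a few lines; this sketch simply computes each coordinate's divisor as the square root of its average squared value.

```python
import math

points = [(1.0, 2.0), (-1.0, -2.0), (3.0, 2.0)]

# Divisor for coordinate j: square root of the average squared value of that coordinate.
divisors = [math.sqrt(sum(p[j] ** 2 for p in points) / len(points)) for j in range(2)]

# After dividing, each coordinate has average squared value 1.
normalized = [tuple(v / d for v, d in zip(p, divisors)) for p in points]
```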

Suppose instead you want to use the third point as a test point. Now you normalize using only the first 2 points. In this case you don't change the first coordinate (its average squared value over the two training points is already 1) and you divide the second coordinate by 2 to get the normalized data. You learn on this normalized training data of 2 points and test the learned hypothesis on the 3rd point. Before you test the learned hypothesis, you need to rescale the test point with the same rescaling parameters that you used to normalize the 2 training data points.
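
The train/test version of the same example, as a sketch: the divisors are fitted on the first two points only and then reused, unchanged, on the third. The helper names fit_divisors and rescale are hypothetical.

```python
import math

def fit_divisors(rows):
    """Per-coordinate divisor: sqrt of the mean squared value over the given rows."""
    return [math.sqrt(sum(r[j] ** 2 for r in rows) / len(rows)) for j in range(len(rows[0]))]

def rescale(row, divisors):
    """Divide each coordinate by its fitted divisor."""
    return tuple(v / d for v, d in zip(row, divisors))

train_x = [(1.0, 2.0), (-1.0, -2.0)]
test_x = (3.0, 2.0)

divisors = fit_divisors(train_x)          # [1.0, 2.0]: fitted on the two training points only
train_scaled = [rescale(r, divisors) for r in train_x]
test_scaled = rescale(test_x, divisors)   # same divisors, never refitted on the test point
```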


 scottedwards2000 01-27-2013 01:18 AM

Re: When to use normalization?

Thanks, Dr. Magdon-Ismail, for the example. However, I'm still not sure I understand exactly why we must apply to the test data the same rescaling parameters we used for the training data. I can see it for a simple log transform (e.g. if you used base 10 on the training data, you certainly wouldn't want to use base 2 on the test data), but in your example you are transforming the data to meet a certain criterion (average squared value of each coordinate = 1). Would your model then expect a new data set to have the same property? If we apply the exact rescaling parameters that we used on the training set to the test set, it certainly won't meet that criterion. Thanks for your help!

 magdon 01-27-2013 06:05 AM

Re: When to use normalization?

You are right, scaling can be any transformation. If you used some transformation to learn on the training data, you must use the same transformation when you test. Here is a simple idealized setting with your log transform and with simple scaling. Suppose the problem is 1-dim regression:

x: 2,4,6.
y: 6,12,18.

xtest=8
ytest=24

It is easy to see the relationship is y=3x. We can successfully learn this from the training data. Now suppose we rescaled the x-data by a factor of 0.5 during training:

x=1,2,3
y=6,12,18

The relationship you would learn is:

y=6x

Now try to apply this to the test data: $6 \cdot 8 = 48 \neq 24$. It fails because you did not rescale the test data in exactly the way you rescaled the training data. If you also rescale the test datum, then xtest becomes xtest' = 4 and indeed the function you learned works: ytest = 6 xtest' = 24.
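
The scaling example can be checked numerically. This sketch assumes the learner fits y = w·x by least squares with no intercept (an assumption; the post does not name a learning algorithm); on the rescaled inputs it recovers w = 6.

```python
def fit_slope(xs, ys):
    """Least-squares slope w for the model y = w * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

SCALE = 0.5                      # rescaling factor applied to the inputs during training
x_train = [2.0, 4.0, 6.0]
y_train = [6.0, 12.0, 18.0]
x_test, y_test = 8.0, 24.0

w = fit_slope([SCALE * x for x in x_train], y_train)   # learned on rescaled inputs: w = 6

wrong = w * x_test               # test point NOT rescaled: 48, not 24
right = w * (SCALE * x_test)     # test point rescaled the same way: 24
```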

Let's see what happens with the log transform: the "rescaled", i.e. transformed, x-data become:

x=log2,log4,log6
y=6,12,18

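What is the relationship you would learn? Taking log to be the natural logarithm, it is

$y = 3e^{x}$

(for a base-b logarithm it would be $y = 3b^{x}$). If you simply apply this to the test point, it will fail: $3e^{8} \approx 8943 \neq 24$. You must first transform the test point to xtest' = log 8. Now your learned function does work: $3e^{\log 8} = 3 \cdot 8 = 24$ = ytest.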
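
The same check for the log transform, assuming natural logarithms and a model of the form y = a·e^{x'} fitted by least squares in a (both assumptions, chosen so that the true relationship is representable by the model).

```python
import math

def fit_amplitude(features, ys):
    """Least-squares a for the model y = a * exp(x'), where x' is the transformed input."""
    basis = [math.exp(f) for f in features]
    return sum(b * y for b, y in zip(basis, ys)) / sum(b * b for b in basis)

x_train = [2.0, 4.0, 6.0]
y_train = [6.0, 12.0, 18.0]
x_test, y_test = 8.0, 24.0

a = fit_amplitude([math.log(x) for x in x_train], y_train)   # approximately 3

raw = a * math.exp(x_test)                     # test point not transformed: roughly 8943
transformed = a * math.exp(math.log(x_test))   # transformed first: approximately 24
```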

The thing to realize is that when you rescale the training data and then learn, the learning takes into account the scaling and the hypothesis learned will depend on what scaling is used as the examples above illustrate. In other words, the hypothesis works for any data point (training or test) only after the scaling is applied.
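
One way to make this hard to get wrong is to keep the fitted transform and the learned hypothesis together, so that prediction always applies the training-time transform first. A minimal sketch using the linear example above (factor 0.5, model y = w·x' fitted by least squares without intercept); fit_with_transform is a hypothetical helper name. Libraries offer the same idea, for example scikit-learn's Pipeline.

```python
from typing import Callable, List

def fit_with_transform(xs: List[float], ys: List[float],
                       transform: Callable[[float], float]) -> Callable[[float], float]:
    """Fit y = w * transform(x) by least squares and return a predictor that
    applies the same transform to every input it is given."""
    zs = [transform(x) for x in xs]
    w = sum(z * y for z, y in zip(zs, ys)) / sum(z * z for z in zs)

    def predict(x: float) -> float:
        return w * transform(x)   # the transform is baked in; callers pass raw x
    return predict

x_train = [2.0, 4.0, 6.0]
y_train = [6.0, 12.0, 18.0]

predict = fit_with_transform(x_train, y_train, lambda x: 0.5 * x)
```

With this design the caller never handles rescaled values directly, so predict(8.0) gives 24 for the raw test point.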


 scottedwards2000 01-31-2013 12:38 AM

Re: When to use normalization?

Thanks, Dr. Magdon-Ismail - that was really helpful!!

 All times are GMT -7.