LFD Book Forum  

  #1  
01-21-2013, 04:33 AM
curiosity
Junior Member
 
Join Date: Jan 2013
Posts: 4
When to use normalization?

Hi all and Prof. Yaser,
Machine learning practitioners often say that input data should sometimes be normalized before an algorithm is trained on it. So, when should we normalize our input data? Put another way, do all machine learning algorithms require normalization? If not, which ones do? And finally, why is normalization needed at all?

Thanks bunches!
  #2  
01-21-2013, 06:04 AM
magdon
RPI
 
Join Date: Aug 2009
Location: Troy, NY, USA.
Posts: 595
Re: When to use normalization?

In general, nothing is lost by normalizing the data, and it can help various optimization algorithms.

You need to normalize the data for any algorithm that treats the inputs on an equal footing. For example, an algorithm that uses the Euclidean distance (such as the Support Vector Machine) treats all the inputs on the same footing, so an input measured on a much larger scale can dominate the distance.
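To illustrate why an algorithm based on Euclidean distance needs inputs on comparable scales, here is a minimal Python sketch with made-up numbers. Dividing each coordinate by its largest absolute value is just one simple normalization chosen for illustration, and the helper names are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def max_abs_scales(points):
    """One simple per-coordinate scale: the largest absolute value in each column."""
    return [max(abs(p[j]) for p in points) for j in range(len(points[0]))]

def rescale(points, scales):
    """Divide each coordinate of every point by that coordinate's scale."""
    return [tuple(x / s for x, s in zip(p, scales)) for p in points]

# Hypothetical inputs: coordinate 0 lies in [0, 1], coordinate 1 is in the thousands.
p, q, r = (0.1, 1000.0), (0.9, 1010.0), (0.1, 1100.0)

# Raw inputs: coordinate 1 dominates, so q looks closer to p than r does,
# even though q differs from p a lot (relative to its range) in coordinate 0.
raw_pq, raw_pr = euclidean(p, q), euclidean(p, r)
print(f"raw:    d(p,q)={raw_pq:.3f}  d(p,r)={raw_pr:.3f}")

# After rescaling, both coordinates contribute on a comparable footing
# and the nearest neighbour of p changes from q to r.
ps, qs, rs = rescale([p, q, r], max_abs_scales([p, q, r]))
norm_pq, norm_pr = euclidean(ps, qs), euclidean(ps, rs)
print(f"scaled: d(p,q)={norm_pq:.3f}  d(p,r)={norm_pr:.3f}")
```

With the raw inputs, which point is nearest to p is decided almost entirely by the large-scale coordinate.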

You should not normalize the data if the scale of the data has significance. For example, if income is twice as important as debt in credit approval, then it is appropriate for income to be on twice the scale of debt. Or, if you do normalize the inputs in this case, then you should account for this difference in importance some other way.
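One possible way to account for the difference in importance after normalizing, sketched in Python under the assumption that the algorithm's sensitivity to an input grows with that input's scale (as with Euclidean distance): normalize each column, then multiply it by an explicit importance weight. The credit data and the 2:1 weights are hypothetical.

```python
import math

def unit_mean_square(column):
    """Divide a column by its root-mean-square so its average squared value is 1."""
    rms = math.sqrt(sum(v * v for v in column) / len(column))
    return [v / rms for v in column]

# Hypothetical credit data: (income, debt) per applicant.
rows = [(50000.0, 5000.0), (70000.0, 20000.0), (30000.0, 10000.0)]
income, debt = (list(c) for c in zip(*rows))

# Put both inputs on the same footing, then reintroduce the assumed
# 2:1 importance explicitly (weights chosen from domain knowledge).
importance = {"income": 2.0, "debt": 1.0}
income_w = [importance["income"] * v for v in unit_mean_square(income)]
debt_w = [importance["debt"] * v for v in unit_mean_square(debt)]
```

After this step income sits on twice the scale of debt by deliberate choice rather than because of its units.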

One important precaution when normalizing the data: if you are using something like validation to estimate your test error, always compute the normalization from the training data only, and use the resulting normalization parameters to rescale the validation data. If you do not follow this strict prescription, your validation estimate will not be legitimate.
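A minimal Python sketch of this precaution, using standardization (subtract the mean, divide by the standard deviation) as the normalization; the post does not prescribe a particular normalization, and the data and helper names are hypothetical. The parameters are estimated on the training rows only and then reused unchanged on the validation rows.

```python
import statistics

def fit_standardizer(rows):
    """Learn per-coordinate mean and population standard deviation from rows."""
    cols = list(zip(*rows))
    return [statistics.mean(c) for c in cols], [statistics.pstdev(c) for c in cols]

def transform(rows, means, stds):
    """Apply previously learned parameters; nothing is re-estimated here."""
    return [tuple((x - m) / s for x, m, s in zip(r, means, stds)) for r in rows]

# Hypothetical split: the validation rows play no part in choosing the parameters.
train = [(1.0, 200.0), (2.0, 400.0), (3.0, 600.0)]
valid = [(4.0, 300.0), (10.0, 900.0)]

means, stds = fit_standardizer(train)     # fit on training data only
train_z = transform(train, means, stds)
valid_z = transform(valid, means, stds)   # reuse the training parameters

# The tempting mistake: fitting on train + validation lets the validation data
# influence the preprocessing, so the validation estimate is no longer clean.
leaky_means, leaky_stds = fit_standardizer(train + valid)
```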

__________________
Have faith in probability
  #3  
01-21-2013, 07:03 AM
curiosity
Junior Member
 
Join Date: Jan 2013
Posts: 4
Re: When to use normalization?

Quote:
One important precaution when normalizing the data: if you are using something like validation to estimate your test error, always normalize only the training data, and use the resulting normalization parameters to rescale the validation data. If you do not follow this strict prescription, then your validation estimate will not be legitimate.
Thanks for the quick reply, magdon. However, I didn't get this. What is the difference between normalizing and rescaling in this case? More generally, do I also need to normalize my test data when evaluating the final model? In other words, how should I evaluate the final model if I used normalization during training? An example would be very much appreciated...
  #4  
01-21-2013, 09:15 AM
cygnids
Member
 
Join Date: Jan 2013
Posts: 11
Re: When to use normalization?

Curiosity, thanks for asking this question, and Prof. Magdon, thanks for your reply. A while ago, similar thoughts crossed my mind too.

- When we talk about normalization, are we talking about getting rid of the "units" of the data? For example, if the input vector has weight and height features, do we scale to effectively get rid of the kg and meter units by, say, the average weight and height respectively (or some constant)? Is this what you mean by putting them on an equal footing?

- You caution about "scaling" in a relative sense. Generally speaking, is this to suggest that a cavalier application of simple normalization can distort the correlation structure implicit in the (original) input data?

- Your use of the word scaling raises another question in my mind. Does it make sense to keep an eye on whether the features of the input data have disparate "ranges"? Say, one feature ranges over [0,1] and another over [1,1000]. Does it make sense to reduce the "range" of the latter to make it comparable to the range of the other feature?

- I tried to think through your comments in the context of supervised vs. unsupervised learning. In a regression setting, we have an LHS and an RHS, and I suppose one could be more cavalier about normalization, as long as it is done consistently across the system. However, for unsupervised learning, my immediate thought is that one needs to be a lot more careful about the relative scaling between features. Roughly speaking, is my suspicion right?

- Related to these questions is a nagging concern: does one unduly give insignificant features importance by bringing them onto an "equal footing"?
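The range question above is often handled with min-max scaling, which maps a feature's training range onto [0, 1]. A minimal Python sketch with hypothetical values; as with any normalization, the range is learned from the training data and then reused, so later points can land outside [0, 1].

```python
def fit_min_max(values):
    """Learn the range of a feature from training values."""
    return min(values), max(values)

def apply_min_max(values, lo, hi):
    """Map [lo, hi] linearly onto [0, 1]; values outside the range land outside [0, 1]."""
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical feature spanning [1, 1000], next to another feature already in [0, 1].
train_feature = [1.0, 500.5, 1000.0]
lo, hi = fit_min_max(train_feature)

train_scaled = apply_min_max(train_feature, lo, hi)
# A later point is rescaled with the same (lo, hi), so it may fall outside [0, 1].
test_scaled = apply_min_max([1999.0], lo, hi)
```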

Thank you for your comments & thoughts.
__________________
The whole is simpler than the sum of its parts. - Gibbs
  #5  
01-21-2013, 02:49 PM
magdon
RPI
 
Join Date: Aug 2009
Location: Troy, NY, USA.
Posts: 595
Re: When to use normalization?

When I said normalize, I meant placing the data into some normal form, such as having the same "scale".

Here is an example to help:

Suppose you have three points:
x=(1,2),(-1,-2),(3,2)
y=+1,-1,+1

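One way to normalize the data is to make the average squared value of each coordinate equal to 1. The first coordinates (1, -1, 3) have average squared value 11/3 and the second coordinates (2, -2, 2) have average squared value 4, so you would divide the first x-coordinate by $\sqrt{11/3}$ and the second coordinate by 2. Now both coordinates are "normalized" so that the average squared value is 1.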
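The arithmetic in this example can be checked with a short Python sketch (the helper name `mean_square_scales` is hypothetical):

```python
import math

X = [(1, 2), (-1, -2), (3, 2)]

def mean_square_scales(points):
    """Per-coordinate scale = sqrt(average squared value of that coordinate)."""
    n = len(points)
    return [math.sqrt(sum(p[j] ** 2 for p in points) / n) for j in range(len(points[0]))]

scales = mean_square_scales(X)  # sqrt(11/3) for the first coordinate, 2 for the second
X_norm = [tuple(x / s for x, s in zip(p, scales)) for p in X]

# Check: each normalized coordinate now has average squared value 1 (up to rounding).
for j in range(2):
    print(sum(p[j] ** 2 for p in X_norm) / len(X_norm))
```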


Suppose instead you wanted to use the third point as a test point. Now you normalize only the first 2 points. In this case you don't change the first coordinate (the first coordinates 1 and -1 already have average squared value 1), and you divide the second coordinate by 2 (the second coordinates 2 and -2 have average squared value 4) to get the normalized data. You learn on this normalized training data of 2 points and test the learned hypothesis on the 3rd point. Before you test the learned hypothesis, you need to rescale the test point with the same rescaling parameters that you used to normalize the 2 training data points, so the test point (3,2) becomes (3,1).
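A Python sketch of the train/test version of the same example: the scales come from the 2 training points only and are then reused on the test point. The helper names are hypothetical, and the learning step itself is left out.

```python
import math

train = [(1, 2), (-1, -2)]
x_test = (3, 2)

def mean_square_scales(points):
    """Per-coordinate scale = sqrt(average squared value of that coordinate)."""
    n = len(points)
    return [math.sqrt(sum(p[j] ** 2 for p in points) / n) for j in range(len(points[0]))]

def rescale(point, scales):
    """Divide each coordinate by its scale."""
    return tuple(x / s for x, s in zip(point, scales))

scales = mean_square_scales(train)      # [1.0, 2.0]: from the 2 training points only
train_norm = [rescale(p, scales) for p in train]
x_test_norm = rescale(x_test, scales)   # (3.0, 1.0): same parameters as training

# Not legitimate: renormalizing the test point on its own gives (1.0, 1.0),
# a different input from the one the learned hypothesis expects.
wrong = rescale(x_test, mean_square_scales([x_test]))
```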






  #6  
01-27-2013, 12:18 AM
scottedwards2000
Junior Member
 
Join Date: Jan 2013
Posts: 9
Re: When to use normalization?

Thanks, Dr. Magdon-Ismail, for the example. However, I'm still not sure I understand exactly why we must apply the same rescaling parameters to the test data that we used on the training data. I can see it for a simple log transform (e.g., if you used base 10 on the training data, you certainly wouldn't want to use base 2 on the test data), but in your example you are transforming the data to meet a certain criterion (average squared value of each coordinate = 1). Would your model then expect a new data set to have the same property? If we apply the exact rescaling parameters that we used on the training set to the test set, it certainly won't meet that criterion. Thanks for your help!
  #7  
01-27-2013, 05:05 AM
magdon
RPI
 
Join Date: Aug 2009
Location: Troy, NY, USA.
Posts: 595
Re: When to use normalization?

You are right: scaling can be any transformation. If you used some transformation to learn on the training data, you must use the same transformation when you test. Here is a simple idealized setting with your log transform and with simple scaling. Suppose the problem is 1-dim regression:

x: 2,4,6.
y: 6,12,18.

xtest=8
ytest=24

It is easy to see the relationship is y=3x. We can successfully learn this from the training data. Now suppose we rescaled the x-data by 0.5 in the training:

x=1,2,3
y=6,12,18

What is the relationship you would learn:

y=6x

Now try to apply this to the test data: $24 \neq 6 \times 8$, because you did not rescale the test data in exactly the way you did the training data. If you also rescale the test datum, then it becomes xtest' = 4, and indeed the function you learned works: ytest = 6 xtest' = 24.
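The rescaling example can be reproduced in Python, assuming the hypothesis set is lines through the origin (y = w x) fitted by least squares; the post does not say which learning algorithm is used, and `fit_through_origin` is a hypothetical helper.

```python
def fit_through_origin(xs, ys):
    """Least-squares slope w for the model y = w * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

x_train = [2, 4, 6]
y_train = [6, 12, 18]
x_test, y_test = 8, 24

scale = 0.5
w = fit_through_origin([scale * x for x in x_train], y_train)  # learned on rescaled x: w = 6

wrong = w * x_test             # forgot to rescale the test point: 48, not 24
right = w * (scale * x_test)   # same rescaling as in training: 24
```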

Let's see what happens with the log transform (natural log): the "rescaled", i.e. transformed, x-data become:

x=log 2, log 4, log 6
y=6,12,18

What is the relationship you would learn:

y=3e^x

If you simply apply this to the test point, it will fail: $24 \neq 3e^{8}$. You must first transform the test point to xtest' = log 8. Now it is indeed the case that your learned function works:

$y_{test} = 3e^{x'_{test}} = 3 \times 8 = 24$
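The log-transform case in the same style, assuming the hypothesis set is y = w e^{x'} on the transformed input x' = log x (natural log), again fitted by least squares through the origin. This is an illustrative sketch, not a prescribed method, and `fit_through_origin` is a hypothetical helper.

```python
import math

def fit_through_origin(fs, ys):
    """Least-squares weight w for the model y = w * f (no intercept)."""
    return sum(f * y for f, y in zip(fs, ys)) / sum(f * f for f in fs)

x_train = [2, 4, 6]
y_train = [6, 12, 18]
x_test, y_test = 8, 24

x_train_t = [math.log(x) for x in x_train]                        # transformed inputs x'
w = fit_through_origin([math.exp(t) for t in x_train_t], y_train)  # model y = w e^{x'}, w ~ 3

wrong = w * math.exp(x_test)            # untransformed test input: about 3e^8
right = w * math.exp(math.log(x_test))  # transform first: about 3 * 8 = 24
```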

The thing to realize is that when you rescale the training data and then learn, the learning takes into account the scaling and the hypothesis learned will depend on what scaling is used as the examples above illustrate. In other words, the hypothesis works for any data point (training or test) only after the scaling is applied.





  #8  
01-30-2013, 11:38 PM
scottedwards2000
Junior Member
 
Join Date: Jan 2013
Posts: 9
Re: When to use normalization?

Thanks, Dr. Magdon-Ismail - that was really helpful!!

Tags
input normalization, normalization

All times are GMT -7.


The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.