#7   01-27-2013, 06:05 AM
magdon
Re: When to use normalization?

You are right: scaling can be any transformation. If you used some transformation to learn on the training data, you must apply the same transformation when you test. Here is a simple idealized setting, first with simple scaling and then with your log transform. Suppose the problem is 1-dimensional regression:

x: 2, 4, 6
y: 6, 12, 18

xtest = 8
ytest = 24

It is easy to see that the relationship is y = 3x, and we can successfully learn it from the training data. Now suppose we rescale the x-data by 0.5 during training:

x = 1, 2, 3
y = 6, 12, 18

What relationship would you learn?

y = 6x

Now try to apply this to the test data: 24 \not= 6\times 8, because you did not rescale the test point in exactly the way you rescaled the training data. If you also rescale the test datum, it becomes xtest' = 0.5\times 8 = 4, and indeed the function you learned works: ytest = 6 xtest'.
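To make the bookkeeping concrete, here is a rough Python sketch of this scaling example (numpy's polyfit is just one convenient way to fit the line; the variable names are made up for illustration):

[CODE]
import numpy as np

# training data and the scaling used during training
x_train = np.array([2.0, 4.0, 6.0])
y_train = np.array([6.0, 12.0, 18.0])
scale = 0.5                    # the rescaling applied to x before learning

# learn a line on the *scaled* inputs: recovers slope 6, intercept 0
slope, intercept = np.polyfit(scale * x_train, y_train, 1)

x_test, y_test = 8.0, 24.0

# wrong: forgetting to scale the test point gives 6*8 = 48, not 24
print(slope * x_test + intercept)

# right: apply the same scaling first -> 6*(0.5*8) = 24
print(slope * (scale * x_test) + intercept)
[/CODE]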

Let's see what happens with the log transform (natural log): the "rescaled", i.e. transformed, x-data become:

x = log 2, log 4, log 6
y = 6, 12, 18

What relationship would you learn?

y = 3e^x

If you simply apply this to the test point, it will fail: 24 \not= 3\times e^8. You must first transform the test point to xtest' = log 8. Now it is indeed the case that your learned function works:

ytest=3e^{xtest'}
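The log-transform case looks the same in code; again only a sketch, assuming the natural log and using numpy:

[CODE]
import numpy as np

x_train = np.array([2.0, 4.0, 6.0])
y_train = np.array([6.0, 12.0, 18.0])

# transform the inputs before learning; here the "scaling" is x -> log(x)
x_train_t = np.log(x_train)

# on the transformed data y = 3*exp(x'), so fitting the line
# log(y) = log(3) + x' is one simple way to recover it
b, log_a = np.polyfit(x_train_t, np.log(y_train), 1)   # b ~ 1, log_a ~ log(3)

x_test, y_test = 8.0, 24.0

# wrong: 3*exp(8) is nowhere near 24
print(np.exp(log_a) * np.exp(b * x_test))

# right: transform the test point the same way first
x_test_t = np.log(x_test)
print(np.exp(log_a) * np.exp(b * x_test_t))   # ~ 24
[/CODE]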

The thing to realize is that when you rescale the training data and then learn, the learning takes the scaling into account, so the hypothesis you end up with depends on which scaling was used, as the examples above illustrate. In other words, the hypothesis works for any data point (training or test) only after that same scaling has been applied to it.
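This also answers the normalization question quoted below: the divisors that make the average squared value of each training coordinate equal to 1 are computed once, from the training set, and are then reused as fixed numbers on any new point. The test set will generally not satisfy the criterion exactly, and it does not need to. A sketch (fit_scaler and apply_scaler are made-up helper names, not from any library):

[CODE]
import numpy as np

def fit_scaler(X):
    # per-coordinate divisors computed from the TRAINING data only:
    # after dividing, each training coordinate has average squared value 1
    return np.sqrt((X ** 2).mean(axis=0))

def apply_scaler(X, divisors):
    # reuse the training divisors; the test data will NOT end up with
    # average squared value exactly 1, and that is fine
    return X / divisors

X_train = np.array([[2.0, 10.0], [4.0, 20.0], [6.0, 30.0]])
X_test  = np.array([[8.0, 40.0]])

divisors = fit_scaler(X_train)
X_train_scaled = apply_scaler(X_train, divisors)   # satisfies the criterion
X_test_scaled  = apply_scaler(X_test, divisors)    # generally does not
[/CODE]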





Quote:
Originally Posted by scottedwards2000
Thanks, Dr. Magdon-Ismail, for the example. However, I'm still not sure I understand exactly why we must use the same rescaling parameters that we used on the training data. I could see that for a simple log transform (e.g. if you used base 10 on the training data you certainly wouldn't want to use base 2 on the test data), but in your example you are transforming the data to meet a certain criterion (average squared value of each coordinate = 1). Would your model then expect a new data set to have the same property? If we apply the exact rescaling parameters that we used on the training set to the test set, it certainly won't meet that criterion. Thanks for your help!
__________________
Have faith in probability