LFD Book Forum > Book Feedback - Learning From Data > Chapter 3 - The Linear Model

#1 | 08-22-2012, 01:44 AM
itooam (Senior Member; joined Jul 2012; Posts: 100)
Recency weighted regression

Hi,

I wondered if anyone could help with the following:

(I'll make up a fictional example to explain in simple terms what I am trying to do):
If, for example, you created an extremely simple model to predict whether a share price will rise or fall (for now we'll treat it as a linear classification model), and the only inputs you had were:
X0 = 1
X1 = yesterday's share price
X2 = the share price the day before that in X1
X3 = the share price the day before that in X2
X4 = the share price the day before that in X3

it would seem sensible to give more weight to the more recent share prices, so you might apply a transform before learning, i.e.,
you may create a new matrix Z = [X0 X1*0.9 X2*0.8 X3*0.7 X4*0.6]
and do the learning from Z.

Hope this makes sense so far?

My questions:

1) is this a sensible thing to do?

2) can the recency weights i.e., 0.9, 0.8, 0.7 and 0.6 be learned?


More Advanced:

Though this is a simple example, you may have more data for each day to which you want to apply the same recency weighting, e.g. (i) the minimum and (ii) the maximum price the share traded at each day. In that case you might have a new model, something like:

X0 = 1
X1 = yesterday's share price
X1_1 = the minimum price the share traded at yesterday
X1_2 = the maximum price the share traded at yesterday

X2 = the share price the day before that in X1
X2_1 = the minimum price the share traded the day before that in X1
X2_2 = the maximum price the share traded the day before that in X1

X3 = the share price the day before that in X2
X3_1 = the minimum price the share traded the day before that in X2
X3_2 = the maximum price the share traded the day before that in X2

X4 = the share price the day before that in X3
X4_1 = the minimum price the share traded the day before that in X3
X4_2 = the maximum price the share traded the day before that in X3


applying a new transform would look like this:
Z = [X0 X1*0.9 X1_1*0.9 X1_2*0.9 X2*0.8 X2_1*0.8 X2_2*0.8 X3*0.7 X3_1*0.7 X3_2*0.7 X4*0.6 X4_1*0.6 X4_2*0.6]

Hope this is still making sense?

Extra questions:
3) is this still (if it was before) a sensible thing to do?
4) can the recency weights i.e., 0.9, 0.8, 0.7 and 0.6 be learned?

Any pointers, discussion, answers much appreciated.

#2 | 08-23-2012, 05:13 AM
magdon (RPI; Troy, NY, USA; joined Aug 2009; Posts: 595)
Re: Recency weighted regression

Unfortunately, if you are using a linear model, performing this recency weighting as you suggest will have no effect: you are rescaling the input variables by constants, and the rescaling will simply get absorbed into the learned weights.

Suppose when you learn without rescaling you find weight W_1=1; now, when you rescale X_1\rightarrow 0.9 X_1, your learned weight will just rescale in the inverse way
W_1\rightarrow W_1/0.9\approx 1.111; your in-sample error will be the same, as will your out-of-sample error.
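
A minimal numpy sketch of this point (illustrative only, with synthetic data): rescaling a column of the input just rescales the corresponding least-squares weight by the inverse factor, and the in-sample error is unchanged.

Code:
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 4))])  # [X0, X1..X4]
y = X @ np.array([0.5, 1.0, -2.0, 3.0, 0.2]) + rng.normal(scale=0.1, size=100)

w_plain, *_ = np.linalg.lstsq(X, y, rcond=None)    # learn without rescaling

Z = X.copy()
Z[:, 1] *= 0.9                                     # rescale X1 -> 0.9*X1
w_scaled, *_ = np.linalg.lstsq(Z, y, rcond=None)   # learn from Z

print(w_scaled[1], w_plain[1] / 0.9)               # same weight, inversely rescaled
print(np.sum((X @ w_plain - y)**2), np.sum((Z @ w_scaled - y)**2))  # identical E_in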

You may have misunderstood the purpose of recency weighted regression; it is to differentially weight the error on different data points. In your case of stock prediction, it makes sense to weight the prediction error on the recent days more than the prediction error on earlier days, hence the term recency weighted regression. Thus, if you let the input on day t be \mathbf{x}_t, the thing you are trying to predict on day t be y_t, and the weights you learn be \mathbf{w}, then the recency weighted error measure that one might wish to minimize is

E_{in}=\sum_{t}\alpha_t(\mathbf{w}\cdot\mathbf{x}_t-y_t)^2

\alpha_t are the weights; to emphasize the recent data points more, you would choose \alpha_t to be increasing with t.
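
To minimize this weighted error in practice, one standard trick (a sketch, not specific to the book) is to multiply each row \mathbf{x}_t and target y_t by \sqrt{\alpha_t}; the weighted problem then becomes ordinary least squares.

Code:
import numpy as np

def weighted_lstsq(X, y, alpha):
    # minimizes sum_t alpha_t * (w . x_t - y_t)^2 by rescaling
    # each row and target by sqrt(alpha_t), then solving plain OLS
    s = np.sqrt(alpha)
    w, *_ = np.linalg.lstsq(X * s[:, None], y * s, rcond=None)
    return w

# illustrative use with synthetic data and a geometric recency profile
rng = np.random.default_rng(1)
T = 200
X = np.column_stack([np.ones(T), rng.normal(size=(T, 4))])
y = X @ np.array([0.1, 1.0, -0.5, 0.3, 0.0]) + rng.normal(scale=0.1, size=T)
alpha = 0.99 ** np.arange(T - 1, -1, -1)   # increasing with t: recent rows weigh more
w = weighted_lstsq(X, y, alpha)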

Quote:
Originally Posted by itooam
[post #1 quoted in full]
__________________
Have faith in probability

#3 | 08-24-2012, 08:30 AM
itooam (Senior Member; joined Jul 2012; Posts: 100)
Re: Recency weighted regression

Thanks for your response, Dr Magdon, it is really appreciated. I hope you don't mind me asking more questions (I suppose you won't answer if you don't want to, lol).

Quote:
Originally Posted by magdon
Unfortunately, if you are using a linear model, performing this recency weighting as you suggest will have no effect: you are rescaling the input variables by constants, and the rescaling will simply get absorbed into the learned weights.

Suppose when you learn without rescaling you find weight W_1=1; now, when you rescale X_1\rightarrow 0.9 X_1, your learned weight will just rescale in the inverse way
W_1\rightarrow W_1/0.9\approx 1.111; your in-sample error will be the same, as will your out-of-sample error.
Thanks for your explanation, you confirmed my main fear. This is great because I can now avoid this route.

Quote:
Originally Posted by magdon
You may have misunderstood the purpose of recency weighted regression; it is to differentially weight the error on different data points. In your case of stock prediction, it makes sense to weight the prediction error on the recent days more than the prediction error on earlier days, hence the term recency weighted regression.
I haven't read your book, just doing the online course. I see this thread has been moved from the "general homework" forum to "Chapter 3" of the book forum. If "recency weightings" are explained in your book (please could you confirm?) then I will scour the earth for it, as this area is of much interest. Previously I looked for your book on Amazon.co.uk but couldn't find it; maybe I can order internationally through .com or some other shop.

Quote:
Originally Posted by magdon
Thus, if you let the input on day t be \mathbf{x}_t, the thing you are trying to predict on day t be y_t, and the weights you learn be \mathbf{w}, then the recency weighted error measure that one might wish to minimize is

E_{in}=\sum_{t}\alpha_t(\mathbf{w}\cdot\mathbf{x}_t-y_t)^2

\alpha_t are the weights; to emphasize the recent data points more, you would choose \alpha_t to be increasing with t.
Though this looks like a good solution to my problem, I am not sure it would work with what I am trying to do... to add to my example:

If I had a number of different company shares in my database, and for each company I had 1000 days of their share price data, I would therefore be able to create approximately 996 training rows per company, each training row containing the previous 4 days' prices.

To keep it simple, also assume I have managed to normalize each company's share prices so that they can be trained together (don't ask me how, this is just a made-up example lol).

So because of this, I think I still need something along the lines of:
[X0 X1 X1_1 X1_2 X2 X2_1 X2_2 X3 X3_1 X3_2 X4 X4_1 X4_2] per training row, and a value of y which we will compare against.

Going back to what you wrote, the recency weightings that I made up are useless here as they would be absorbed. However, the learning algorithm would still pick up the important variables and give them the higher weights, so I would hope it would implicitly work out that the more "recent" variables get larger weights, i.e., W1... > W2... > W3... > W4, relatively speaking. Though I am sure when I test such a case it probably won't be as clean cut, due to the problems associated with the VC dimension and degrees of freedom.

Using the recency weights on the error as you suggested is a more failsafe way; however, I think I would then lose the structure I was hoping to use? Please can you confirm this in light of the additional model information I have presented? If so, maybe the following would work instead?

I just do a linear regression on the entire (1000 x NoOfCompanies) rows so that each day is treated independently. Once I have found my optimum weights, I use them to calculate \beta=(\mathbf{w}\cdot\mathbf{x}-y)^2
for each row (I'm not sure about the squared bit?). These new values will then be grouped into a single row based on the "day" structure, i.e.,
X0 = 1
X1 = \beta_{t}
X2 = \beta_{t-1}
X3 = \beta_{t-2}
X4 = \beta_{t-3}

a second bout of linear regression could then be used to work out the optimum "recency weights" for this new set (996xNoOfCompanies rows).

This second idea, or yours (if still applicable wrt desired model structure?), would certainly help in terms of reducing degrees of freedom and so would definitely be preferable imo.

#4 | 08-24-2012, 11:55 PM
magdon (RPI; Troy, NY, USA; joined Aug 2009; Posts: 595)
Re: Recency weighted regression

Quote:
Originally Posted by itooam
I haven't read your book, just doing the online course. I see this thread has been moved from the "general homework" forum to "Chapter 3" of the book forum. If "recency weightings" are explained in your book (please could you confirm?) then I will scour the earth for it, as this area is of much interest. Previously I looked for your book on Amazon.co.uk but couldn't find it; maybe I can order internationally through .com or some other shop.
The book does not specifically cover weighted regression, but it does cover linear models in depth. And yes, you can find the book on amazon.com; unfortunately it is not available on amazon.co.uk.

With respect to your question though, you seem to be confusing two notions of recency:

Let's take a simple example of one stock, which can generalize to the multiple stocks example. Suppose the stock's price time series is

P_0,P_1,\ldots,P_{1000}

At time t for t>3 you construct the input

\mathbf{x}_t=[1,P_{t-1},P_{t-2},P_{t-3},P_{t-4}]

and the target y_t=P_t. You would like to understand the relationship between \mathbf{x}_t and y_t. If you know this relationship, you can predict the future price from previous prices. So suppose you build a linear predictor

y_t\approx \mathbf{w\cdot x_t}.

The learning task is to determine \mathbf{w}. To do this you minimize

E_{in}=\sum_{t>3}(\mathbf{w\cdot x_t}-y_t)^2

You will probably find that the weights in \mathbf{w} are not uniform. For example, the weight multiplying P_{t-1} might be the largest; this means that the most recent price P_{t-1} is the most useful in predicting the next price y_t=P_{t}.

The notion of recency above should not be confused with recency weighted regression, which caters to the fact that the weights \mathbf{w} may be changing with time (that is, in the stock example, the time series is non-stationary). To accommodate this fact you re-weight the data points, giving more weight to the more recent ones. Thus you minimize the error function

E_{in}=\sum_{t>3}\alpha_t(\mathbf{w\cdot x_t}-y_t)^2

The \alpha_t enforce that the more recent data points contribute more to E_{in}, and so you will choose a \mathbf{w} that better predicts on the more recent data points; in this way older data points play some role, but more recent data points play the dominant role in determining how to predict tomorrow's price.

Thus in the example of time series prediction, there are these two notions of recency at play:

(i) more recent prices are more useful for predicting tomorrow's price

(ii) the relationship between this more recent price and tomorrow's price is changing with time (for example, sometimes it is trend following and sometimes mean reversion). In this case, more recent data should be used to determine the relationship between today's price and tomorrow's price.
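
To make the two error functions concrete, here is a rough numpy sketch (with a synthetic random-walk price series; the decay constant 0.99 is an arbitrary illustrative choice) of constructing \mathbf{x}_t and fitting both ways.

Code:
import numpy as np

rng = np.random.default_rng(2)
P = 100.0 + np.cumsum(rng.normal(size=1000))   # synthetic prices P_0,...,P_999

# x_t = [1, P_{t-1}, P_{t-2}, P_{t-3}, P_{t-4}] and y_t = P_t, for t > 3
t = np.arange(4, len(P))
X = np.column_stack([np.ones(len(t)), P[t - 1], P[t - 2], P[t - 3], P[t - 4]])
y = P[t]

# (1) unweighted: minimize sum_t (w . x_t - y_t)^2
w_plain, *_ = np.linalg.lstsq(X, y, rcond=None)

# (2) recency weighted: minimize sum_t alpha_t (w . x_t - y_t)^2,
#     with alpha_t increasing in t so the recent days dominate
alpha = 0.99 ** np.arange(len(y) - 1, -1, -1)
s = np.sqrt(alpha)
w_recent, *_ = np.linalg.lstsq(X * s[:, None], y * s, rcond=None)
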
__________________
Have faith in probability

#5 | 08-25-2012, 03:24 AM
itooam (Senior Member; joined Jul 2012; Posts: 100)
Re: Recency weighted regression

Thanks again for the thorough response, Dr Magdon. I think we are talking along the same lines, just a bit is lost in translation (one of the disadvantages of written communication). I apologize for my wording, though; I don't mean to confuse. I used the words "recency weighted regression" without knowing that this generally means something else in the machine learning literature.

I also think I now understand more clearly the application of
E_{in}=\sum_{t>3}\alpha_t(\mathbf{w\cdot x_t}-y_t)^2, so thanks again for explaining. I think I need to read up on this more, as it makes me question: "how do I measure how well this recency weighting would have performed in the past?" I assume that to answer this you would need to loop through the above formula starting from an arbitrary start date, i.e., starting with a dataset sized by the rule of thumb of 10 x DegreesOfFreedom; e.g., in the context of the simplest model (\mathbf{x}_t=[1,P_{t-1},P_{t-2},P_{t-3},P_{t-4}]) we would start with a dataset of the first 50 days... pseudocode:

for i = 50 to 996 step 1
    \mathbf{x} = wholeDataSet[items 1 to i]
    do the regression on \mathbf{x} and find \mathbf{w} by minimising E_{in}=\sum_{t=1}^{i}\alpha_t(\mathbf{w\cdot x_t}-y_t)^2
    error = error + (\mathbf{w\cdot x_{(i+1)}}-y_{(i+1)})^2
endfor
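
A runnable version of this pseudocode (a sketch: it assumes numpy, an X and y built as in the earlier sketch with one row per day in time order, and a geometric choice of \alpha_t):

Code:
import numpy as np

def walk_forward_error(X, y, decay=0.99, start=50):
    # grow the training window one day at a time; after each fit,
    # score the squared error on the next, unseen day
    error = 0.0
    for i in range(start, len(y)):
        alpha = decay ** np.arange(i - 1, -1, -1)  # i weights, newest row highest
        s = np.sqrt(alpha)
        w, *_ = np.linalg.lstsq(X[:i] * s[:, None], y[:i] * s, rcond=None)
        error += (w @ X[i] - y[i]) ** 2            # out-of-sample error on day i
    return error
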

#6 | 08-25-2012, 03:35 AM
itooam (Senior Member; joined Jul 2012; Posts: 100)
Re: Recency weighted regression

If the above is correct, there are many ways in which I can trial my underlying project.
Both
1)
E_{in}=\sum_{t>3}(\mathbf{w\cdot x_t}-y_t)^2
and
2)
E_{in}=\sum_{t>3}\alpha_t(\mathbf{w\cdot x_t}-y_t)^2

are worth a try... and also a variation of 2) without x in the form \mathbf{x}_t=[1,P_{t-1},P_{t-2},P_{t-3},P_{t-4}], i.e., with input variables of x that are only applicable at that time t.

#7 | 08-25-2012, 05:38 AM
magdon (RPI; Troy, NY, USA; joined Aug 2009; Posts: 595)
Re: Recency weighted regression

Yes, that would be a way to run the process and estimate how good the predictor is.

Quote:
Originally Posted by itooam
[post #5 quoted in full]
__________________
Have faith in probability

#8 | 08-25-2012, 08:26 AM
itooam (Senior Member; joined Jul 2012; Posts: 100)
Re: Recency weighted regression

Thank you for all your help, it has been really appreciated. I have one final question: do you know if there is a closed form solution to

E_{in}=\sum_{t}\alpha_t(\mathbf{w}\cdot\mathbf{x}_t-y_t)^2

(assuming \alpha is a vector with the same number of rows as x?)

i.e., like the closed form solution used for linear regression with regularization; copied from the lecture notes, it is:

W_{reg} = (Z^{T} Z+\lambda I)^{-1}Z^Ty

I am not sure where \alpha would end up in the above; the derivation is beyond me mathematically.

#9 | 08-26-2012, 01:37 AM
itooam (Senior Member; joined Jul 2012; Posts: 100)
Re: Recency weighted regression

Having spent some time on this (an area of maths where I am very weak),

I think the solution is:
W_{reg} = (Z^{T} A Z+\lambda I)^{-1}Z^TAy

where A is a diagonal matrix, a bit like the identity matrix but with the weight values on the diagonal, i.e.,

A = \mathrm{diag}(\alpha_1, \alpha_2, \ldots, \alpha_{datasetSize})

The bit that makes this tricky (for me) is the regularisation. I suppose I could test the above using this formula and then try the same using gradient descent (where I know it will be correct); if the values are close then the above can be considered correct (provided I plug in widely varying values of lambda for testing).
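
Here is a sketch of exactly that numerical check (synthetic, illustrative data): solve the weighted, regularized problem with the closed form above and with plain gradient descent, and compare.

Code:
import numpy as np

rng = np.random.default_rng(3)
N, d, lam = 200, 5, 0.1
Z = rng.normal(size=(N, d))
y = rng.normal(size=N)
alpha = 0.99 ** np.arange(N - 1, -1, -1)

# closed form: w = (Z^T A Z + lambda I)^{-1} Z^T A y, with A = diag(alpha)
A = np.diag(alpha)
w_closed = np.linalg.solve(Z.T @ A @ Z + lam * np.eye(d), Z.T @ A @ y)

# gradient descent on E(w) = sum_t alpha_t (w . z_t - y_t)^2 + lambda ||w||^2
w = np.zeros(d)
eta = 1e-3
for _ in range(20000):
    grad = 2 * Z.T @ (alpha * (Z @ w - y)) + 2 * lam * w
    w -= eta * grad

print(np.allclose(w, w_closed, atol=1e-6))  # True if the closed form is right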

#10 | 08-26-2012, 01:41 AM
itooam (Senior Member; joined Jul 2012; Posts: 100)
Re: Recency weighted regression

If the above is correct, it seems there is then another problem... if the dataset size is big, e.g. 10,000, then the A matrix will contain 10000^2 = 100,000,000 values. Ugh! How to deal with this?
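
One common way around this (a sketch, under the assumption that A = \mathrm{diag}(\alpha) as in the previous post): never materialize A at all. Since A is diagonal, Z^{T}AZ and Z^{T}Ay can be computed by elementwise multiplication with the length-N weight vector, so only O(N) extra memory is needed.

Code:
import numpy as np

def weighted_ridge(Z, y, alpha, lam):
    # Z^T A Z == (Z * alpha[:, None]).T @ Z and Z^T A y == Z.T @ (alpha * y),
    # so only the length-N vector alpha is ever stored
    ZtAZ = (Z * alpha[:, None]).T @ Z
    ZtAy = Z.T @ (alpha * y)
    return np.linalg.solve(ZtAZ + lam * np.eye(Z.shape[1]), ZtAy)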