Thanks for your response, Dr Magdon, it is really appreciated. I hope you don't mind me asking more questions (I suppose you won't answer if you don't want to lol).
Quote:
Originally Posted by magdon
Unfortunately, if you are using a linear model, performing this recency weighting as you suggest will have no effect, because you are going to rescale the input variables by weights and so this rescaling will get absorbed into the weights.
Suppose when you learn without rescaling you find weight $w_i$; now, when you rescale $x_i$ to $\alpha_i x_i$, your learned weight will just rescale in the inverse way to $w_i/\alpha_i$; your in-sample error will be the same, as will your out-of-sample error.

Thanks for your explanation, you confirmed my main fear. This is great because I can now avoid this route.
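A quick numerical check of the point above, as a rough sketch only: the data, the `scale` vector and the true weights are all made up for illustration, and plain least squares via `np.linalg.lstsq` stands in for whatever linear learner is actually used.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 rows, a bias column plus 4 "previous day" features.
X = np.column_stack([np.ones(200), rng.normal(size=(200, 4))])
y = X @ np.array([0.5, 1.0, -2.0, 0.3, 0.7]) + 0.1 * rng.normal(size=200)

# Made-up "recency" scalings applied to the input columns (bias left at 1).
scale = np.array([1.0, 4.0, 3.0, 2.0, 1.0])
X_scaled = X * scale

# Fit the same linear model on the original and on the rescaled inputs.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
w_scaled, *_ = np.linalg.lstsq(X_scaled, y, rcond=None)

# Predictions (and hence in-sample error) from each fit.
pred = X @ w
pred_scaled = X_scaled @ w_scaled

print("weights rescale inversely:", np.allclose(w_scaled, w / scale))
print("predictions unchanged:    ", np.allclose(pred, pred_scaled))
```

The learned weights simply divide by the scalings, so the fitted function is the same one either way.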
Quote:
Originally Posted by magdon
You may have misunderstood the purpose of recency weighted regression; it is to differentially weight the error on different data points. In your case of stock prediction, it makes sense to weight the prediction error on the recent days more than the prediction error on earlier days, hence the term recency weighted regression.

I haven't read your book, I am just doing the online course. I see this thread has been moved from the "general homework" forum to "Chapter 3" of the book forum. If "recency weightings" are explained in your book (please could you confirm?) then I will scour the earth for your book, as this area is of much interest. Previously I looked for your book on Amazon.co.uk but couldn't find it; maybe I can order internationally through .com or some other shop.
Quote:
Originally Posted by magdon
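Thus, if you let the input on day $t$ be $\mathbf{x}_t$, the thing you are trying to predict on day $t$ is $y_t$, and the weights you learn are $\mathbf{w}$, then the recency weighted error measure that one might wish to minimize is
$$E_{\text{in}}(\mathbf{w}) = \sum_{t} \alpha_t \left(\mathbf{w}^{\mathsf{T}}\mathbf{x}_t - y_t\right)^2.$$
The $\alpha_t$ are the weights; to emphasize the recent data points more, you would choose $\alpha_t$ to be increasing with $t$.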
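For anyone wanting to try the error-weighting idea from the quote, here is a minimal sketch. It assumes a squared-error measure with one non-negative weight per day, uses a hypothetical helper `recency_weighted_fit`, and picks an exponential decay of 0.97 per day purely as an example; the drifting toy data is made up.

```python
import numpy as np

def recency_weighted_fit(X, y, alpha):
    """Minimise sum_t alpha[t] * (w . X[t] - y[t])**2.

    Scaling row t of X and y by sqrt(alpha[t]) turns this into an
    ordinary least-squares problem.
    """
    s = np.sqrt(alpha)
    w, *_ = np.linalg.lstsq(X * s[:, None], y * s, rcond=None)
    return w

rng = np.random.default_rng(1)
T = 300
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])

# Toy target whose true weights change halfway through the history.
w_old = np.array([0.0, 1.0, 0.0])
w_new = np.array([0.0, 0.0, 1.0])
y = np.where(np.arange(T) < T // 2, X @ w_old, X @ w_new)

# Per-day error weights, increasing with t (most recent day has weight 1).
alpha = 0.97 ** np.arange(T - 1, -1, -1)

w_plain = recency_weighted_fit(X, y, np.ones(T))   # every day counts equally
w_recent = recency_weighted_fit(X, y, alpha)       # recent days count more

print("plain fit:  ", np.round(w_plain, 3))
print("recency fit:", np.round(w_recent, 3))
```

Unlike rescaling the inputs, these weights act on the rows (days), not the columns (variables), so they cannot be absorbed into the learned weights.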

Though this looks like a good solution to my problem, I am not sure it would work with what I am trying to do... to add to my example:
If I had a number of different company shares in my database and for each company I had 1000 days of their share price data, I would therefore be able to create approximately 996 training rows per company, each training row containing the previous 4 days' prices.
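To make the row count concrete, here is a small sketch of building those training rows for one company. The helper `make_rows` is hypothetical, only the raw price is used as a feature (one value per previous day), and the "prices" are just the numbers 0 to 999 so the layout is easy to read.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_rows(prices, n_lags=4):
    """Turn one price series into (X, y) rows of lagged prices.

    Each row is [1, price at t-1, ..., price at t-n_lags] and y is the
    price at day t, so X1 is the most recent day and X4 the oldest.
    """
    windows = sliding_window_view(prices, n_lags + 1)
    lags = windows[:, :-1][:, ::-1]   # reverse so the newest lag comes first
    X = np.column_stack([np.ones(len(windows)), lags])
    y = windows[:, -1]
    return X, y

# Stand-in for 1000 days of (already normalized) prices for one company.
prices = np.arange(1000, dtype=float)
X, y = make_rows(prices)

print(X.shape, y.shape)
print(X[0], y[0])
```

Rows from several companies could then be stacked with `np.vstack` and `np.concatenate` before training, assuming the normalization really does make them comparable.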
To keep it simple, also assume I have managed to normalize each company's share prices so that they can be trained together (don't ask me how, this is just a made up example lol).
So because of this, I think I still need something along the lines of:
[X0 X1 X1_1 X1_2 X2 X2_1 X2_2 X3 X3_1 X3_2 X4 X4_1 X4_2] per training row and a value of y which we will compare against.
Going back to what you wrote, the recency weightings that I made up are useless here as they would be absorbed. However, the learning algorithm would still pick up the important variables and give them higher weights, so I would hope it would implicitly work out that the more "recent" variables should get larger weights, i.e., W1... > W2... > W3... > W4, relatively speaking. Though I am sure that when I test such a case it probably won't be as clear-cut, due to the problems associated with VC dimension and degrees of freedom.
Using the recency weights on the error as you suggested is a more failsafe way; however, I think I would then lose the structure I was hoping to use? Please can you confirm this in light of the additional model information I have presented? If so, maybe the following would work instead?
I just do a linear regression on the entire (1000xNoOfCompanies) rows so that each day is treated independently, once I have found my optimum weights I use them to calculate
for each row (I'm not sure about the squared bit?). These new values will then be grouped into a single row based on the "day" structure i.e.,
X0 = 1
X1 =
X2 =
X3 =
X4 =
A second bout of linear regression could then be used to work out the optimum "recency weights" for this new set (996xNoOfCompanies rows).
This second idea, or yours (if still applicable wrt desired model structure?), would certainly help in terms of reducing degrees of freedom and so would definitely be preferable imo.