05-19-2013, 06:18 AM
Elroch
Re: Machine Learning and census models

Originally Posted by yaser
In financial time series, the data set is often constructed along these lines, where a sliding window of, say, T points is used to construct the training examples, with the first T-1 points taken as input and the T-th point taken as output.
Indeed: I have seen some papers use this natural approach effectively. Since a predictive method will never have anything but past data to work with, this must at least be part of the design.

However, with much more data than the 14 points described here, methods of the type I have researched in the past use two levels of design (similar to those discussed in your lectures and examples about cross-validation) to select the class of model, though generally by choosing hyperparameters rather than the order of a polynomial approximation. The input data may be T consecutive past values or some other derived data. The output may be a single future value. A data set consists of thousands of such examples, each with T inputs and one output, all constructed from a single time series.
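
A minimal Python sketch of this construction, assuming the raw data is a plain list of values; the function name `make_windows` and the `horizon` parameter are illustrative only, not taken from any particular paper. Each example takes T consecutive values as inputs and the value `horizon` steps after the last input as the output. With `horizon=1` this matches the sliding window quoted above, except that T here counts only the inputs (the quoted window of T points corresponds to T-1 inputs).

```python
from typing import List, Sequence, Tuple


def make_windows(series: Sequence[float], T: int,
                 horizon: int = 1) -> List[Tuple[List[float], float]]:
    """Turn one time series into (inputs, output) training examples.

    Each example uses T consecutive values as inputs and the value
    `horizon` steps after the last input as the output.
    """
    if T < 1 or horizon < 1:
        raise ValueError("T and horizon must be positive")
    examples = []
    # The last usable start index leaves room for T inputs plus the horizon.
    for start in range(len(series) - T - horizon + 1):
        inputs = list(series[start:start + T])
        output = series[start + T + horizon - 1]
        examples.append((inputs, output))
    return examples


if __name__ == "__main__":
    series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    for x, y in make_windows(series, T=3):
        print(x, "->", y)
```

A series of length n yields n - T - horizon + 1 overlapping examples, which is how a single long series produces thousands of data points.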

The aim is still to be able to predict the output from some new set of T inputs. My point arose while developing a method to do this (and I noticed that jlaurentum's example is analogous).

One approach treats the several thousand data points as a sample from a distribution, applies cross-validation, and then trains on all of the data to generate a hypothesis for that distribution. This seems fine, but it implicitly assumes stationarity. When intermediate points are used for validation, later points are used to train the model. But the behaviour of the system in the future may depend on how it behaved at those intermediate points (this is how people make their decisions about what to do, and it is what their programs use as data). Hence there is a subtle cheat here related to non-stationarity: the model being validated has partly learned from what happened after the validation period. Whether this is harmful or not, I am not yet sure.
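
To make the concern concrete, here is a small sketch of ordinary unshuffled k-fold cross-validation that counts, for each fold, how many training examples lie after the whole validation block in time. It assumes the examples are stored in time order; `kfold_splits` and `later_training_counts` are hypothetical helper names written for this illustration.

```python
from typing import List, Tuple

Split = Tuple[List[int], List[int]]


def kfold_splits(n: int, k: int) -> List[Split]:
    """Unshuffled k-fold split of example indices 0..n-1 into (train, validation).

    Assumes the examples are stored in time order, so each validation fold
    is a contiguous block of time.
    """
    if not 1 < k <= n:
        raise ValueError("need 1 < k <= n")
    # Spread any remainder over the first n % k folds.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits = []
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        splits.append((train, val))
        start += size
    return splits


def later_training_counts(splits: List[Split]) -> List[int]:
    """For each fold, count training examples that come after the whole validation block."""
    return [sum(1 for i in train if i > max(val)) for train, val in splits]


if __name__ == "__main__":
    print(later_training_counts(kfold_splits(10, 5)))  # [8, 6, 4, 2, 0]
```

With 10 examples and 5 folds, only the final fold is validated without any later data in its training set; every other fold's model has seen part of the future relative to the period it is scored on.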

This is how I arrived at the alternative approach of validating only on data that lies in the future of the data used both to select hyperparameters and to generate the final model. This resembles cross-validation, but differs in that the data available for training is restricted. The choice of the training and validation window sizes is interesting: in both cases there is a trade-off between including many points and not spanning too long a period (because of non-stationarity). It is somewhat similar to walk-forward optimisation, a method used by technical traders that validates a method using out-of-sample errors on subsequent data.
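
One way to sketch this walk-forward variant, assuming a rolling fixed-length training window followed immediately by a validation window. The names `walk_forward_splits`, `select_hyperparameter` and the `fit_and_score` callback are hypothetical, and the model itself is left to the caller; `train_size` and `val_size` are the two window sizes whose trade-off is discussed above.

```python
from typing import Callable, Iterator, Sequence, Tuple, TypeVar

H = TypeVar("H")


def walk_forward_splits(n: int, train_size: int, val_size: int,
                        step: int) -> Iterator[Tuple[range, range]]:
    """Yield (train, validation) index ranges over examples 0..n-1 in time order.

    Each validation window starts immediately after its training window, and
    both windows roll forward by `step`, so old data is eventually dropped.
    """
    if min(train_size, val_size, step) < 1:
        raise ValueError("window sizes and step must be positive")
    start = 0
    while start + train_size + val_size <= n:
        train = range(start, start + train_size)
        val = range(start + train_size, start + train_size + val_size)
        yield train, val
        start += step


def select_hyperparameter(n: int, candidates: Sequence[H],
                          fit_and_score: Callable[[range, range, H], float],
                          train_size: int, val_size: int, step: int) -> H:
    """Return the candidate with the lowest mean validation error over all folds.

    `fit_and_score(train, val, hp)` is a hypothetical user-supplied callback:
    it should fit a model on the `train` examples using hyperparameter `hp`
    and return its error on the `val` examples.
    """
    splits = list(walk_forward_splits(n, train_size, val_size, step))
    if not splits:
        raise ValueError("too few examples for the chosen window sizes")

    def mean_error(hp: H) -> float:
        return sum(fit_and_score(tr, va, hp) for tr, va in splits) / len(splits)

    return min(candidates, key=mean_error)


if __name__ == "__main__":
    for train, val in walk_forward_splits(10, train_size=4, val_size=2, step=2):
        print(list(train), "->", list(val))
```

Once a hyperparameter is chosen, the final model could be trained on the most recent `train_size` examples with that setting and then applied to new inputs. An expanding training window (keeping the start fixed at 0) is a common alternative when older data is believed to stay relevant, at the cost of spanning a longer period.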

This modified approach is not necessarily completely immune to non-stationarity. It relies on the models being sophisticated enough to capture the changing behaviour of the system as a whole, so it may fail if at some point that stops being true.

I am not aware of much published work on the role non-stationarity plays in this scenario.

One thing that strikes me is the way that selecting hyperparameters through cross-validation (or the variant above) and then generating a predictive hypothesis splits the learning process into two separate stages, in quite a surprising (if very neat and effective) way. I wonder whether there are any other ways to organise the overall learning process that might give good results?