To answer your questions:
1. The more general nonlinear models (including those that apply a feature transform) may or may not be better. It depends on your time series and on whether a linear dependency on prior X's and prior residuals is a good model for the process. One thing to beware of is that including both the prior X's and the prior residuals can introduce a lot of parameter redundancy and lead to overfitting. Using nonlinear models is recommended if the dependency is more complex; the caveat is that such models are easier to overfit and there may be no convenient "closed form" technique to estimate the parameters.
2. Yes, the general setup is the same, and you are well advised to use regularization and care in choosing the "size" of your ARMA (i.e. how many time steps in the past to autoregress onto).
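For reference, the "size" mentioned above corresponds to the orders p and q in the usual way of writing an ARMA(p, q) model. The notation below is the standard textbook form, not something taken from the book or from the question, so treat the symbols as assumptions:

```latex
X_t = c + \sum_{i=1}^{p} \varphi_i \, X_{t-i}
        + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j}
        + \varepsilon_t ,
\qquad \varepsilon_t \sim \mathcal{N}(0, \sigma^2) \ \text{i.i.d.}
```

Here p is the number of past values in the autoregressive part, q is the number of past residuals in the moving-average part, and the model has roughly p + q + 1 free coefficients, which is why choosing both too large invites overfitting.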
HOWEVER, the theory covered in this book is not completely applicable to time series methods, and a more detailed theoretical analysis is needed to account for the fact that the training data are NOT independent. The dependence is especially strong if you generate your data points by moving one step forward at a time, since consecutive data points then share most of their inputs. For this reason, most of the theory regarding time series models starts by assuming that the process follows some law with (typically) Gaussian residuals. Then one can prove that certain ways of estimating the parameters of the ARMA model are optimal, etc. In the learning framework, we maintained that the target function is completely unknown and general. So the ARMA-type models would more appropriately be classified as "statistics-based" models (see Section 1.2.4).
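To make the points about regularization, model "size", and dependent data concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: it fits a pure AR(p) model (no MA terms) by ridge regression rather than doing full ARMA estimation, the function names `lagged_design`, `fit_ridge` and `select_order` are made up for this example, and the values of `lam` and `train_frac` are arbitrary. Note that the validation set is the last part of the series rather than a random subset, because the sliding-window data points overlap and are not independent.

```python
import numpy as np

def lagged_design(x, p):
    """Build sliding-window data: row for time t is [x[t-1], ..., x[t-p]], target is x[t]."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Column k-1 holds the lag-k values; consecutive rows share p-1 entries.
    X = np.column_stack([x[p - k: n - k] for k in range(1, p + 1)])
    y = x[p:]
    return X, y

def fit_ridge(X, y, lam):
    """Closed-form ridge solution w = (X^T X + lam I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def select_order(x, orders, lam=1.0, train_frac=0.7):
    """Pick the AR order with the lowest squared error on a chronological hold-out."""
    x = np.asarray(x, dtype=float)
    split = int(train_frac * len(x))  # everything before `split` is training data
    errors = {}
    for p in orders:
        X_tr, y_tr = lagged_design(x[:split], p)
        X_all, y_all = lagged_design(x, p)
        # Row i of X_all has target x[p + i]; keep only targets at or after `split`.
        X_va, y_va = X_all[split - p:], y_all[split - p:]
        w = fit_ridge(X_tr, y_tr, lam)
        errors[p] = float(np.mean((X_va @ w - y_va) ** 2))
    best = min(errors, key=errors.get)
    return best, errors

# Illustrative data only: a simulated AR(2) process with Gaussian residuals.
rng = np.random.default_rng(0)
series = np.zeros(500)
for t in range(2, len(series)):
    series[t] = 0.6 * series[t - 1] - 0.3 * series[t - 2] + rng.normal()

best_p, val_errors = select_order(series, orders=[1, 2, 3, 5, 10])
print("selected order:", best_p)
print("validation errors:", val_errors)
```

The hold-out error here is a practical guide for choosing p, not a quantity covered by the independence-based bounds in the book, so it should be read with that caveat in mind.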
Quote:
Originally Posted by ksokhanvari
Dear Yaser,
Thanks very much for your response. I did take a look at e-Chapter 6 and the distinction between parametric and nonparametric models.
However, to clarify my question I was also wondering about the overall relationship between the key components of the “learning theory” and the techniques used in machine learning with the more traditional methods of fitting polynomial models to data.
Specifically, in the domain of Time Series Analysis we fit a polynomial of the time series (e.g. ARIMA models) using the input value and its previous values (X(t-1), X(t-2), X(t-3), …) for the AR component and the forecast error values (e1, e2, e3, …) for the MA components, and once fitted we proceed to use such a model to forecast values for X(t+1), X(t+2), etc.
Therefore, we are just fitting (i.e. learning the parameters from previous examples) a linear parameter polynomial with a view that the time series values are related and time lag correlated with a decay built in as we move away from the recent values.
There are two main questions for me,
1) Given the above explicit assumption about the nature of the data in time series, do the more generalized models such as NNs, SVMs and high dimensional feature regression models have better generalization properties than traditional time series models?
2) Given the procedures for properly implementing machine learning techniques such as the use of regularization to avoid overfitting, or VC dimensional analysis for understanding the number of examples needed, or application of cross validation sets for parameter selection and out of sample error estimate measures – don’t these areas theoretically overlap with methods used in fitting polynomials in time series model analysis?
I am trying to extend what we have learnt in this course and understand areas of theoretical and fundamental overlap and true differences between domains and methods.
Many thanks
