Time Series method similarities
Dear Professor Abu-Mostafa,
First, I would like to add my thanks and appreciation to the countless other messages that you have surely received about this wonderful course. I am 53 years old, and although I focused on AI techniques during my Master's degree in computer science when I was younger :), I did not have the quality of understanding I have gained after completing this course. AI has come a long way in 25 years, and I am very excited to have discovered this class online. Congratulations to you and the Caltech community for this high-quality work.

I have a question about an application area: financial market forecasting. I have been working in this area for the past 10 years, applying the typical methods of time series analysis to forecasting. Although the literature treats time series analysis as a separate field entirely, it seems to me that ARIMA and GARCH models, and the parameter-fitting procedures found in the literature and in software libraries, share a significant amount of theoretical overlap with machine learning theory. Could you please comment on how you would map these techniques (similarities and differences) to the machine learning map you presented?

In particular, it seems the data handling and validation procedures should be the same. The ARIMA and GARCH models are just another hypothesis set. The fitting procedures are similar to learning algorithms. Model selection by minimum AIC or maximum likelihood is similar to regularization, VC dimension analysis, Occam's razor, etc.

Additionally, given that machine learning techniques are now often applied to financial forecasting, have you found in your experience that a particular algorithm (NNs, SVMs, etc.) typically performs better in this domain? The evidence in the literature is not clear to me.

I promise to charge the appropriate VC dimension cost to my solution sets! :)

Thanks again,
Regards,
Kamran Sokhanvari
Re: Time Series method similarities
Thank you for your post. This is an interesting question that touches on parametric versus nonparametric methods and on specialized versus generic models, among other issues. Let me suggest that you take a look at the parametric versus nonparametric discussion in eChapter 6 (section 6.2.4), and then continue the dialog in this thread.

Re: Time Series method similarities
Dear Yaser,
Thanks very much for your response. I did take a look at eChapter 6 and the distinction between parametric and nonparametric models. However, to clarify my question: I was also wondering about the overall relationship between the key components of learning theory and machine learning techniques on the one hand, and the more traditional methods of fitting polynomial models to data on the other.

Specifically, in time series analysis we fit a polynomial in the time series (e.g. ARIMA models), using the input value and its previous values (X(t-1), X(t-2), X(t-3), ...) for the AR component and the forecast error values (e1, e2, e3, ...) for the MA component. Once fitted, we use the model to forecast X(t+1), X(t+2), etc. So we are just fitting (i.e. learning the parameters from previous examples) a polynomial that is linear in its parameters, on the view that the time series values are correlated across time lags, with a decay built in as we move away from the most recent values.

I have two main questions:

1) Given the above explicit assumption about the nature of time series data, do more general models such as NNs, SVMs and high-dimensional feature regression models have better generalization properties than traditional time series models?

2) Proper machine learning practice involves regularization to avoid overfitting, VC dimension analysis to understand the number of examples needed, and cross-validation sets for parameter selection and out-of-sample error estimates. Don't these areas theoretically overlap with the methods used to fit polynomials in time series analysis?

I am trying to extend what we have learnt in this course and to understand the areas of fundamental theoretical overlap and the true differences between domains and methods. Many thanks
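For reference, the model described above is conventionally written as an ARMA(p, q) equation; ARIMA applies the same form after differencing the series. The symbols below (c, \phi_i, \theta_j, \varepsilon_t, p, q) are standard textbook notation and are not taken from the post itself.

```latex
X_t = c + \sum_{i=1}^{p} \phi_i \, X_{t-i} + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j} + \varepsilon_t
```

Here the \phi_i weight the past values (the AR part), the \theta_j weight the past forecast errors (the MA part), and \varepsilon_t is the current error. The model is linear in its parameters, which is the sense in which fitting it resembles learning a linear model on lagged features.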
Re: Time Series method similarities
To answer your questions.
1. The more general nonlinear models (including those using a feature transform) may or may not be better. It depends on your time series and on whether a linear dependency on prior X's and prior residuals is a good model for the process. One thing to beware of is that including both the X's and the prior residuals can result in a lot of parameter redundancy and overfitting. Nonlinear models are recommended if the dependency is more complex; the caveat is that such models are easier to overfit, and there may be no convenient "closed form" technique to estimate the parameters.

2. Yes, the general setup is the same, and you are well advised to use regularization and to take care in choosing the "size" of your ARMA model (i.e. how many time steps in the past to autoregress onto). HOWEVER, the theory covered in this book is not completely applicable to time series methods, and a more detailed theoretical analysis is needed to account for the fact that the training data are NOT independent. This is especially true if you generate your data points by moving 1 step forward at a time. For this reason, most of the theory regarding time-series models starts by assuming that the process follows some law with (typically) Gaussian residuals. Then one can prove that certain ways of estimating the parameters of the ARMA model are optimal, etc. In the learning framework, we maintained that the target function is completely unknown and general. So ARMA-type models would more appropriately be classified as "statistics-based" models (see Section 1.2.4).
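As a rough illustration of the second point, here is a minimal sketch of choosing the autoregressive "size" p with regularization while respecting time order. It assumes NumPy is available, fits only the AR part (MA terms generally need an iterative estimator), and uses a simulated series; the function names, the ridge strength lam and the holdout length n_test are illustrative choices, not anything prescribed in the book or this thread. Walk-forward evaluation avoids training on future points, but it does not by itself restore the independence assumption behind the theory.

```python
import numpy as np

def lagged_design(x, p):
    """Rows are [1, x[t-1], ..., x[t-p]]; targets are x[t] for t = p..n-1."""
    n = len(x)
    cols = [x[p - k : n - k] for k in range(1, p + 1)]
    X = np.column_stack([np.ones(n - p)] + cols)
    return X, x[p:]

def fit_ridge(X, y, lam):
    """Ridge least squares; the intercept (column 0) is not penalized."""
    penalty = lam * np.eye(X.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

def walk_forward_mse(x, p, lam, n_test):
    """Mean squared 1-step-ahead error over the last n_test points,
    refitting each time on data strictly before the forecast time."""
    errors = []
    for t in range(len(x) - n_test, len(x)):
        X, y = lagged_design(x[:t], p)
        w = fit_ridge(X, y, lam)
        # Features for time t: [1, x[t-1], ..., x[t-p]]
        features = np.concatenate(([1.0], x[t - p : t][::-1]))
        errors.append((x[t] - features @ w) ** 2)
    return float(np.mean(errors))

# Simulated AR(2) series (an assumption for the demo, not real market data).
rng = np.random.default_rng(0)
x = np.zeros(500)
for t in range(2, 500):
    x[t] = 0.6 * x[t - 1] - 0.6 * x[t - 2] + rng.normal()

# Compare candidate orders by out-of-sample 1-step error.
scores = {p: walk_forward_mse(x, p, lam=1.0, n_test=150) for p in range(1, 6)}
best_p = min(scores, key=scores.get)
print(scores, best_p)
```

The same loop can be reused to compare a nonlinear model against the linear one, as long as every forecast is made using only data that precedes it.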

Re: Time Series method similarities
Professor Magdon,
Thanks for your response to my question. I think your clarification regarding the independence assumption on the observations is key. However, given the extremely noisy nature of financial time series, Gaussian residuals are also an assumption that does not always hold, as you know. I am going to try building several of the models and compare the results to see which ones have better fit characteristics. Thanks again. Kamran
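One simple way to probe the Gaussian-residuals assumption mentioned above is to look at the sample excess kurtosis of a model's residuals: it is close to 0 for Gaussian data and positive for heavy-tailed data. The sketch below assumes NumPy and uses synthetic samples (a Laplace distribution stands in for heavy tails); the function name is illustrative, and this is a rough diagnostic rather than a formal test.

```python
import numpy as np

def excess_kurtosis(residuals):
    """Sample excess kurtosis: fourth central moment over squared variance, minus 3."""
    r = np.asarray(residuals, dtype=float)
    z = r - r.mean()
    m2 = np.mean(z ** 2)
    m4 = np.mean(z ** 4)
    return float(m4 / m2 ** 2 - 3.0)

# Synthetic stand-ins for model residuals (assumptions for the demo).
rng = np.random.default_rng(1)
gaussian = rng.normal(size=100_000)
heavy = rng.laplace(size=100_000)

print(excess_kurtosis(gaussian), excess_kurtosis(heavy))
```

In practice the input would be the 1-step forecast errors of whichever fitted model is being checked.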
Re: Time Series method similarities
@Kamran: First of all, thank you for showing great respect for the Professor!
Do let me know the results once you have built several of the models and compared them! I am not an expert in this field, just researching out of personal interest, unofficially. So consider me a layman. Thanks in advance!
The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.