LFD Book Forum  


  #1  
Old 01-18-2016, 01:45 PM
ksokhanvari ksokhanvari is offline
Junior Member
 
Join Date: Jan 2016
Posts: 3
Default Time Series method similarities

Dear Professor Abu-Mostafa

First, I would like to add my thanks and appreciation to the countless other messages you have surely received about this wonderful course. I am 53 years old, and although I focused on AI techniques during my Master's degree in computer science when I was younger, I did not have the quality of understanding I have gained after completing this course. AI has come a long way in 25 years, and I am very excited to have discovered this class online. Congratulations to you and the Caltech community on this high-quality work.

I do have a question about an application area: financial market forecasting. I have been working in this area for the past 10 years, applying the typical methods of time series analysis to the problem of forecasting time series quantities. It seems to me that while time series analysis is covered in the literature as an entirely separate field, the application of ARIMA and GARCH models and the parameter-fitting procedures found in the literature and software libraries share a significant amount of theoretical overlap with machine learning theory.

Could you please comment on how you would map these techniques (similarities and differences) onto the machine learning map you presented? In particular, it seems the data handling and validation procedures should be the same. The ARIMA and GARCH models are just another hypothesis set. The fitting procedures are similar to learning algorithms. The minimum-AIC or maximum-likelihood model selection procedures are similar to regularization, VC dimension analysis, Occam's razor, etc.
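To make the parallel concrete, here is a minimal, dependency-free sketch of what I mean (the synthetic series, function names, and orders tried are my own illustration, not from the course): fit AR(p) models of increasing order by least squares, then select the order by minimum AIC, the way a regularizer or validation procedure would trade fit against complexity.

```python
import math
import random

random.seed(0)

# Synthetic AR(1) series: x(t) = 0.7*x(t-1) + noise (purely illustrative)
n = 300
x = [0.0]
for _ in range(n - 1):
    x.append(0.7 * x[-1] + random.gauss(0.0, 1.0))

def fit_ar_least_squares(series, p):
    """Fit an AR(p) model with intercept by ordinary least squares."""
    rows = [series[t - p:t][::-1] + [1.0] for t in range(p, len(series))]
    y = series[p:]
    k = p + 1
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yv for r, yv in zip(rows, y)) for i in range(k)]
    # Solve by Gauss-Jordan elimination
    for i in range(k):
        piv = xtx[i][i]
        for j in range(k):
            xtx[i][j] /= piv
        xty[i] /= piv
        for m in range(k):
            if m != i:
                f = xtx[m][i]
                for j in range(k):
                    xtx[m][j] -= f * xtx[i][j]
                xty[m] -= f * xty[i]
    coeffs = xty
    resid = [yv - sum(c * v for c, v in zip(coeffs, r)) for r, yv in zip(rows, y)]
    sigma2 = sum(e * e for e in resid) / len(resid)
    return coeffs, sigma2

def aic(series, p):
    """AIC for a Gaussian AR(p) fit: n*log(sigma^2) + 2*(number of parameters)."""
    _, sigma2 = fit_ar_least_squares(series, p)
    n_eff = len(series) - p
    return n_eff * math.log(sigma2) + 2 * (p + 1)

scores = {p: aic(x, p) for p in range(1, 6)}
best = min(scores, key=scores.get)
print("AIC by order:", {p: round(s, 1) for p, s in scores.items()})
print("selected order:", best)
```

The 2*(p+1) penalty term is what plays the role of the complexity charge: a higher-order model always fits the residuals at least as well, and AIC makes it pay for the extra parameters.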

Additionally, in your experience, given that machine learning techniques are often applied to the financial forecasting domain today, have you found that a particular algorithm (e.g., NNs, SVMs) typically performs better in this domain? The evidence in the literature is not clear to me.

I promise to charge the appropriate VC dimension cost to my solution sets!


Thanks again,

Regards,

Kamran Sokhanvari
  #2  
Old 01-21-2016, 01:08 PM
yaser's Avatar
yaser yaser is offline
Caltech
 
Join Date: Aug 2009
Location: Pasadena, California, USA
Posts: 1,474
Default Re: Time Series method similarities

Thank you for your post. This is an interesting question that touches on parametric versus non-parametric methods, as well as specialized models versus generic models (as well as other issues). Let me suggest that you take a look at the parametric versus non-parametric discussion in e-Chapter 6 (section 6.2.4), and then continue the dialog in this thread.
__________________
Where everyone thinks alike, no one thinks very much
  #3  
Old 01-26-2016, 03:27 PM
ksokhanvari ksokhanvari is offline
Junior Member
 
Join Date: Jan 2016
Posts: 3
Default Re: Time Series method similarities

Dear Yaser,

Thanks very much for your response. I did take a look at e-Chapter 6 and the distinction between parametric and non-parametric models.

However, to clarify my question: I was also wondering about the overall relationship between the key components of learning theory and the techniques used in machine learning, on the one hand, and the more traditional methods of fitting polynomial models to data, on the other.

Specifically, in the domain of time series analysis we fit a polynomial to the time series (e.g., ARIMA models) using the input value's previous values (X(t-1), X(t-2), X(t-3), …) for the AR component and the previous forecast error values (e(t-1), e(t-2), e(t-3), …) for the MA component. Once fitted, we proceed to use such a model to forecast values for X(t+1), X(t+2), etc.

Therefore, we are just fitting (i.e., learning the parameters from previous examples) a polynomial that is linear in its parameters, with the view that the time series values are related and lag-correlated, with a decay built in as we move away from the most recent values.
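As a concrete sketch of that forecasting step (the AR(2) coefficients and history below are made-up numbers, purely for illustration): once the parameters are learned, multi-step forecasts are produced by feeding each prediction back in as a lagged input.

```python
# Multi-step AR forecasting: feed predictions back as lagged inputs.
# The coefficients and history are illustrative, not fitted to any real series.
phi = [0.6, 0.3]                # hypothetical AR(2) coefficients
history = [1.2, 0.9, 1.1, 0.8]  # observed values, most recent last

def forecast(history, phi, steps):
    """Forecast `steps` values ahead with an AR model given by `phi`."""
    vals = list(history)
    out = []
    for _ in range(steps):
        # x(t+1) = phi1*x(t) + phi2*x(t-1) + ...
        nxt = sum(c * v for c, v in zip(phi, reversed(vals)))
        vals.append(nxt)
        out.append(nxt)
    return out

print(forecast(history, phi, 3))
```

Note how the forecast decays toward zero step by step, which is exactly the built-in decay away from the recent values mentioned above.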

There are two main questions for me,

1) Given the above explicit assumption about the nature of the data in a time series: do the more general models, such as NNs, SVMs, and high-dimensional feature regression models, have better generalization properties than traditional time series models?

2) Given the procedures for properly implementing machine learning techniques, such as the use of regularization to avoid overfitting, VC dimension analysis for understanding the number of examples needed, or the application of cross-validation sets for parameter selection and out-of-sample error estimation: don't these areas theoretically overlap with the methods used in fitting polynomials in time series analysis?

I am trying to extend what we have learnt in this course and to understand the areas of theoretical and fundamental overlap, and the true differences, between these domains and methods.


Many thanks
  #4  
Old 01-27-2016, 07:36 AM
magdon's Avatar
magdon magdon is offline
RPI
 
Join Date: Aug 2009
Location: Troy, NY, USA.
Posts: 592
Default Re: Time Series method similarities

To answer your questions:

1. The more general non-linear models (including those obtained through a feature transform) may or may not be better. It depends on your time series and on whether the linear dependency on prior X's and prior residuals is a good model for the process. One thing to beware of is that having both the X's and the prior residuals can result in a lot of parameter redundancy and overfitting. Using non-linear models is recommended if the dependency is more complex; the caveat is that such models are easier to overfit, and there may be no convenient "closed form" technique to estimate the parameters.

2. Yes, the general setup is the same, and you are well advised to use regularization and care in choosing the "size" of your ARMA model (i.e., how many time steps into the past to auto-regress onto).
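As a minimal sketch of what regularizing such a fit looks like (the toy series, the penalty weight lam, and the learning rate are illustrative choices, not recommendations; a closed-form ridge solution also exists, this version just stays dependency-free):

```python
import random

# Ridge-regularized AR(2) fit via batch gradient descent on a toy series.
random.seed(1)
x = [0.0, 0.0]
for _ in range(200):
    x.append(0.5 * x[-1] + 0.2 * x[-2] + random.gauss(0.0, 0.5))

p, lam, lr = 2, 0.1, 0.5
rows = [x[t - p:t][::-1] for t in range(p, len(x))]  # [x(t-1), x(t-2)]
y = x[p:]
w = [0.0] * p
for _ in range(300):
    grad = [0.0] * p
    for r, yv in zip(rows, y):
        err = sum(wi * ri for wi, ri in zip(w, r)) - yv
        for i in range(p):
            grad[i] += err * r[i]
    for i in range(p):
        # average squared-error gradient plus the L2 penalty term lam*w[i]
        w[i] -= lr * (grad[i] / len(rows) + lam * w[i])

print("regularized AR(2) coefficients:", [round(wi, 3) for wi in w])
```

The L2 penalty shrinks the estimated coefficients below the generating values (0.5 and 0.2), which is the price paid for keeping the effective model size in check.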

HOWEVER, the theory covered in this book is not completely applicable to time series methods, and a more detailed theoretical analysis needs to be performed to account for the fact that the training data are NOT independent. This becomes especially so if you generate your data points by moving one step forward at a time. For this reason, most of the theory regarding time series models starts by assuming that the process follows some law with (typically) Gaussian residuals. Then one can prove that certain ways of estimating the parameters of the ARMA model are optimal, etc. In the learning framework, we maintained that the target function is completely unknown and general. So the ARMA-type models would more appropriately be classified as "statistics-based" models (see Section 1.2.4).
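One practical consequence of the non-independence is that validation splits must respect time order. A minimal sketch (the function name is mine) of expanding-window, walk-forward splits, as opposed to the random splits one would use when examples are independent:

```python
# Walk-forward (expanding-window) splits: the training window always
# precedes the test window, so no future information leaks into training,
# unlike random cross-validation splits on dependent data.
def walk_forward_splits(n, initial_train, test_size):
    """Yield (train_indices, test_indices) pairs for an expanding window."""
    start = initial_train
    while start + test_size <= n:
        yield list(range(start)), list(range(start, start + test_size))
        start += test_size

for train, test in walk_forward_splits(10, initial_train=4, test_size=2):
    print(len(train), test)
```

Each successive fold grows the training window and tests strictly on later points, mimicking how the model would actually be deployed on a stream of market data.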


Quote:
Originally Posted by ksokhanvari View Post
[full text of post #3, quoted above]
__________________
Have faith in probability
  #5  
Old 02-01-2016, 11:33 AM
ksokhanvari ksokhanvari is offline
Junior Member
 
Join Date: Jan 2016
Posts: 3
Default Re: Time Series method similarities

Professor Magdon,

Thanks for your response to my question. I think your clarification regarding the independence assumption on the observations is key. However, given the extremely noisy nature of financial time series, the Gaussian-residuals assumption also does not always hold, as you know.

I am going to try to build several of the models and compare the results to see which ones have better fit characteristics.


Thanks again.

Kamran
  #6  
Old 08-26-2017, 06:14 AM
pdsubraa pdsubraa is offline
Member
 
Join Date: Aug 2017
Location: Singapore
Posts: 14
Default Re: Time Series method similarities

@Kamran - First of all, thank you for showing great respect for the Professor!

Do let me know once you have built several of the models and compared the results!

I am not an expert in this field - just researching out of personal interest, unofficially!

So consider me a layman - thanks in advance!

Tags
arima, garch, time series

The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.