LFD Book Forum  

  #1  
01-22-2014, 03:00 PM
rakhlin
Member
Join Date: Jun 2012
Posts: 24
VC dimension of time series models

Hello again dear Professor and all!

I want to determine the VC dimension of time series models in order to avoid overfitting and to estimate the minimum size of the data set.

1. First, a question that may be ill-posed, since it does not specify a hypothesis set. A model takes as input a vector x_t = {r_{t-1}; ...; r_{t-k}} of k lagged readings. Can the VC dimension be approximately estimated as k?

2. Second, a concrete time series model I'm working on, based on the article by Liehr and Pawelzik, "A trading strategy with variable investment from minimizing risk to profit ratio", published in Physica A 287 (2000) 524-538.

Let me explain it briefly. Liehr and Pawelzik compare the performance of two related models. Both models construct the series of input vectors by embedding the time series of returns into a space of embedding dimension k: x_t = {r_{t-1}; ...; r_{t-k}}

a) Discrete state model. The signs of the k most recent returns are mapped to one of 2^k distinct states. For example, 5 lagged returns lead to 32 possible states. Each state produces a forecast based on the statistics of past occurrences of that state.
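The sign-pattern encoding can be sketched as follows. This is only a minimal illustration, not the authors' code: the function names are hypothetical, a zero return is assumed to count as "not positive", and the per-state forecast is assumed to be the plain average of the returns that followed that state.

```python
import numpy as np

def embed(returns, k):
    """Build lag vectors x_t = (r_{t-1}, ..., r_{t-k}) and targets r_t."""
    r = np.asarray(returns, dtype=float)
    X = np.array([r[t - k:t][::-1] for t in range(k, len(r))])
    y = r[k:]
    return X, y

def state_index(X):
    """Map each row's sign pattern to an integer in [0, 2^k)."""
    bits = (X > 0).astype(int)            # assumption: zero counts as "not positive"
    weights = 1 << np.arange(X.shape[1])  # 1, 2, 4, ...
    return bits @ weights

def fit_state_means(returns, k):
    """Average next-period return for each of the 2^k states."""
    X, y = embed(returns, k)
    s = state_index(X)
    n_states = 2 ** k
    counts = np.bincount(s, minlength=n_states)
    sums = np.bincount(s, weights=y, minlength=n_states)
    # States that never occurred keep a forecast of 0.
    means = np.divide(sums, counts, out=np.zeros(n_states), where=counts > 0)
    return means, counts
```

With k = 5 the table has 32 entries, one free value per state, which is the quantity the question about 2^k versus k is really asking about.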

b) RBF neural network. Training is performed by unsupervised adaptation of the centers, followed by gradient descent to adjust the second-layer weights. For comparability, the number of centers (Gaussians) is chosen equal to the number of states, 2^k, in the first model.
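A rough sketch of this two-stage training, under stated assumptions: the centers are placed with Lloyd's algorithm, and the second-layer weights are fitted by linear least squares rather than the gradient descent used in the article (for squared error both target the same minimum). The function names, the fixed width parameter `gamma`, and the absence of a bias term are illustrative choices, not taken from the paper.

```python
import numpy as np

def lloyd(X, n_centers, n_iter=50, seed=0):
    """Unsupervised placement of centers (Lloyd's algorithm)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=n_centers, replace=False)].copy()
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        for j in range(n_centers):
            members = X[assign == j]
            if len(members) > 0:          # leave an empty cluster's center in place
                centers[j] = members.mean(axis=0)
    return centers

def rbf_features(X, centers, gamma):
    """Gaussian activations, one column per center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def fit_rbf(X, y, n_centers, gamma=1.0):
    """Stage 1: centers from X only. Stage 2: linear fit of the weights."""
    centers = lloyd(X, n_centers)
    Phi = rbf_features(X, centers, gamma)
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return centers, w

def predict_rbf(X, centers, w, gamma=1.0):
    return rbf_features(X, centers, gamma) @ w
```

Because the centers never see y, the only parameters fitted to the targets are the `n_centers` second-layer weights.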

Now, to my question. Liehr and Pawelzik do not use the term 'VC dimension', but they urge avoiding overfitting by using only a small number of Gaussians. In our terms, they relate generalization ability to the number of centers (RBF model) and the number of states (discrete state model). They typically use 5 lagged returns, which results in 32 states/centers. From Lecture 16 of this course I remember that the number of centers in an RBF model can be related to the number of support vectors in an SVM model. The number of support vectors, in turn, is a proxy for the VC dimension.

Am I correct that the VC dimension of the two models is approximately 2^k? Or is it just k?
#2
01-22-2014, 05:08 PM
yaser
Caltech
Join Date: Aug 2009
Location: Pasadena, California, USA
Posts: 1,474
Re: VC dimension of time series models

Interesting problem. First, as you anticipated, just specifying the input as k lagged returns would not determine the VC dimension. If you further specify that you are doing linear classification, for example, then that makes the VC dimension equal to k+1. If you use another model, it may have a different VC dimension.
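The lower-bound half of the k+1 claim for linear classification can be checked numerically. The sketch below (function name hypothetical) takes the origin plus the k standard basis vectors, adds the constant bias coordinate, and solves for weights that reproduce every possible labeling. It only shows that some set of k+1 points is shattered; it does not check that no set of k+2 points can be.

```python
import itertools
import numpy as np

def shatters_k_plus_1_points(k):
    """Check that a linear classifier with bias realizes all labelings of k+1 points."""
    pts = np.vstack([np.zeros(k), np.eye(k)])     # origin and the k basis vectors
    X = np.hstack([np.ones((k + 1, 1)), pts])     # prepend bias coordinate; X is invertible
    for labels in itertools.product([-1.0, 1.0], repeat=k + 1):
        y = np.array(labels)
        w = np.linalg.solve(X, y)                 # weights with X w = y
        if not np.array_equal(np.sign(X @ w), y):
            return False
    return True

# Example: shatters_k_plus_1_points(5) checks all 64 labelings for k = 5 lags.
```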

Now with the two models (both are similarity-based models), I assume that the forecast is binary. You have grouped inputs into 32 categories, with all inputs in the same category necessarily mapping to the same ±1 label. If you have 33 input vectors, two of them must have the same sign pattern and therefore necessarily map to the same label, so 33 is a break point and indeed the VC dimension is 32. In the RBF case, if the clustering is done in an unsupervised way, then the VC dimension would be the number of parameters in the second layer, which is also 32 in this case by choice.
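The pigeonhole argument can be verified by brute force on a smaller case. The sketch below uses k = 3 (8 states) instead of k = 5 so that enumerating every labeling stays cheap; the helper names are hypothetical, and the hypothesis set is assumed to be all lookup tables assigning one ±1 label per sign pattern.

```python
import itertools
import numpy as np

def sign_state(x):
    """Sign pattern of a lag vector (zero treated as not positive)."""
    return tuple(bool(v > 0) for v in x)

def realizable(states, labels):
    """True if some lookup table (one label per state) produces these labels."""
    table = {}
    for s, y in zip(states, labels):
        if table.setdefault(s, y) != y:   # same state asked to give two labels
            return False
    return True

def shattered(points):
    """True if every +/-1 labeling of the points is realizable."""
    states = [sign_state(p) for p in points]
    return all(
        realizable(states, labels)
        for labels in itertools.product([-1, 1], repeat=len(points))
    )

k = 3
# One representative point per sign pattern: 2^k points, all in different states.
reps = [np.array(p, dtype=float) for p in itertools.product([-1, 1], repeat=k)]
```

Here `shattered(reps)` holds for the 8 representatives, while appending any ninth point forces two points into the same state and the check fails, mirroring the 32 versus 33 argument.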

The equality of number of centers and number of support vectors in Lecture 16 was a forced assumption for comparison, but they need not be equal. The number of support vectors comes out of the process of solving the SVM kernel problem, whereas the number of clusters is a parameter under our control that we decide on before running Lloyd's.
__________________
Where everyone thinks alike, no one thinks very much
#3
01-23-2014, 03:36 AM
rakhlin
Member
Join Date: Jun 2012
Posts: 24
Re: VC dimension of time series models

Thank you very much, Professor.

I had overlooked the fact that the number of centers in the RBF model equals the number of estimated parameters (the weights), and hence the VC dimension.

For the sake of brevity I didn't explain what the models forecast. It does not affect your conclusion, but it is an interesting part in its own right. The forecast isn't binary: they forecast 2 real-valued outputs, the return and the variance of the return. For the discrete state model these 2 values are just constant averages over similar states (2^k pairs). The absolute ratio of the 2 values is taken as a proxy for predictability (the higher the better) and determines the size of the investment. Ultimately, they forecast a binary direction and a real-valued investment size. From a trader's viewpoint, decomposing the forecast into return and variance is a concrete and mature approach, and it is rare among the works on financial forecasting I have seen. I have a gut feeling that state-dependent or regime-dependent models are the right approach in financial forecasting (unlike many others). The article is very solid and still fresh even though it was written 14 years ago. Klaus Pawelzik is a co-author of Vapnik on "Predicting time series with support vector machines", published in 1997, and has many interesting works in various fields.

Dear Professor, in this model I was going to replace the RBF network with support vector regression. But I remember you once remarked that support vector regression isn't as good as SVM is for classification. Is it worth trying? Or should I stay with RBF?
#4
01-23-2014, 06:55 PM
yaser
Caltech
Join Date: Aug 2009
Location: Pasadena, California, USA
Posts: 1,474
Re: VC dimension of time series models

Quote:
Originally Posted by rakhlin
in this model I was going to replace the RBF network with support vector regression. But I remember you once remarked that support vector regression isn't as good as SVM is for classification. Is it worth trying? Or should I stay with RBF?
Worth a try. You never know which model will work best in which real-life problem.
#5
01-23-2014, 11:36 PM
rakhlin
Member
Join Date: Jun 2012
Posts: 24
Re: VC dimension of time series models

Thank you!
#6
06-02-2016, 05:17 PM
kamika05
Junior Member
Join Date: May 2016
Location: Algeria
Posts: 2
Re: VC dimension of time series models

Thank you for all this information; it is very helpful to me.

All times are GMT -7.

The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.