LFD Book Forum VC dimension of time series models

#1
01-22-2014, 03:00 PM
 rakhlin Member Join Date: Jun 2012 Posts: 24
VC dimension of time series models

Hello again dear Professor and all!

I want to determine the VC dimension of time series models in order to avoid overfitting and to estimate the minimum size of the data set.

1. First, a possibly ill-posed question, since it does not specify a particular hypothesis set. A model takes an input vector of lagged readings. Can the VC dimension be approximately estimated as ?

2. Second, the concrete time series model I'm working on is based on the article by Liehr and Pawelzik, "A trading strategy with variable investment from minimizing risk to profit ratio", published in Physica A 287 (2000) 524-538.

Let me explain it briefly. Liehr and Pawelzik compare performance of two related models. Both models construct the series of input vectors by embedding the time series of returns into a space of embedding dimension :

a) Discrete state model. The signs of recent returns are mapped to distinct states. For example, 5 lagged returns lead to 2^5 = 32 possible states. Each state produces a forecast based on the statistics of like states.

b) RBF neural network. Training is performed by unsupervised adaptation of the centers, followed by gradient descent to adjust the second-layer weights. For comparability, the number of centers (Gaussians) is chosen equal to the number of states in the first model.
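As a rough illustration of model (a), the sign embedding and the per-state statistics might be sketched as follows. This is not the authors' code: the function names `sign_states` and `state_statistics`, the treatment of a zero return as a "down" bit, the use of mean and variance as the per-state statistics, and the synthetic Gaussian returns are all assumptions made for the example.

```python
import numpy as np

def sign_states(returns, n_lags=5):
    """Map each window of n_lags past returns to an integer state in [0, 2**n_lags)."""
    r = np.asarray(returns, dtype=float)
    bits = (r > 0).astype(int)          # assumption: 1 for a positive return, 0 otherwise
    n = len(r) - n_lags                 # number of (window, next return) pairs
    weights = 2 ** np.arange(n_lags)    # binary place values for the sign pattern
    states = np.array([bits[t:t + n_lags] @ weights for t in range(n)])
    targets = r[n_lags:]                # the return that follows each window
    return states, targets

def state_statistics(states, targets, n_states):
    """Per-state mean and variance of the next return (NaN for states never seen)."""
    mean = np.full(n_states, np.nan)
    var = np.full(n_states, np.nan)
    for s in np.unique(states):
        y = targets[states == s]
        mean[s], var[s] = y.mean(), y.var()
    return mean, var

# Synthetic returns for illustration only, not real market data.
rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.01, size=2000)
states, targets = sign_states(returns, n_lags=5)
mean, var = state_statistics(states, targets, n_states=2 ** 5)
```

A forecast for a new window is then just a table lookup: compute its state and read off `mean[state]` and `var[state]`.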

Now, to my question. Liehr and Pawelzik do not use the term 'VC dimension', but they urge avoiding overfitting by using only a small number of Gaussians. In our terms, they relate generalization ability to the number of centers (RBF model) and the number of states (discrete state model). They typically use 5 lagged returns, which results in 32 states/centers. From Lecture 16 of this course I remember that the number of centers in an RBF model can be related to the number of support vectors in an SVM model. The number of support vectors is, in turn, a proxy for the VC dimension.

Am I correct that the VC dimension of the two models is approximately ? Or just ?
#2
01-22-2014, 05:08 PM
 yaser Caltech Join Date: Aug 2009 Location: Pasadena, California, USA Posts: 1,478
Re: VC dimension of time series models

Interesting problem. First, as you anticipated, just specifying the input as lagging returns would not determine the VC dimension. If you further specify that you are doing linear classification, for example, then that makes the VC dimension equal to d + 1, where d is the number of lagged inputs. If you use another model, it may have a different VC dimension.

Now with the two models (both are similarity-based models), I assume that the forecast is binary. You have grouped inputs into 32 categories, with all inputs in the same category necessarily mapping to the same label. If you have 33 input vectors, two of them must have the same sign pattern and therefore necessarily map to the same label, so 33 is a break point and indeed the VC dimension is 32. In the RBF case, if the clustering is done in an unsupervised way, then the VC dimension would be the number of parameters in the second layer, which is also 32 in this case by choice.
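The pigeonhole argument above can be checked by brute force on a smaller instance. The sketch below uses 3 lags (8 states) instead of 5 so that enumerating all labelings stays cheap; the helper names `sign_pattern` and `is_shattered` are made up for the example, and the hypothesis set is assumed to be every lookup table from sign pattern to a binary label.

```python
from itertools import product

def sign_pattern(x):
    """State of an input vector: the tuple of signs of its coordinates."""
    return tuple(v > 0 for v in x)

def is_shattered(points):
    """True if every +/-1 labeling of `points` is realized by some
    lookup table that assigns one label to each sign pattern."""
    patterns = [sign_pattern(p) for p in points]
    for labels in product([-1, +1], repeat=len(points)):
        table = {}
        for pat, y in zip(patterns, labels):
            # setdefault stores y the first time a pattern is seen;
            # a later, different label for the same pattern is a conflict.
            if table.setdefault(pat, y) != y:
                return False
    return True

# 8 points, one per sign pattern of 3 lagged returns.
eight = list(product([-1.0, 1.0], repeat=3))
# A 9th point must reuse one of the 8 patterns (here: all positive).
nine = eight + [(0.5, 0.5, 0.5)]

shatter_8 = is_shattered(eight)
shatter_9 = is_shattered(nine)
```

Here `shatter_8` comes out `True` and `shatter_9` comes out `False`. Checking one 9-point set does not by itself establish a break point; it is the pigeonhole argument that rules out every 9-point set, and the same reasoning with 5 lags gives 32 and 33.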

The equality of number of centers and number of support vectors in Lecture 16 was a forced assumption for comparison, but they need not be equal. The number of support vectors comes out of the process of solving the SVM kernel problem, whereas the number of clusters is a parameter under our control that we decide on before running Lloyd's.
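To make the parameter count concrete, here is a minimal sketch of an RBF model in which the number of centers k is fixed before Lloyd's algorithm runs and only the k second-layer weights are fit to the labels. Several things are assumptions for illustration: the function names, the Gaussian width `gamma`, the absence of a bias term, the synthetic data, and the use of a least-squares solve for the weights in place of the gradient descent described earlier in the thread.

```python
import numpy as np

def lloyd(X, k, n_iter=50, seed=0):
    """Lloyd's algorithm: unsupervised placement of k centers (labels are not used)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared distance from every point to every center, shape (N, k).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        for j in range(k):
            members = X[assign == j]
            if len(members) > 0:        # leave an empty cluster's center unchanged
                centers[j] = members.mean(axis=0)
    return centers

def rbf_features(X, centers, gamma=1.0):
    """Gaussian activations of each input with respect to each center, shape (N, k)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def fit_rbf(X, y, k, gamma=1.0):
    """Fix the centers with Lloyd's, then fit the k second-layer weights by least squares."""
    centers = lloyd(X, k)
    Phi = rbf_features(X, centers, gamma)
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return centers, w

# Synthetic data for illustration only: 5 "lagged" inputs, binary target.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5))
y = np.sign(X[:, 0] + 0.1 * rng.normal(size=500))
centers, w = fit_rbf(X, y, k=32)
```

With no bias term, the second layer has exactly k = 32 free weights, which is the parameter count referred to above; a binary forecast would be `np.sign(rbf_features(X_new, centers) @ w)`.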
__________________
Where everyone thinks alike, no one thinks very much
#3
01-23-2014, 03:36 AM
 rakhlin Member Join Date: Jun 2012 Posts: 24
Re: VC dimension of time series models

Thank you very much, Professor.

I had overlooked the evident fact that the number of centers in the RBF model equals the number of estimated parameters (the weights), and hence the VC dimension.

For the sake of brevity I didn't explain what the models forecast. It does not affect your conclusion, but it is an interesting part in its own right. The forecast isn't binary: they forecast 2 real-valued outputs, the return and the variance of the return. For the discrete state model these 2 values are just constant averages over similar states ( pairs). The absolute ratio of the 2 values is treated as a proxy for predictability (the higher the better) and determines the size of the investment. Eventually, they forecast a binary direction and a real-valued investment size. On the other hand, from a trader's viewpoint, decomposing the forecast into return and variance is a concrete and mature approach. It is rare among the works on financial forecasting I have seen. I have a gut feeling that state-dependent or regime-dependent models are the right approach in financial forecasting (unlike many others). The article is very solid and fresh even though it was written 14 years ago. Klaus Pawelzik is a co-author of Vapnik on "Predicting time series with support vector machines", published in 1997, and has many interesting works in various fields.

Dear Professor, in this model I was going to replace the RBF network with support vector regression. But I remember you once noted that support vector regression isn't as good as SVM is for classification. Is it worth trying? Or should I stay with RBF?
#4
01-23-2014, 06:55 PM
 yaser Caltech Join Date: Aug 2009 Location: Pasadena, California, USA Posts: 1,478
Re: VC dimension of time series models

Quote:
 Originally Posted by rakhlin in this model I was going to replace the RBF network with support vector regression. But I remember you once noted that support vector regression isn't as good as SVM is for classification. Is it worth trying? Or should I stay with RBF?
Worth a try. You never know which model will work best in which real-life problem.
#5
01-23-2014, 11:36 PM
 rakhlin Member Join Date: Jun 2012 Posts: 24
Re: VC dimension of time series models

Thank you!
#6
06-02-2016, 05:17 PM
 kamika05 Junior Member Join Date: May 2016 Location: algeria Posts: 2
Re: VC dimension of time series models

Thank you for all this information; it is very helpful to me.
