LFD Book Forum (http://book.caltech.edu/bookforum/index.php)
-   General Discussion of Machine Learning (http://book.caltech.edu/bookforum/forumdisplay.php?f=105)
-   -   VC dimension of time series models (http://book.caltech.edu/bookforum/showthread.php?t=4469)

 rakhlin 01-22-2014 02:00 PM

VC dimension of time series models

Hello again dear Professor and all!

I want to determine the VC dimension of time series models in order to avoid overfitting and to estimate the minimum size of the data set.

1. First, a possibly ill-posed question, since it does not specify a hypothesis space. A model takes an input vector of lagged readings. Can the VC dimension be approximately estimated as ?

2. Second, the concrete time series model I'm working on is based on the article by Liehr and Pawelzik, "A trading strategy with variable investment from minimizing risk to profit ratio", published in Physica A 287 (2000) 524-538.

Let me explain it briefly. Liehr and Pawelzik compare the performance of two related models. Both models construct the series of input vectors by embedding the time series of returns into a space of embedding dimension :

a) Discrete state model. The signs of recent returns are mapped to distinct states. For example, 5 lagged returns lead to 2^5 = 32 possible states. Each state produces a forecast based on the statistics of past occurrences of the same state.

b) RBF neural network. Training is performed by unsupervised adaptation of the centers, followed by gradient descent to adjust the second-layer weights. For comparability, the number of centers (Gaussians) is chosen equal to the number of states in the first model.
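
As an illustration of the state construction in (a), here is a minimal sketch of how 5 lagged return signs could be turned into one of 32 state ids. The function name `sign_states`, the bit ordering, and the choice to treat a zero return as "not positive" are assumptions made for this example, not details taken from the article.

```python
import numpy as np

def sign_states(returns, lags=5):
    """Map each window of `lags` consecutive returns to an integer state id."""
    r = np.asarray(returns, dtype=float)
    bits = (r > 0).astype(int)  # assumption: 1 for a positive return, 0 otherwise
    # All sliding windows of length `lags`; shape is (len(r) - lags + 1, lags)
    windows = np.lib.stride_tricks.sliding_window_view(bits, lags)
    # Binary place values, oldest return in the window is the most significant bit
    weights = 2 ** np.arange(lags)[::-1]
    return windows @ weights

# Synthetic returns, used only to exercise the function
rng = np.random.default_rng(0)
states = sign_states(rng.standard_normal(1000), lags=5)
print(states.min(), states.max())  # state ids fall in the range 0..31
```

Each id in `states` indexes one of the 2^5 sign patterns, so a per-state forecast reduces to a lookup table with 32 entries.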

Now, to my question. Liehr and Pawelzik do not use the term 'VC dimension', but they urge avoiding overfitting by using only a small number of Gaussians. In our terms :) they relate generalization ability to the number of centers (RBF model) and the number of states (discrete state model). They typically use 5 lagged returns, which results in 32 states/centers. From Lecture 16 of this course I remember that the number of centers in an RBF model can be related to the number of support vectors in an SVM model. The number of support vectors, in turn, is a proxy for the VC dimension.

Am I correct that the VC dimension of the two models is approximately ? Or just ?

 yaser 01-22-2014 04:08 PM

Re: VC dimension of time series models

Interesting problem. First, as you anticipated, just specifying the input as lagged returns would not determine the VC dimension. If you further specify that you are doing linear classification, for example, then the VC dimension equals d + 1, where d is the number of lagged inputs. If you use another model, the VC dimension may be different.

Now with the two models (both are similarity-based models), I assume that the forecast is binary. You have grouped inputs into 32 categories, with all inputs in the same category necessarily mapping to the same label. If you have 33 input vectors, two of them must have the same sign pattern and therefore necessarily map to the same label, so 33 is a break point and indeed the VC dimension is 32. In the RBF case, if the clustering is done in an unsupervised way, then the VC dimension would be the number of parameters in the second layer, which is also 32 in this case by choice.
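
The pigeonhole argument above can be checked by brute force on a smaller case. The sketch below assumes the discrete state model is a lookup table that assigns one binary label per state, and uses 4 states (2 lagged signs) instead of 32 so that enumeration stays cheap. The helper name `can_shatter` is made up for this example.

```python
from itertools import product

def can_shatter(point_states, n_states):
    """True if lookup-table hypotheses realize every labeling of the points.

    point_states[i] is the state id that input point i falls into.
    """
    realized = set()
    for table in product([-1, +1], repeat=n_states):  # one label per state
        realized.add(tuple(table[s] for s in point_states))
    return len(realized) == 2 ** len(point_states)

n_states = 4  # 2 lagged signs; the thread's case has 2**5 = 32 states

# One point per state: every labeling is achievable
print(can_shatter([0, 1, 2, 3], n_states))  # True

# Any 5 points must put two in the same state, so no 5 points are shattered
print(any(can_shatter(p, n_states)
          for p in product(range(n_states), repeat=5)))  # False
```

With 4 states, 4 points can be shattered and 5 cannot, so the VC dimension is 4. The same counting with 32 states gives a break point of 33 and a VC dimension of 32.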

The equality of number of centers and number of support vectors in Lecture 16 was a forced assumption for comparison, but they need not be equal. The number of support vectors comes out of the process of solving the SVM kernel problem, whereas the number of clusters is a parameter under our control that we decide on before running Lloyd's.
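
To make the role of the number of clusters concrete, here is a rough sketch of the two-stage RBF training: Lloyd's algorithm picks K centers without looking at the labels, and only the K second-layer weights are then fit to the labels. The data is synthetic, the width `gamma` is an arbitrary choice, and the weights are fit by least squares rather than the gradient descent mentioned for the article's model. All function names are made up for this example.

```python
import numpy as np

def lloyd(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; k is fixed by us before it runs."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # copy of k data points
    for _ in range(n_iter):
        # Squared distances from every point to every center, shape (n, k)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers

def rbf_features(X, centers, gamma=1.0):
    """Gaussian activations of each point with respect to each center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

# Synthetic stand-in for 5 lagged returns and a binary target
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 5))
y = np.sign(X[:, 0] + 0.1 * rng.standard_normal(500))

centers = lloyd(X, k=32)                      # unsupervised: y is not used
Phi = rbf_features(X, centers)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)   # supervised: 32 free weights
print(centers.shape, w.shape)  # (32, 5) (32,)
```

Because the centers are chosen without the labels, the only parameters fit to the labels are the 32 entries of `w`.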

 rakhlin 01-23-2014 02:36 AM

Re: VC dimension of time series models

Thank you very much, Professor.

I had overlooked the fact that the number of centers in the RBF model equals the number of estimated parameters (the weights), and hence the VC dimension.

For the sake of brevity I haven't explained what the models forecast. It does not affect your conclusion, but it is an interesting part in its own right. The forecast isn't binary: they forecast 2 real-valued outputs, the return and the variance of the return. For the discrete state model these 2 values are just constant averages over similar states ( pairs). The absolute ratio of the 2 values is treated as a proxy for predictability (the larger the better) and determines the size of the investment. In the end, they forecast a binary direction and a real-valued investment size. From a trader's viewpoint, decomposing the forecast into return and variance is a concrete and mature approach, and it is rare among the works on financial forecasting I have seen. I have a gut feeling that state-dependent or regime-dependent models are the right approach to financial forecasting (unlike many others). The article is very solid and fresh even though it was written 14 years ago. Klaus Pawelzik is a co-author of Vapnik's on "Predicting time series with support vector machines", published in 1997, and has many interesting works in various fields.
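
A rough sketch of the per-state forecast described above, on synthetic data. Which quantity goes in the numerator of the ratio is not spelled out here, so taking |mean| / variance as the predictability proxy is an assumption, as are the function and variable names. The article's actual investment rule may differ.

```python
import numpy as np

def state_statistics(states, next_returns, n_states):
    """Per-state sample mean and variance of the return that follows the state."""
    mean = np.full(n_states, np.nan)
    var = np.full(n_states, np.nan)
    for s in range(n_states):
        r = next_returns[states == s]
        if len(r) > 1:  # need at least 2 samples for a variance estimate
            mean[s] = r.mean()
            var[s] = r.var(ddof=1)
    return mean, var

# Synthetic stand-in: random state ids and the return observed after each one
rng = np.random.default_rng(0)
n_states = 32
states = rng.integers(0, n_states, size=5000)
next_returns = 0.01 * rng.standard_normal(5000)

mean, var = state_statistics(states, next_returns, n_states)
direction = np.sign(mean)             # binary forecast: long or short
predictability = np.abs(mean) / var   # assumed proxy that scales the investment
```

States seen fewer than twice keep `nan` entries, which a real strategy would have to treat as "no forecast".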

Dear Professor, in this model I was going to replace the RBF network with support vector regression. But I remember you once remarked that support vector regression is not as good as SVM is for classification. Is it worth trying, or should I stay with RBF?

 yaser 01-23-2014 05:55 PM

Re: VC dimension of time series models

Quote:
 Originally Posted by rakhlin (Post 11631) in this model I was going to replace the RBF network with support vector regression. But I remember you once remarked that support vector regression is not as good as SVM is for classification. Is it worth trying, or should I stay with RBF?
Worth a try. You never know which model will work best in which real-life problem.

 rakhlin 01-23-2014 10:36 PM

Re: VC dimension of time series models

Thank you!

 kamika05 06-02-2016 04:17 PM

Re: VC dimension of time series models

Thank you for all this information; it is very helpful to me.

 Hemantaryac 07-11-2018 07:20 AM

Re: VC dimension of time series models

Awesome thanks for this info. I love this community :)
