VC dimension of time series models
Posted by rakhlin (Member, joined Jun 2012, 24 posts)

Hello again, dear Professor and all!

I want to determine the VC dimension of time series models in order to avoid overfitting and to estimate the minimum size of the data set.
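For reference, the VC bound from the course can be turned into a rough data-size estimate once a value of d_vc is assumed. This is a minimal sketch, assuming the polynomial bound on the growth function and illustrative values eps = delta = 0.1 (not taken from the post). The bound is known to be very loose, so the course's rule of thumb N >= 10 * d_vc is shown alongside; the function names are hypothetical.

```python
import math


def vc_sample_complexity(d_vc, eps=0.1, delta=0.1, n_start=1000.0, tol=1.0, max_iter=200):
    """Approximate smallest N with N >= 8/eps^2 * ln(4 * ((2N)^d_vc + 1) / delta).

    N appears on both sides, so the inequality is solved by fixed-point
    iteration. The log term is computed in log space so that a large
    d_vc does not overflow a float.
    """
    n = n_start
    for _ in range(max_iter):
        log_2n = math.log(2.0 * n)
        # ln((2N)^d_vc + 1) = d_vc * ln(2N) + ln(1 + (2N)^(-d_vc))
        log_growth = d_vc * log_2n + math.log1p(math.exp(-d_vc * log_2n))
        n_new = 8.0 / eps**2 * (math.log(4.0 / delta) + log_growth)
        if abs(n_new - n) < tol:
            return math.ceil(n_new)
        n = n_new
    return math.ceil(n)


def rule_of_thumb(d_vc, factor=10):
    """The course's practical rule of thumb: N >= 10 * d_vc."""
    return factor * d_vc


# Compare the (pessimistic) VC bound with the rule of thumb for a few d_vc values.
for d in (3, 6, 32):
    print(d, vc_sample_complexity(d), rule_of_thumb(d))
```

The gap between the two columns illustrates why the rule of thumb, rather than the raw bound, is what is used in practice.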

1. First, a possibly ill-posed question, since it does not specify a hypothesis space. A model takes an input vector of lagged readings. Can the VC dimension be approximately estimated as ?
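Since the hypothesis space is left open, the sketch below only builds the input vectors of d lagged readings that any such model would consume. The name `lag_embed` and the convention "predict the next reading" are assumptions for illustration. As a known reference point, if the model on top of these inputs were a perceptron with a bias term, its VC dimension would be d + 1.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def lag_embed(x, d):
    """Build input vectors of d lagged readings and the reading that follows each.

    Row t of X is (x[t], ..., x[t+d-1]) and y[t] = x[t+d].
    """
    x = np.asarray(x, dtype=float)
    if len(x) <= d:
        raise ValueError("series must be longer than the number of lags d")
    windows = sliding_window_view(x, d + 1)  # shape (len(x) - d, d + 1)
    return windows[:, :d], windows[:, d]


X, y = lag_embed([1.0, 2.0, 3.0, 4.0, 5.0], d=2)
# X holds rows [1, 2], [2, 3], [3, 4]; y holds [3, 4, 5]
```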

2. Second, a concrete time series model I'm working on, based on the article by Liehr and Pawelzik, "A trading strategy with variable investment from minimizing risk to profit ratio", published in Physica A 287 (2000) 524-538.

Let me explain it briefly. Liehr and Pawelzik compare the performance of two related models. Both models construct the series of input vectors by embedding the time series of returns into a space whose embedding dimension equals the number of lagged returns used:

a) Discrete state model. Taking the signs of the recent returns, the input vectors are transformed into distinct states: with one binary sign per lag, m lagged returns give 2^m states, so 5 lagged returns lead to 32 possible states. Each state produces a forecast based on the statistics of past occurrences of that state.
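A minimal sketch of the discrete state model as described above. Two assumptions are not from the paper: the per-state "statistics" is taken to be the plain mean of the return that followed each past occurrence of the state, and zero returns are counted as negative. Note that this model has exactly one fitted number per state, i.e. 2^m numbers for m lags.

```python
import numpy as np


def sign_states(lagged):
    """Map each row of lagged returns to an integer state in [0, 2**d).

    Bit i of the state is 1 if the i-th return in the row is positive.
    Zero returns are counted as non-positive (an assumption).
    """
    bits = (np.asarray(lagged, dtype=float) > 0).astype(np.int64)
    weights = 2 ** np.arange(bits.shape[1], dtype=np.int64)
    return bits @ weights


def fit_state_forecasts(states, next_returns, n_states):
    """Forecast for each state = mean of the returns that followed it.

    The plain mean is an assumed stand-in for the paper's per-state
    statistics. States never observed get forecast 0 and count 0.
    """
    next_returns = np.asarray(next_returns, dtype=float)
    sums = np.bincount(states, weights=next_returns, minlength=n_states)
    counts = np.bincount(states, minlength=n_states)
    forecasts = np.zeros(n_states)
    seen = counts > 0
    forecasts[seen] = sums[seen] / counts[seen]
    return forecasts, counts


# Toy example with 2 lags, i.e. 2**2 = 4 possible states.
lagged = [[0.1, -0.2], [-0.1, 0.3], [0.2, 0.1]]
states = sign_states(lagged)  # [1, 2, 3]
forecasts, counts = fit_state_forecasts(states, [0.5, -0.5, 1.0], n_states=4)
```

With 5 lags the same code yields states in [0, 32), one forecast per state.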

b) RBF neural network. Training consists of unsupervised adaptation of the centers followed by gradient descent to adjust the second-layer weights. For comparability, the number of centers (Gaussians) is chosen equal to the number of states in the first model.
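A minimal sketch of the two-stage RBF training described above, under assumptions that are not taken from the paper: Lloyd's k-means stands in for the unsupervised center adaptation, all Gaussians share one fixed width parameter `gamma`, and the second layer (weights plus a bias) is fitted by batch gradient descent on mean squared error. All function names and default values are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm, standing in for unsupervised center adaptation."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # leave empty clusters where they are
                centers[j] = members.mean(axis=0)
    return centers


def rbf_features(X, centers, gamma):
    """Gaussian activations exp(-gamma * ||x - c||^2) for every center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)


def fit_rbf(X, y, k, gamma=1.0, lr=0.1, epochs=500, seed=0):
    """Stage 1: place k centers. Stage 2: gradient descent on output weights."""
    centers = kmeans(X, k, seed=seed)
    phi = rbf_features(X, centers, gamma)
    w = np.zeros(k)
    b = 0.0
    for _ in range(epochs):
        err = phi @ w + b - y                # residuals of the linear output layer
        w -= lr * (phi.T @ err) / len(y)     # gradient of half the mean squared error
        b -= lr * err.mean()
    return centers, w, b


def predict_rbf(X, centers, w, b, gamma=1.0):
    return rbf_features(X, centers, gamma) @ w + b


# Toy usage: 5 lags and 32 centers, matching the 32 states of the discrete model.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 5))
y_demo = np.tanh(X_demo[:, -1])
centers_demo, w_demo, b_demo = fit_rbf(X_demo, y_demo, k=32)
```

Only the k output weights and the bias are trained by gradient descent here; the centers are fixed after the unsupervised stage, which is the split the paragraph describes.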

Now, to my question. Liehr and Pawelzik do not use the term 'VC dimension', but they recommend using only a small number of Gaussians to avoid overfitting. In our terms, they relate generalization ability to the number of centers (RBF model) and the number of states (discrete state model). They typically use 5 lagged returns, which results in 32 states/centers. From Lecture 16 of this course, I remember that the number of centers in an RBF model can be related to the number of support vectors in an SVM model. The number of support vectors, in turn, serves as a proxy for the VC dimension.
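To make the analogy concrete, here is a small sketch of the bound from the course's SVM lecture, E[E_out] <= E[#SV] / (N - 1), with the number of centers (or discrete states) plugged in as the support-vector count. Treating the 32 centers as support vectors is exactly the analogy being asked about, so this only illustrates that assumption; it is not a confirmed result for either model. The function names are hypothetical.

```python
import math


def n_states(m):
    """Number of sign patterns (discrete states, or matching RBF centers) for m lags."""
    return 2 ** m


def sv_bound(n_sv, n):
    """E[E_out] <= E[#SV] / (N - 1), with n_sv standing in for E[#SV]."""
    if n < 2:
        raise ValueError("need at least two data points")
    return n_sv / (n - 1)


def n_for_bound(n_sv, target):
    """Smallest N for which n_sv / (N - 1) <= target."""
    return math.ceil(n_sv / target) + 1


k = n_states(5)  # 32 centers / states for 5 lagged returns
for n in (100, 1000, 10000):
    print(n, sv_bound(k, n))
```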

Am I correct that the VC dimension of the two models is approximately ? Or just ?