#1




Lecture 3 Q&A independence of parameter inputs
There is a discussion about the importance of having independent input data and how this propagates to features. Is it true that features necessarily inherit independence from the data? If they don't, how bad is that? For example, in Finance there are quite a few studies that use support vector machines with a grid of features defined by different moving averages, which overlap (1w, 1m, etc.). In this case the features are clearly not independent. Would this be seen as a questionable procedure?
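To make the overlap concrete, here is a small sketch (a random walk stands in for real prices, so all numbers are hypothetical) measuring how strongly a 1-week and a 1-month moving average of the same series are correlated:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical daily price series: a simple random walk stands in for real data.
prices = np.cumsum(rng.normal(0, 1, 500)) + 100

def moving_average(x, window):
    """Trailing moving average; the first (window - 1) values are dropped."""
    return np.convolve(x, np.ones(window) / window, mode="valid")

ma_5 = moving_average(prices, 5)    # ~1 trading week
ma_21 = moving_average(prices, 21)  # ~1 trading month

# Align the two series on the dates where both are defined.
n = min(len(ma_5), len(ma_21))
corr = np.corrcoef(ma_5[-n:], ma_21[-n:])[0, 1]
print(f"correlation between 1w and 1m moving averages: {corr:.3f}")
```

On a random walk the two averages are typically correlated well above 0.9, which is the sense in which such features are clearly not independent.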

#2




Re: Lecture 3 Q&A independence of parameter inputs
Quote:
With regard to the choice of features for representing financial data, it is not difficult to remove the more obvious dependencies, but it is not clear that this is crucial. As an analog, suppose you have a basis for the plane (1,0) and (1,1). There is clearly a correlation between these two axes in your sense, but a simple linear transformation to a basis of (1,0) and (0,1) gets rid of it. If you are going to use kernels, you will be permitting many transformations of this type or others. The same is true of moving averages, where you can replace them with carefully chosen differences between them if you wish, but it may not be crucial. 
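The basis analogy in the quote can be sketched numerically: start with independent coordinates in the orthonormal basis, describe the same points in the skewed basis {(1,0), (1,1)} (which makes the coordinates correlated), then undo it with a simple linear transformation.

```python
import numpy as np

rng = np.random.default_rng(1)
# Independent coordinates in the orthonormal basis {(1,0), (0,1)}.
a = rng.normal(size=10_000)
b = rng.normal(size=10_000)

# The same points described in the skewed basis {(1,0), (1,1)}:
# the point a*(1,0) + b*(0,1) equals (a - b)*(1,0) + b*(1,1),
# so its skewed-basis coordinates are (a - b, b).
u, v = a - b, b
print("skewed-basis correlation:", np.corrcoef(u, v)[0, 1])  # clearly nonzero

# A simple linear transformation back removes the correlation.
a2, b2 = u + v, v
print("after transform:", np.corrcoef(a2, b2)[0, 1])  # close to zero
```

Nothing about the points themselves changes under the transformation; only the description of them does, which is the quote's point that such correlations between features are easy to remove.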
#3




Re: Lecture 3 Q&A independence of parameter inputs
Quote:
#4




Re: Lecture 3 Q&A independence of parameter inputs
Moobb, having rewatched the Q&A, my understanding is this. The independence that is important is that the input points are independently selected. Intuitively, they are a representative sample, rather than one which gives disproportionate importance to some region of the input space.
With regard to the features, these are a generalisation of coordinates which are used to describe the input data points (eg the value of a moving average is a feature which can be thought of as a coordinate, even though it is defined in terms of many coordinates). The independence that is preserved after a transformation is the independence between the data points, not the features: the set of points remains a representative sample of the (transformed) space of possible inputs. 
#5




Re: Lecture 3 Q&A independence of parameter inputs
Many thanks for your answer, and sorry for not including the reference for the lecture; it is at 1:10:40 (can't include the tag directly right now). I believe I understand it now: if the input points are not independent, then chances are the model won't generalise well to the full set of possible inputs (taking an example from digit recognition: if you just scale up an image of the number 8 by a factor of two, you won't learn anything new by doing so). Using the analogy to coordinate systems, if the features are not independent then you may have less information than you suppose, but it may still be more practical than devising a feature that automatically incorporates only new elements; in practice the algorithm will benefit only from the new information the feature incorporates. I guess there is a practical limit in terms of model complexity at some point? Or you may end up incorporating just noise, hence the occasional use of dimensionality reduction prior to establishing your features? Thanks again!

#6




Re: Lecture 3 Q&A independence of parameter inputs
Quote:
To add an example: consider a Gaussian distribution with a non-diagonal covariance matrix in 2D space. It is obvious that the features (read: axes) are correlated, i.e. non-independent. Perform a change of coordinate system, taking the eigenvector directions of the covariance matrix as the new axes. No information is lost in the transformation (the space did not shrink or expand!), but now we have independent, orthonormal coordinates. As pointed out, what is preserved is the "independence between the data points, not the features". 
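This rotation into the eigenvector basis can be verified directly (the covariance matrix below is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(2)

# 2D Gaussian with a non-diagonal covariance matrix: the axes are correlated.
cov = np.array([[2.0, 1.2],
                [1.2, 1.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=20_000)

# Eigenvectors of the (sample) covariance matrix give the new coordinate system.
eigvals, eigvecs = np.linalg.eigh(np.cov(X.T))
Y = X @ eigvecs  # rotate into the eigenvector basis

print("correlation before:", np.corrcoef(X.T)[0, 1])
print("correlation after: ", np.corrcoef(Y.T)[0, 1])

# The rotation is orthonormal, so it is exactly reversible: X == Y @ eigvecs.T
```

Because the data are jointly Gaussian, the decorrelated coordinates really are independent, not merely uncorrelated.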
#7




Re: Lecture 3 Q&A independence of parameter inputs
First, let's try this neat lecture tag with the time Moobb gave (converted to seconds):
Quote:
Regarding financial time series, the assumptions of machine learning are not strictly true. It is recognised empirically that there is nonstationary behaviour (P(x) and P(y | x) change somewhat over time). [This may apply more to P(x) than P(y | x). For example, if you used 1990s data to create a model of the stock market, then used it in 2000, you might experience unexpected behaviour because you had never seen such market conditions before: the input data points could be a long way from any you had seen before. This is essentially the same reason that traders using any methods may lose when market conditions change a lot.] As well as this, there is the nondeterministic component of the behaviour. This effectively reduces the size of the input data set for the purpose of predicting the deterministic component (which is the main aim). This may be an issue, since the total amount of data available is rather limited. [If you have a stationary process and as much data as you want, I believe you can get as close to perfect knowledge of the deterministic component as you wish. But of course the nondeterministic component remains here as well.]
Quote:
For more see: http://en.wikipedia.org/wiki/Multiva...l_distribution https://en.wikipedia.org/wiki/Princi...onent_analysis https://www.cs.princeton.edu/courses...notes/0419.pdf http://www.cs.unm.edu/~williams/cs530/kl3.pdf
Bottom line is that it is always theoretically possible to remove the linear correlations between features using the transformation given by principal component analysis, but they can only be made independent in the case where they are jointly normally distributed (this means that any linear combination of them is normal). Moreover, this is always a reversible process preserving the independence of your input data points. As a footnote, it is worth mentioning that a key reason market prediction is not entirely hopeless is that markets do not exhibit perfect Gaussian behaviour. It seems that the nonstationary behaviour is more important than the Hurst exponent being permanently greater than (or less than) 0.5 (it would be 0.5 for simple Brownian motion). E.g. see http://www.optimaltrader.net/old/hur...ictability.pdf 
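The caveat that decorrelation is weaker than independence (outside the jointly Gaussian case) is easy to demonstrate with a deliberately non-Gaussian pair of features:

```python
import numpy as np

rng = np.random.default_rng(3)
# Two features that are UNCORRELATED but clearly DEPENDENT (not jointly Gaussian).
x = rng.uniform(-1, 1, 100_000)
y = x ** 2  # y is a deterministic function of x

corr = np.corrcoef(x, y)[0, 1]
print("linear correlation:", corr)  # close to zero by symmetry

# Yet knowing x pins y down exactly, so P(y | x) != P(y): the features are
# dependent. PCA, which only sees the covariance matrix, cannot detect this.
```

This is the sense in which PCA removes linear correlations between features but does not in general make them independent.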
#8




Re: Lecture 3 Q&A independence of parameter inputs
Elroch, thank you so much for your help. Regarding your point about nonstationarity and the difficulty it introduces for financial forecasting, do you see that as necessarily invalidating any attempt at machine learning forecasting in Finance? Could it be that the time series itself is nonstationary, but some specific patterns within it (which people try to capture with technical indicators, for example) are stable? Those technical indicators would then be your features, and maybe when we condition on them the time series becomes more stationary? I think another main use in Finance is classification, which can then be used for portfolio allocation, for example.
Many thanks again!! 
#9




Re: Lecture 3 Q&A independence of parameter inputs
Thank you. But bear in mind there is much I do not yet know!
Quote:
Quote:
Regarding nonstationarity, there is an awkward conflict between the wish to have plenty of training data and the wish to have training data that is recent enough not to be misleading. One paper I read found an interesting way of dealing with this by weighting more recent data more strongly when training: http://stockresearch.googlecode.com/...prediction.pdf [If this doesn't display in your browser, try saving it and loading it in a PDF viewer.] Unfortunately, I am not yet aware of how to do this without modifying or rewriting general-purpose machine learning tools. Fortunately, discovering eternal truths about markets (or other time series) is not necessary, since there are two things you can do. One is to execute the training process at intervals; a more radical solution is to replace your approach with a more sophisticated one if it stops being effective enough (for example, you might start with just moving averages as features, and it might work; if it stopped working, you might add a Hurst exponent calculated at an appropriate scale, as a complex feature that might make machine learning more feasible, if you describe the problem in the right way. I don't know if this is true, but it may be.) The possibilities are infinite, and I sometimes think that is more of a problem than a help! And yes, classification is surely as useful as prediction of real-valued quantities. I like to think of it in information terms: a binary classification is a prediction of 1 bit of information, and there is a huge range of possible bits you might choose to model. A real-valued prediction can be approximated by a classification problem where you have a sequence of bins corresponding to intervals. I am not sure of the relative merits when there is a choice. One issue is how the information being modelled relates to the way it will be used later. 
Of course any output can be considered as an indicator itself, but then there is the question of how that output will be used in trading and how it will affect trading results. In principle, error measures should be tailored to suit the effect on results, but this may not be easy. 
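The recency-weighting idea from the quoted paper can be sketched with a toy weighted least-squares fit. Everything here is hypothetical (the regime change, the half-life, the single-feature model); it only illustrates how exponentially decaying sample weights pull a fit toward recent behaviour.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical setup: 200 time-ordered observations whose true relationship
# drifts over time (a toy stand-in for nonstationary market data).
t = np.arange(200)
x = rng.normal(size=200)
slope = np.where(t < 150, 1.0, 3.0)  # regime change at t = 150
y = slope * x + rng.normal(0, 0.1, size=200)

# Exponentially decaying sample weights: recent points count more.
half_life = 10.0
w = 0.5 ** ((t[-1] - t) / half_life)

# Weighted least squares for a single-feature model y ~ beta * x.
beta_weighted = np.sum(w * x * y) / np.sum(w * x * x)
beta_unweighted = np.sum(x * y) / np.sum(x * x)

print("unweighted slope:", beta_unweighted)  # pulled toward the old regime
print("weighted slope:  ", beta_weighted)    # tracks the recent regime
```

Many libraries expose this directly as a `sample_weight` argument at fit time, which is one way to get the effect without rewriting the learning tool itself.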